Supplementary Information (doc 101K)

Diversification and niche adaptations of Nitrospina-like bacteria in the polyextreme interfaces of the Atlantis II Deep brine from the Red Sea
David Kamanda Ngugi1*, Jochen Blom2, Ramunas Stepanauskas3, and Ulrich Stingl1
1Red Sea Research Centre, King Abdullah University of Science and Technology, Thuwal
23955-6900, Saudi Arabia; 2Bioinformatics and Systems Biology, Justus Liebig University
Giessen, Germany; 3Bigelow Laboratories for Ocean Sciences, 60 Bigelow Drive, East
Boothbay, ME 04544-0380, USA. *Corresponding author: david.ngugi@kaust.edu.sa
SUPPLEMENTARY FIGURE LEGENDS
Figure S1: Hierarchical clustering of OTUs from 16S rRNA gene sequences of brine-seawater
interface (bsi) previously classified as Nitrospina-like in five different brines based on data
from Ngugi et al. (2014). Operational taxonomic units were clustered at 97% identity level.
Only OTUs with an abundance of 1% of all reads that were assigned as “Nitrospina” are
shown.
Figure S2: Phylum-level taxonomic assignment of the predicted proteomes of Nitrospina
gracilis and Ca. Nitromaritima SAGs from the Red Sea (RS), the N. Pacific (NP), and the N.
Atlantic (NA) as predicted using MAPLE (see methods for details).
Figure S3: Venn diagram showing the core genesets (in bold) shared between the (pan)-genomes of the three geographically separated Ca. Nitromaritima species and Nitrospina
gracilis. Values in brackets show the unique genesets in each (pan)-genome.
Figure S4: Gene cluster organization of pyruvate:ferredoxin oxidoreductase (PFOR) operon
of one of our SAGs (SCGC AAA799-C22), Nitrospina gracilis 3/211, and Ca. Nitromaritima
sp. B18 (from the N. Atlantic Ocean). Note that the Ca. N. species B18 lacks a putative
rubrerythrin-like gene.
Figure S5: Maximum-likelihood tree showing the phylogenetic placement of the β-subunit of
the nitrite oxidoreductase (nxrB) gene sequences from low- and high-affinity NOB (blue and
red leaves, respectively) among selected members of the type II dimethyl sulfoxide reductase enzyme family.
Sequences from Ca. Nitromaritima single-cell amplified genomes are shown in bold.
Figure S6: Ratios between the frequency of nxrA gene homologues from low-affinity (LNOB)
and high-affinity NOB (HNOB) in metagenomic datasets. Note that the ratios were calculated
excluding nxrA gene homologues from anaerobic ammonia-oxidizing (anammox) bacteria.
Figure S7: Organization of gene clusters for the biosynthesis of ectoine and hydroxyectoine
(ectABCD) in one of our SAGs (SCGC AAA799-C22) and also Nitrococcus mobilis Nb-231.
Note that among all genome-sequenced NOB, only these two carry the ectABC operon, and
only our SAGs have the ABC-type transporter for ectoine (ehuACBD) and the gene encoding
ectoine hydroxylase (ectD).
Figure S8: Box plots showing the distribution in the isoelectric point (pI) of predicted
proteomes among members of the proposed candidate phylum Nitrospinae (in grey),
representatives of various nitrite-oxidizers and anaerobic ammonia-oxidizing bacteria (in
yellow), aerobic ammonia-oxidizing prokaryotes (in purple), and typical planktonic bacteria
(in white) relative to extreme (in red) and moderate (in light blue) halophiles. The three panels
show data for the overall predicted proteome (a), as well as protein-coding genes without any
trans-membrane domains (b) or with a single trans-membrane domain (c). The dashed blue
line demarcates a pI of 7.0, while triangles show the mean. Our single-cell genomes are shown
in bold.
Figure S9: Phylogenetic analysis of putative oxidases encoded in the Nitrospinae genomes (in
red and blue) relative to the bd-type quinol oxidases (in grey) and the proposed “cytochrome
bd-like oxidase” from Ca. N. defluvii (NIDE0901, in bold; see Lücker et al., 2010). (a) Shows
a maximum-likelihood tree of the genes putatively encoding the α subunit of the predicted
oxidases, including those from N. gracilis (in blue). Nodes with bootstrap support above
85% (based on 100 iterations) are marked with a black dot.
Characterized bd-type quinol oxidases (in grey) were used for outgrouping. (b) Depicts a
multiple sequence alignment of the above proteins highlighting residues that putatively
possess functions homologous to those in typical cytochrome c oxidases, including residues
involved in the binding of heme groups (in red) and their alternatives (in yellow), those
involved in copper binding (in turquoise), or those that are conserved in cyt. c oxidases (in
green). Residues that are required for quinol binding in bd-type quinol oxidases are
highlighted as well (in purple). Note the divergence of “cyt. bd-like oxidases” (harbouring
copper binding sites but no quinol residues) and “bd-like enzymes” (lacking both copper and
quinol binding sites).
SUPPLEMENTARY TABLE LEGENDS
Table S1: Metadata of the sampling location where the single-cell genomes were obtained.
Table S2: Distance matrix of the 16S rRNA genes of all Nitrospina-like bacterial sequences
used for constructing the tree presented in Figure 1.
Table S3: Estimation of genome completeness based on 104 single-copy genes as determined
using CheckM.
Table S4: 454 metagenomic libraries and their associated metadata that were used for
fragment recruitment analyses.
Table S5: The percentage of overlap (upper triangle) and average nucleotide identity (ANI,
lower triangle) between pairs of genomes including our SAGs (in bold), the related Ca.
Nitromaritima SAGs and fosmids, and canonical NOBs.
Table S6: Genesets in the core genome of Nitrospina gracilis and the pan-genome of our
SAGs. The first two columns show the corresponding orthologous proteins and their predicted
functions based on annotations in NCBI (text highlighted in grey), while the subsequent
columns are based on our automated annotation in INDIGO. Text highlighted in pink indicates
genes encoding enzymes that are discussed in the main text or depicted in Figure 6.
Table S7: Spearman correlation coefficient in the occurrence of genes.
Table S8: Unique genesets in the pan-genome of Ca. Nitromaritima RS (relative to the
genome of Nitrospina gracilis). All the genes discussed in the main text or depicted in Figure
6 are highlighted in red.
Table S9: List of transporters predicted to be encoded in the (pan)-genomes of N. gracilis
(S9A), Ca. Nitromaritima RS (S9B), Ca. Nitromaritima NA (S9C), and Ca. Nitromaritima sp.
NP (S9D) based on automated annotation via the web-based transporters (TransAAP)
annotation tool (http://www.membranetransport.org/).
SUPPLEMENTARY MATERIALS & METHODS
MATERIALS AND METHODS
Phylogenetic analyses
All phylogenetic analyses were conducted in Geneious Pro v7.1.2 (Biomatters Ltd, Auckland,
NZ; http://www.geneious.com). The 16S (≥1400 bp) and 23S (≥2600 bp) rRNA gene-based phylogenetic trees were constructed by first aligning the respective sequences using the
SINA alignment webtool based on the SILVA 115 database (http://www.arb-silva.de/aligner).
Phylogenetic analyses were then performed by importing the aligned sequences into Geneious
Pro and computing a maximum-likelihood tree with PhyML (Guindon & Gascuel, 2003)
based on the WAG substitution model (100 bootstraps), an estimated proportion of invariable
sites (I), and a gamma distribution with an estimated shape parameter and four discrete substitution rate categories (Γ4). The best nucleotide
substitution model (i.e., WAG) was selected prior to phylogenetic inference using jModelTest
(Posada, 2008). A Bayesian consensus tree was also constructed using MrBayes v3.2.1
(Ronquist & Huelsenbeck, 2003) from the same alignment with the GTR + I + Γ4 model.
MrBayes was run with four chains for 1 million generations and trees were sampled every 200
generations. To construct the consensus tree, 10% of the sampled points were discarded as
“burn-in”.
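The sampling settings above imply a fixed number of trees per run, which the following minimal Python sketch works out. It is only an arithmetic illustration, not part of the original workflow: the function name `mcmc_sample_counts` is hypothetical, and it assumes that the starting state (generation 0) is also written to the tree file and that the burn-in count is rounded down.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction):
    """Return (total, discarded, retained) sampled trees for a single run."""
    # Assumption: the state at generation 0 is sampled as well, hence the +1.
    total = generations // sample_every + 1
    # Assumption: the burn-in count is rounded down to a whole number of samples.
    discarded = int(total * burnin_fraction)
    retained = total - discarded
    return total, discarded, retained


# Settings stated in the text: 1 million generations, sampling every 200, 10% burn-in.
total, discarded, retained = mcmc_sample_counts(1_000_000, 200, 0.10)
print(total, discarded, retained)
```

Under these assumptions each run yields 5001 sampled trees, of which 500 are discarded and 4501 contribute to the consensus tree.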
To infer phylogeny using the internal transcribed spacer (ITS) region between the 16S and
23S rRNA genes, we first extracted the ITS sequences from each genome based on the 16S
and 23S rRNA gene coordinates; average ITS sequence sizes were 454 and 478 bp for the
Nitrospina-like SAGs and N. gracilis, respectively, with both encoding tRNAs for isoleucine
and alanine. The closest BLAST hits were then obtained from NCBI’s non-redundant (nr) and
whole-genome shotgun databases. Sequences were aligned using MUSCLE (Edgar, 2004)
followed by phylogenetic inference as described above.
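The coordinate-based ITS extraction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the script actually used: the function name `extract_its` and the toy sequence are hypothetical, coordinates are taken to be 1-based and inclusive, and both rRNA genes are assumed to lie on the forward strand with the 16S gene upstream of the 23S gene.

```python
def extract_its(genome, ssu_end, lsu_start):
    """Return the spacer between the end of the 16S gene and the start of the 23S gene.

    ssu_end and lsu_start are 1-based, inclusive gene coordinates (assumption).
    """
    if lsu_start <= ssu_end:
        raise ValueError("23S start must lie downstream of the 16S end")
    # Python slices are 0-based and end-exclusive, so this returns
    # positions ssu_end + 1 through lsu_start - 1 of the genome.
    return genome[ssu_end:lsu_start - 1]


# Toy example: "16S" at positions 1-4, spacer at 5-10, "23S" at 11-14.
toy_genome = "AAAA" + "CCGGTT" + "GGGG"
its = extract_its(toy_genome, ssu_end=4, lsu_start=11)
print(its, len(its))
```

For genes annotated on the reverse strand, the extracted spacer would additionally need to be reverse-complemented before alignment.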
The phylogenetic analysis of the genes encoding for nitrite oxidoreductase (NXR) was
performed by aligning the protein-encoding sequences of the NxrA (1,416 amino acid
positions; Figure 5B) and NxrB (561 amino acid positions; Figure S5) subunits from the
“Nitromaritima” SAGs into existing databases of type II DMSO reductase enzyme family
proteins (Lücker et al., 2013) using MUSCLE (Edgar, 2004). The best amino acid substitution
model (WAG + I + Γ4) was then selected using ProtTest 3 (Darriba et al., 2011) prior to
phylogenetic inference as described above using PhyML and MrBayes.
Phylogenetic analysis of the genes encoding for the putative alpha subunit of “cytochrome
bd-like” oxidases was performed by generating a consensus multiple sequence alignment from
multiple independent alignments of protein sequences that had been aligned using MAFFT
(Katoh et al., 2005) based on different amino acid substitution matrices as implemented in
MergeAlign (http://mergealign.appspot.com/; Collingridge & Kelly, 2012). Aligned positions
(and gaps) with a mean score less than 50% were then removed resulting in an alignment with
543 aligned positions. A maximum-likelihood tree was then reconstructed from the optimized
alignment using PhyML (100 bootstraps) with the amino acid substitution model WAG.
Sequences encoded by the cydA gene of characterized bd-type quinol oxidases were used for
outgrouping, namely from E. coli (Acc. No. EDV65286), Bacillus subtilis (Acc. No.
BAA11727), and Azotobacter vinelandii (Acc. No. ACO78197).
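The column-trimming step described above (removing aligned positions with a mean score below 50%) can be sketched as follows. This is a minimal illustration, not the MergeAlign implementation: the function name `filter_columns`, the dictionary input, and the representation of per-column scores as a plain list on a 0-1 scale are all assumptions made for the example.

```python
def filter_columns(alignment, column_scores, min_score=0.5):
    """Drop alignment columns whose score is below min_score.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths)
    column_scores: one score per alignment column, on a 0-1 scale (assumption)
    """
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(column_scores)}:
        raise ValueError("all sequences must match the number of column scores")
    # Keep only the columns that reach the threshold.
    keep = [i for i, score in enumerate(column_scores) if score >= min_score]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment with one poorly supported, gapped column (third position).
toy_alignment = {"seqA": "MK-LV", "seqB": "MKALV"}
toy_scores = [1.0, 0.9, 0.2, 0.5, 0.7]
trimmed = filter_columns(toy_alignment, toy_scores)
print(trimmed)
```

Columns scoring exactly 50% are retained here, matching the "less than 50%" removal rule stated in the text.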
REFERENCES
Collingridge PW, Kelly S. (2012). MergeAlign: improving multiple sequence alignment
performance by dynamic reconstruction of consensus multiple sequence alignments. BMC
Bioinformatics 13:117.
Darriba D, Taboada GL, Doallo R, Posada D. (2011). ProtTest 3: fast selection of best-fit
models of protein evolution. Bioinformatics 27:1164–1165.
Edgar RC. (2004). MUSCLE: multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Research 32:1792–1797.
Guindon S, Gascuel O. (2003). A simple, fast, and accurate algorithm to estimate large
phylogenies by maximum likelihood. Systematic Biology 52:696–704.
Katoh K, Kuma K-I, Toh H, Miyata T. (2005). MAFFT version 5: improvement in accuracy
of multiple sequence alignment. Nucleic Acids Research 33:511–518.
Lücker S, Nowka B, Rattei T, Spieck E, Daims H. (2013). The genome of Nitrospina gracilis
illuminates the metabolism and evolution of the major marine nitrite oxidizer. Frontiers in
Microbiology 4:27.
Posada D. (2008). jModelTest: phylogenetic model averaging. Molecular Biology and
Evolution 25:1253–1256.
Ronquist F, Huelsenbeck JP. (2003). MrBayes 3: Bayesian phylogenetic inference under
mixed models. Bioinformatics 19:1572–1574.