Supplementary information (SI)
The supplementary information offers a summary of methods and results that could not be
included in the main text due to space constraints. In addition, it contains eight
supplementary figures.
Catalyzed Reporter Deposition Fluorescence In Situ Hybridization (CARD-FISH).
Samples were fixed overnight with formaldehyde (3% final conc.) at 4°C and subsequently
filtered onto 0.2 μm polycarbonate GTTP membranes (Millipore). CARD-FISH was carried
out as described previously (Teira et al 2004) with the following modifications: The
permeabilization mix contained 10 mg ml⁻¹ lysozyme (Sigma L6876) or 10.9 mg ml⁻¹
proteinase K (Sigma P4850), 0.01 M Tris-HCl (pH 8) and 0.4 M EDTA. The hybridization
buffer contained 0.05% (v/v) TritonX instead of 0.02% (v/v) SDS. Filters were hybridized with
horseradish peroxidase (HRP)-labeled oligonucleotide probe mixtures using Cren537 +
Cren557 probes to target archaeal and EUBI-III probes to target bacterial 16S rRNA gene
sequences. After the amplification step, filters were incubated (in the dark on ice for 3 min)
with DAPI (final conc. 1 μg ml⁻¹), subsequently washed with ethanol (80% v/v) and Milli-Q
water and mounted in PBS-Vectashield-Citifluor (0.5 μg ml⁻¹ PBS, 1 μg ml⁻¹ Vectashield, 5.5
μg ml⁻¹ Citifluor). Slides were examined at 1250x magnification under an epifluorescence
microscope (Axio Imager.M2, Zeiss).
Determination of optimum growth conditions. Cultures were incubated in SCM medium
at different pH (6.8, 7.1, 7.3, 7.5, 8.0, adjusted with HEPES, T = 30°C) or temperatures
(24°C, 28°C, 32°C, 37°C, pH adjusted to 7.1). When ammonium concentrations dropped
below 300 μM, triplicates of cultures were inoculated for a second time under the same
conditions to enable adaptation to the new parameters prior to analysis. Every second day, 1
ml samples were taken for flow cytometry. Temperature and pH optima were determined by
calculating the generation time, g, and specific growth rate, k.
$N = N_0 \times 2^{n}$ ($N_0$ = number of cells at the beginning of exponential growth, $N$ = number of cells
at the end of exponential growth, $n$ = number of generations)
$g = t/n$ ($g$ = generation time (days), $t$ = duration of exponential growth (days), $n$ = number of
generations)
$k = 0.301/g$ ($k$ = specific growth rate (d⁻¹), $g$ = generation time (days))
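As a worked illustration of the three relations above, the following minimal Python sketch derives n, g and k from two cell counts and the duration of exponential growth. The function name and the example counts are hypothetical and are not taken from the study; the constant 0.301 is written as log10(2), which is the value it approximates.

```python
import math

def growth_parameters(n0, n_end, t_days):
    """Return (n, g, k) for one exponential growth phase.

    n0     -- cell count at the beginning of exponential growth (N0)
    n_end  -- cell count at the end of exponential growth (N)
    t_days -- duration of exponential growth in days (t)
    """
    if n0 <= 0 or n_end <= n0 or t_days <= 0:
        raise ValueError("need 0 < n0 < n_end and t_days > 0")
    n = math.log2(n_end / n0)  # number of generations, from N = N0 * 2^n
    g = t_days / n             # generation time in days, g = t/n
    k = math.log10(2) / g      # specific growth rate, k = 0.301/g as in the text
    return n, g, k

# Hypothetical cell counts, for illustration only
n, g, k = growth_parameters(1e5, 8e5, 6.0)
print(f"n = {n:.2f} generations, g = {g:.2f} d, k = {k:.3f} per day")
```

Comparing k (or g) across the tested temperatures and pH values would then indicate the optimum, assuming the same exponential window is identified for each culture.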
Flow cytometry. Samples were fixed with glutaraldehyde (5%) for 10 min and subsequently
stored at -80°C. Prior to analyses, samples were diluted in TE buffer, stained with
SYBR Green I (final conc. 1:10,000) and cell abundances measured on an Accuri C6 flow
cytometer.
Transmission electron microscopy. Cells were fixed with glutaraldehyde (2.5%) in SCM
medium and immediately transferred to a 200-mesh Formvar-coated copper grid for the
entire fixation time. After 30 min, cells were briefly washed in fresh culture medium and then
rinsed with ultra-pure water. After drying, cells were stained in 0.5% aqueous uranyl-acetate
for 2 min, briefly washed by swirling the grid in a drop of ultra-pure water, blotted with filter
paper and air-dried. Preparations were examined with a ZEISS Libra 120 transmission
electron microscope.
Scanning electron microscopy. A total of 3 ml of cultures in late logarithmic phase were
fixed in 2.5% glutaraldehyde and 2% paraformaldehyde in cacodylate buffer (0.1 M; 1100
mOsm; pH 7.2) for 2-4 h at 4 °C. Cells were gently filtered onto a 0.2 µm polycarbonate filter
supported by a 0.45 µm polycarbonate filter. Cells were then rinsed 3 times with 10 mL of
cacodylate buffer and dehydrated through a graded series of ethanol. The filters were dried
using HMDS as an alternative to critical point drying (absolute ethanol:HMDS (1:1) solution
for 10 min, followed by three steps of pure HMDS and air drying). All solutions used were
previously 0.2 µm filtered. Filter pieces mounted on aluminium stubs were sputter coated with
gold before observation under a Philips XL 20 scanning electron microscope.
DNA extraction for pyrosequencing. Highly enriched 1 l batch cultures with cells in early
stationary phase were used for genomic DNA extraction. Microbial biomass was obtained by
centrifuging 4 × 250 ml culture (45 min, 14,000 g) in a high performance centrifuge (Avanti
J26 XP, Beckman Coulter). Each pellet was resuspended in 0.5 ml sodium dodecyl sulfate
(SDS) extraction buffer (0.7 M NaCl, 0.1 M Na₂SO₃, 0.1 M Tris/HCl pH 7.5, 0.05 M EDTA
pH 8, 1% (v/v) SDS) and the mixture transferred into Lysing Matrix tubes.
Phenol/chloroform/isoamyl alcohol (0.5 ml in total, 25:24:1) was added and the tubes placed
in a FastPrep machine (speed 4 for 30 s) to lyse the cells and separate genomic DNA from
proteins. After cooling the tubes on ice (2 min) and centrifuging (10 min, 10,000 g), 0.5 ml
chloroform/isoamyl alcohol (24:1) was added to the supernatant and after another
centrifugation step (10 min, 10,000 g), 2 × volume polyethylene glycol (PEG) solution (1.6
M NaCl, 30% PEG) was added and the mixture incubated overnight. Subsequently, the tubes
were centrifuged for at least 30 min (4°C, 10,000 g), the pellet washed with cold ethanol
(70% v/v), dried, eluted in 50 μl ultrapure H₂O and stored at -20°C.
Gene alignment and phylogenetic analyses. Alignment of full-length 16S and 23S rRNA
genes from cultivated Thaumarchaeota with sequenced genomes was performed with the
L-INS-i method in MAFFT (Katoh and Standley 2013) based on Archaea-specific, structurally
accurate, seed alignments obtained from the Comparative RNA Web (CRW) Site (Cannone
et al 2002). Alignments of partial AmoA (198 amino acid positions) from cultivated
Thaumarchaeota and available thaumarchaeal full AmoB protein sequences were calculated
with EXPRESSO by combining protein structural information and the output of different de
novo methods (Armougom et al 2006). The alignments were reverse-translated with tranalign
(EMBOSS software suite; Rice et al 2000) through the Galaxy platform (Blankenberg et al
2007). Given that only partial amoA and amoB gene sequences of “Ca. N. salaria” BD31
were annotated in the NCBI database, we manually assembled the full gene sequences with
Blastn searches targeting the original genomic contigs. AmoB signal peptide predictions
were performed with PolyPhobius (Käll et al 2007), based on the full-length AmoB protein
sequence alignment of available thaumarchaeal sequences. Analyses of
cryptochrome/photolyase family (CPF) protein sequences included representatives of all
major CPF classes, including biochemically characterized proteins, and representative
putative photolyases from all phyla with Blastn hits sharing >40% amino acid identity with
those of “Ca. N. piranensis” D3C and “Ca. N. adriaticus” NF5. Given the high divergence
between CPF protein sequences in our dataset, alignments were calculated by homology
extension with PSI-Coffee (Kemena and Notredame 2009), based on structural information
with EXPRESSO (Armougom et al 2006), by a combination of homology extension and
structural information with PROMALS3D (Pei et al 2008); and with several de novo
sequence-based methods: MAFFT (L-INS-i method) (Katoh and Standley 2013), MSAProbs
(Liu et al 2010), ProbAlign (Roshan and Livesay 2006), ProbCons (Do et al 2005), T-Coffee
(Notredame et al 2000), Muscle (Edgar 2004) and Clustal Omega (Sievers et al 2011). The
resulting ten alignments were evaluated with MUMSA (Lassmann and Sonnhammer 2005),
and those with scores above the average (that is MSAProbs, ProbAlign, ProbCons,
EXPRESSO, and PSI-Coffee) were selected and merged with MergeAlign while preserving
all aligned positions (Collingridge and Kelly 2012). Phylogenetically unreliable positions were
filtered from all alignments with TCS using low-stringency parameters (Chang et al 2014).
Best-fit models of nucleotide and amino acid substitution were selected and phylogenetic trees
were calculated by maximum likelihood (ML) with IQ-Tree (Nguyen et al 2015). The
concatenated 16S-23S rRNA and amoA gene phylogenies were based on the GTR model
(Tavaré 1986) with a proportion of invariable sites and four Gamma (Γ) rate categories
(GTR+I+Γ4); for amoB on the GTR+Γ4 model; and for CPF proteins on the LG+I+Γ4 model
(Le and Gascuel 2008). Ultrafast bootstrap (UFBoot) (Minh et al 2013) and
Shimodaira–Hasegawa-like approximate likelihood ratio test (SH-aLRT) (Guindon et al 2010) support
values were calculated with IQ-Tree based on 1000 replicates each, and represented on the
best-known ML tree. The trees were visualized and edited with FigTree 1.4.2
(http://tree.bio.ed.ac.uk/software/figtree/).
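The reverse-translation step mentioned above (protein alignment back to a codon alignment) can be illustrated with a minimal Python sketch. It shows only the general idea of threading codons onto a gapped protein sequence; it is not the tranalign implementation, and the function name and toy sequences are hypothetical. It assumes the coding sequence contains exactly one codon per aligned residue (no stop codon, no frameshifts).

```python
def back_translate(aligned_protein, cds):
    """Thread an unaligned coding sequence onto a gapped protein sequence.

    aligned_protein -- one row of a protein alignment, gaps written as '-'
    cds             -- the matching ungapped coding sequence (3 nt per residue)
    """
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * residues:
        raise ValueError("coding sequence length must be 3x the residue count")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")            # one protein gap becomes a 3-nt gap
        else:
            codons.append(cds[pos:pos + 3])  # copy the next codon unchanged
            pos += 3
    return "".join(codons)

# Toy example, for illustration only
print(back_translate("M-KT", "ATGAAAACC"))
```

Because gaps are inserted only in whole-codon units, the reading frame of every sequence is preserved in the resulting nucleotide alignment.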
Analyses of amoB2. To exclude the possibility of artefacts derived from the genomic
assembly, we performed PCR analyses with a primer pair covering the amoB1 gene from
strain D3C and amoB genes from other Nitrosopumilales, and two primer pairs specific for
the amoB2 of strain D3C. The results confirmed that amoB2 is present in strain
D3C but not in NF5, and further Sanger sequencing of PCR products yielded amoB/amoB1
and amoB2 sequences identical to those retrieved from the assembled genomes.
amoB1-98F 5’-ATGCACACGGTGTCCAAGCAC-3’
amoB1-447R 5’-TGTTTGACCTGGACCGAGTC-3’
amoB2-50F 5’-TCCTGTTATCTAGCGTGT-3’
amoB2-143F 5’-CATTTGATAAACATCGCATGC-3’
amoB2-381R 5’-TATGTGATACGTTCCAGG-3’
amoB2-534R 5’-ACCAAGTGCAATAATACACC-3’
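For readers who want a quick sanity check of the primers listed above, the following Python sketch reports length, GC content and a rough melting temperature for each sequence. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is a coarse rule of thumb chosen for illustration; it is an assumption of this sketch and not a method reported in the study, and the helper names are hypothetical.

```python
# Primer sequences as listed in the text (5' to 3')
PRIMERS = {
    "amoB1-98F": "ATGCACACGGTGTCCAAGCAC",
    "amoB1-447R": "TGTTTGACCTGGACCGAGTC",
    "amoB2-50F": "TCCTGTTATCTAGCGTGT",
    "amoB2-143F": "CATTTGATAAACATCGCATGC",
    "amoB2-381R": "TATGTGATACGTTCCAGG",
    "amoB2-534R": "ACCAAGTGCAATAATACACC",
}

def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, Tm ~{wallace_tm(seq)} deg C")
```

Nearest-neighbour models give more reliable melting temperatures for primers of this length; the rule above is only meant for a quick comparison within the set.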
Supplementary references
Armougom F, Moretti S, Poirot O, Audic S, Dumas P, Schaeli B et al (2006). Expresso: automatic
incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids
Res 34: W604-608.
Blankenberg D, Taylor J, Schenck I, He J, Zhang Y, Ghent M et al (2007). A framework for
collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly. Genome Res
17: 960-964.
Cannone JJ, Subramanian S, Schnare MN, Collett JR, D'Souza LM, Du Y et al (2002). The
comparative RNA web (CRW) site: an online database of comparative sequence and structure
information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2.
Chang JM, Di Tommaso P, Notredame C (2014). TCS: a new multiple sequence alignment reliability
measure to estimate alignment accuracy and improve phylogenetic tree reconstruction. Mol Biol Evol
31: 1625-1637.
Collingridge PW, Kelly S (2012). MergeAlign: improving multiple sequence alignment performance by
dynamic reconstruction of consensus multiple sequence alignments. BMC Bioinformatics 13: 117.
Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005). ProbCons: Probabilistic consistency-based
multiple sequence alignment. Genome Res 15: 330-340.
Edgar RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Res 32: 1792-1797.
Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010). New algorithms and
methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst
Biol 59: 307-321.
Käll L, Krogh A, Sonnhammer EL (2007). Advantages of combined transmembrane topology and
signal peptide prediction--the Phobius web server. Nucleic Acids Res 35: W429-432.
Katoh K, Standley DM (2013). MAFFT multiple sequence alignment software version 7: improvements
in performance and usability. Mol Biol Evol 30: 772-780.
Kemena C, Notredame C (2009). Upcoming challenges for multiple sequence alignment methods in
the high-throughput era. Bioinformatics 25: 2455-2465.
Lassmann T, Sonnhammer EL (2005). Automatic assessment of alignment quality. Nucleic Acids Res
33: 7120-7128.
Le SQ, Gascuel O (2008). An improved general amino acid replacement matrix. Mol Biol Evol 25:
1307-1320.
Liu Y, Schmidt B, Maskell DL (2010). MSAProbs: multiple sequence alignment based on pair hidden
Markov models and partition function posterior probabilities. Bioinformatics 26: 1958-1964.
Minh BQ, Nguyen MA, von Haeseler A (2013). Ultrafast approximation for phylogenetic bootstrap. Mol
Biol Evol 30: 1188-1195.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ (2015). IQ-TREE: a fast and effective stochastic
algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32: 268-274.
Notredame C, Higgins DG, Heringa J (2000). T-Coffee: A novel method for fast and accurate multiple
sequence alignment. J Mol Biol 302: 205-217.
Pei J, Tang M, Grishin NV (2008). PROMALS3D web server for accurate multiple protein sequence
and structure alignments. Nucleic Acids Res 36: W30-34.
Rice P, Longden I, Bleasby A (2000). EMBOSS: the European Molecular Biology Open Software
Suite. Trends Genet 16: 276-277.
Roshan U, Livesay DR (2006). Probalign: multiple sequence alignment using partition function
posterior probabilities. Bioinformatics 22: 2715-2721.
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W et al (2011). Fast, scalable generation of
high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7: 539.
Tavaré S (1986). Some probabilistic and statistical problems in the analysis of DNA sequences.
American Mathematical Society: Lectures on Mathematics in the Life Sciences 17: 57-86.
Teira E, Reinthaler T, Pernthaler A, Pernthaler J, Herndl GJ (2004). Combining catalyzed reporter
deposition-fluorescence in situ hybridization and microautoradiography to detect substrate utilization
by bacteria and Archaea in the deep ocean. Appl Environ Microbiol 70: 4411-4414.
Supplementary figures
Figure S1. CARD-FISH images of strains NF5 (left) and D3C (right) using Archaea specific probes (a,
b) and Bacteria specific probes (c, d). Specific probes = green, DAPI counterstaining = blue.
Figure S2. Phylogenetic tree of amoA genes from cultivated Thaumarchaeota, including strains D3C and NF5. Gene fragments (594 bp) were aligned on protein
level with EXPRESSO, based on structural information and several alignment methods combined, followed by reverse-translation with tranalign and filtering of
unreliable positions with TCS. The tree was calculated by maximum likelihood with IQ-Tree based on the GTR+I+Γ4 model, with ultrafast bootstrap (UFBoot) and
SH-aLRT support values each inferred from 1000 replicates (see SI materials and methods for details). Support values ≥85% are represented on the respective
branches by semi-circles color-coded as indicated in the figure.
Figure S3. Temperature (A) and pH (B) optima of strains NF5 (white circles) and D3C (black circles) in medium containing 1 mM NH₄Cl. The pH was adjusted to
7.1 with HEPES buffer and did not change during incubations. Error bars represent standard deviations of measurements from triplicate cultures.
Figure S4. Genome plots of “Ca. N. adriaticus” NF5 (left) and “Ca. N. piranensis” D3C (right). Unique genomic regions are calculated using an amino acid identity
cut-off of <40%.
Figure S5. Alignment of the AmoB2 from “Ca. N. piranensis” D3C with other available thaumarchaeal AmoB sequences, including the available partial protein sequence of
“Ca. Nitrosocaldus yellowstonii” HL72 (EU239961). Predicted signal peptides are not shown. Boxes indicate the conserved histidine residues that
coordinate the copper active site. Residues conserved in all thaumarchaeal AmoB protein sequences are shaded in grey; motifs and single amino acid residues
unique to the AmoB2 of “Ca. N. piranensis” D3C are highlighted in black.
Figure S6. Phylogenetic tree of amoB genes from Thaumarchaeota, including strains D3C and NF5,
organisms with sequenced genomes, single-cell amplified genomes and metagenomic sequences.
Full-length protein sequences were aligned on protein level with EXPRESSO based on structural
information and several alignment methods combined; followed by reverse-translation with tranalign
and low-stringency filtering of unreliable positions with TCS (564 nucleotide positions in the final
alignment). The tree was calculated by maximum likelihood with IQ-Tree based on the GTR+Γ4
model, with ultrafast bootstrap (UFBoot) and SH-aLRT support values each inferred from 1000
replicates (see SI materials and methods for details). Support values ≥85% are represented on the
respective branches by semi-circles color-coded as indicated in the figure.
Figure S7. Urea cluster (A) and motility cluster (B) of “Ca. N. piranensis” D3C and “Ca. N. adriaticus”
NF5, respectively, in comparison with other enrichment cultures. Csym: “Ca. Cenarchaeum
symbiosum” A, Nsed: “Ca. Nitrosopumilus sediminis” AR2, Nlim: “Ca. Nitrosoarchaeum limnia” SFB1,
Ngar: “Ca. Nitrososphaera gargensis” Ga9.2
Figure S8. Phylogenetic tree of the cryptochrome/photolyase protein family (CPF) showing the
putative photolyases from strains D3C and NF5. The phylogeny includes representative proteins from
all phyla with photolyase homologs sharing >40% amino acid identity with those of strains D3C and
NF5 (sequences named after organisms and empty collapsed clades). Shaded areas highlight the
Thaumarchaeota-specific clade (dark grey) and the associated broader lineage (light grey) including
the CPD I photolyase of S. tokodaii 7 (bold; see main text). Full-length protein sequences were aligned
with several template-based and de novo methods combined with MergeAlign, followed by low-stringency
filtering of unreliable positions with TCS (398 amino acid positions in the final alignment).
The tree was calculated by maximum likelihood with IQ-Tree based on the LG+I+Γ4 model, with
ultrafast bootstrap (UFBoot) and SH-aLRT support values each inferred from 1000 replicates (see SI
materials and methods for details). Support values ≥85% are represented on the respective branches
by semi-circles color-coded as indicated in the figure.