Supplementary information (SI)

The supplementary information summarizes methods and results that could not be included in the main text due to space constraints. In addition, it contains eight supplementary figures.

Catalyzed Reporter Deposition Fluorescence In Situ Hybridization (CARD-FISH). Samples were fixed overnight with formaldehyde (3% final conc.) at 4°C and subsequently filtered onto 0.2 μm polycarbonate GTTP membranes (Millipore). CARD-FISH was carried out as described previously (Teira et al 2004) with the following modifications: the permeabilization mix contained 10 mg ml⁻¹ lysozyme (Sigma L6876) or 10.9 mg ml⁻¹ proteinase K (Sigma P4850), 0.01 M Tris-HCl (pH 8) and 0.4 M EDTA. The hybridization buffer contained 0.05% (v/v) TritonX instead of 0.02% (v/v) SDS. Filters were hybridized with horseradish peroxidase (HRP)-labeled oligonucleotide probe mixtures using Cren537 + Cren557 probes to target archaeal and EUBI-III probes to target bacterial 16S rRNA gene sequences. After the amplification step, filters were incubated (in the dark on ice for 3 min) with DAPI (final conc. 1 μg ml⁻¹), subsequently washed with ethanol (80% v/v) and Milli-Q water, and mounted in PBS-Vectashield-Citifluor (0.5 μg ml⁻¹ PBS, 1 μg ml⁻¹ Vectashield, 5.5 μg ml⁻¹ Citifluor). Slides were examined at 1250x magnification under an epifluorescence microscope (Axio Imager.M2, Zeiss).

Determination of optimum growth conditions. Cultures were incubated in SCM medium at different pH values (6.8, 7.1, 7.3, 7.5, 8.0, adjusted with HEPES, T = 30°C) or temperatures (24°C, 28°C, 32°C, 37°C, pH adjusted to 7.1). When ammonium concentrations dropped below 300 μM, triplicate cultures were inoculated a second time under the same conditions to allow adaptation to the new parameters prior to analysis. Every second day, 1 ml samples were taken for flow cytometry. Temperature and pH optima were determined by calculating the generation time, g, and specific growth rate, k.
$$N = N_0 \times 2^{n}$$

where $N_0$ is the number of cells at the beginning of exponential growth, $N$ the number of cells at the end of exponential growth, and $n$ the number of generations;

$$g = t/n$$

where $g$ is the generation time (days), $t$ the duration of exponential growth (days), and $n$ the number of generations; and

$$k = 0.301/g$$

where $k$ is the specific growth rate (d⁻¹) and $g$ the generation time (days).

Flow cytometry. Samples were fixed with glutaraldehyde (5%) for 10 min and subsequently stored at -80°C. Prior to analysis, samples were diluted in TE buffer and stained with SYBR Green I (final conc. 1:10,000), and cell abundances were measured on an Accuri C6 flow cytometer.

Transmission electron microscopy. Cells were fixed with glutaraldehyde (2.5%) in SCM medium and immediately transferred to a 200-mesh Formvar-coated copper grid for the entire fixation time. After 30 min, cells were briefly washed in fresh culture medium and then rinsed with ultra-pure water. After drying, cells were stained in 0.5% aqueous uranyl acetate for 2 min, briefly washed by swirling the grid in a drop of ultra-pure water, blotted with filter paper and air-dried. Preparations were examined with a ZEISS Libra 120 transmission electron microscope.

Scanning electron microscopy. A total of 3 ml of culture in late logarithmic phase was fixed in 2.5% glutaraldehyde and 2% paraformaldehyde in cacodylate buffer (0.1 M; 1100 mOsm; pH 7.2) for 2-4 h at 4°C. Cells were gently filtered onto a 0.2 µm polycarbonate filter supported by a 0.45 µm polycarbonate filter. Cells were then rinsed 3 times with 10 ml of cacodylate buffer and dehydrated through a graded series of ethanol. The filters were dried using HMDS as an alternative to critical point drying (absolute ethanol:HMDS (1:1) solution for 10 min, followed by three steps of pure HMDS and air drying). All solutions used had previously been passed through a 0.2 µm filter. Filter pieces mounted on an aluminium stub were sputter-coated with gold before observation under a Philips XL 20 scanning electron microscope.
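The growth-rate relations given under "Determination of optimum growth conditions" can be illustrated with a minimal Python sketch. It rearranges N = N0 × 2^n to obtain n and then applies g = t/n and k = 0.301/g exactly as written in the text (0.301 is approximately log10 2). The function name and the cell counts in the example call are hypothetical placeholders, not data from this study.

```python
import math


def growth_parameters(n0, n_end, t_days):
    """Return (n, g, k) for one exponential growth phase.

    n0     -- cells at the beginning of exponential growth (N0)
    n_end  -- cells at the end of exponential growth (N)
    t_days -- duration of exponential growth in days (t)
    """
    if n0 <= 0 or n_end <= n0 or t_days <= 0:
        raise ValueError("need 0 < n0 < n_end and t_days > 0")
    n = math.log2(n_end / n0)  # number of generations, from N = N0 * 2^n
    g = t_days / n             # generation time in days, g = t / n
    k = 0.301 / g              # specific growth rate per day, k = 0.301 / g
    return n, g, k


# Hypothetical example: an eight-fold increase in cell number over 6 days.
n, g, k = growth_parameters(1e6, 8e6, 6.0)
print(f"n = {n:.2f} generations, g = {g:.2f} d, k = {k:.4f} per day")
```

In this made-up example, three generations in six days correspond to a generation time of 2 days and a specific growth rate of about 0.15 d⁻¹. In practice the calculation would be repeated for each replicate culture before comparing temperatures or pH values.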
DNA extraction for pyrosequencing. Highly enriched 1 l batch cultures with cells in early stationary phase were used for genomic DNA extraction. Microbial biomass was obtained by centrifuging 4 × 250 ml culture (45 min, 14,000 g) in a high-performance centrifuge (Avanti J26 XP, Beckman Coulter). Each pellet was resuspended in 0.5 ml sodium dodecyl sulfate (SDS) extraction buffer (0.7 M NaCl, 0.1 M Na2(SO3), 0.1 M Tris/HCl pH 7.5, 0.05 M EDTA pH 8, 1% (v/v) SDS) and the mixture was transferred into Lysing Matrix tubes. Phenol/chloroform/isoamyl alcohol (0.5 ml in total, 25:24:1) was added and the tubes were placed in a FastPrep machine (speed 4 for 30 s) to lyse the cells and separate genomic DNA from proteins. After cooling the tubes on ice (2 min) and centrifuging (10 min, 10,000 g), 0.5 ml chloroform/isoamyl alcohol (24:1) was added to the supernatant. After another centrifugation step (10 min, 10,000 g), 2 volumes of polyethylene glycol (PEG) solution (1.6 M NaCl, 30% PEG) were added and the mixture was incubated overnight. Subsequently, the tubes were centrifuged for at least 30 min (4°C, 10,000 g), and the pellet was washed with cold ethanol (70% v/v), dried, eluted in 50 μl ultrapure H2O and stored at -20°C.

Gene alignment and phylogenetic analyses. Alignment of full-length 16S and 23S rRNA genes from cultivated Thaumarchaeota with sequenced genomes was performed with the L-INS-i method in MAFFT (Katoh and Standley 2013) based on Archaea-specific, structurally accurate seed alignments obtained from the Comparative RNA Web (CRW) Site (Cannone et al 2002). Alignments of partial AmoA (198 amino acid positions) from cultivated Thaumarchaeota and of available thaumarchaeal full-length AmoB protein sequences were calculated with EXPRESSO by combining protein structural information and the output of different de novo methods (Armougom et al 2006). The alignments were reverse-translated with tranalign (EMBOSS software suite; Rice et al 2000) through the Galaxy platform (Blankenberg et al 2007).
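The reverse-translation step can be illustrated with a minimal Python sketch of the general idea: each unaligned coding sequence is threaded onto its aligned protein sequence, emitting one codon per residue and three gap characters per alignment gap. This is not tranalign itself and does not reproduce its options or checks; the function name and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an unaligned coding sequence onto a gapped protein alignment row.

    aligned_protein -- one row of a protein alignment, gaps written as `gap`
    cds             -- the matching unaligned nucleotide coding sequence
    Returns the nucleotide row, with each protein gap expanded to three gaps.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein row")
    pieces = []
    pos = 0  # position of the next unused codon in the coding sequence
    for aa in aligned_protein:
        if aa == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(cds[pos:pos + 3])
            pos += 3
    return "".join(pieces)


# Toy example (hypothetical sequences): Met, gap, Lys, Thr.
row = thread_codons("M-KT", "ATGAAAACC")
print(row)
```

Because the nucleotide alignment inherits its columns from the protein alignment, codons stay intact and the reading frame is preserved, which is the reason for aligning at the protein level first.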
Given that only partial amoA and amoB gene sequences of “Ca. N. salaria” BD31 were annotated in the NCBI database, we manually assembled the full gene sequences with Blastn searches targeting the original genomic contigs. AmoB signal peptide predictions were performed with PolyPhobius (Käll et al 2007), based on the full-length AmoB protein sequence alignment of available thaumarchaeal sequences.

Analyses of cryptochrome/photolyase family (CPF) protein sequences included representatives of all major CPF classes, including biochemically characterized proteins, and representative putative photolyases from all phyla with Blastn hits sharing >40% amino acid identity with those of “Ca. N. piranensis” D3C and “Ca. N. adriaticus” NF5. Given the high divergence between CPF protein sequences in our dataset, alignments were calculated by homology extension with PSI-Coffee (Kemena and Notredame 2009), based on structural information with EXPRESSO (Armougom et al 2006), by a combination of homology extension and structural information with PROMALS3D (Pei et al 2008), and with several de novo sequence-based methods: MAFFT (L-INS-i method) (Katoh and Standley 2013), MSAProbs (Liu et al 2010), ProbAlign (Roshan and Livesay 2006), ProbCons (Do et al 2005), T-Coffee (Notredame et al 2000), Muscle (Edgar 2004) and Clustal Omega (Sievers et al 2011). The resulting ten alignments were evaluated with MUMSA (Lassmann and Sonnhammer 2005), and those with scores above the average (that is, MSAProbs, ProbAlign, ProbCons, EXPRESSO and PSI-Coffee) were selected and merged with MergeAlign while preserving all aligned positions (Collingridge and Kelly 2012).

Phylogenetically unreliable positions were filtered from all alignments with TCS using low-stringency parameters (Chang et al 2014). Best-fit models of nucleotide and amino acid substitution were selected, and phylogenetic trees were calculated by maximum likelihood (ML), with IQ-Tree (Nguyen et al 2015).
The concatenated 16S-23S rRNA and amoA gene phylogenies were based on the GTR model (Tavaré 1986) with a proportion of invariable sites and four Gamma (Γ) rate categories (GTR+I+Γ4); for amoB on the GTR+Γ4 model; and for CPF proteins on the LG+I+Γ4 model (Le and Gascuel 2008). Ultrafast bootstrap (UFBoot) (Minh et al 2013) and Shimodaira–Hasegawa-like approximate likelihood ratio test (SH-aLRT) (Guindon et al 2010) support values were calculated with IQ-Tree based on 1000 replicates each, and represented on the best-known ML tree. The trees were visualized and edited with FigTree 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).

Analyses of amoB2. To exclude the possibility of artefacts derived from the genomic assembly, we performed PCR analyses with a primer pair covering the amoB1 gene from strain D3C and amoB genes from other Nitrosopumilales, and two primer pairs specific for the amoB2 of strain D3C. The results confirmed the presence of amoB2 exclusively in strain D3C, but not in NF5, and further Sanger sequencing of PCR products yielded amoB/amoB1 and amoB2 sequences identical to those retrieved from the assembled genomes.

amoB1-98F    5’-ATGCACACGGTGTCCAAGCAC-3’
amoB1-447R   5’-TGTTTGACCTGGACCGAGTC-3’
amoB2-50F    5’-TCCTGTTATCTAGCGTGT-3’
amoB2-143F   5’-CATTTGATAAACATCGCATGC-3’
amoB2-381R   5’-TATGTGATACGTTCCAGG-3’
amoB2-534R   5’-ACCAAGTGCAATAATACACC-3’

Supplementary references

Armougom F, Moretti S, Poirot O, Audic S, Dumas P, Schaeli B et al (2006). Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids Res 34: W604-608.
Blankenberg D, Taylor J, Schenck I, He J, Zhang Y, Ghent M et al (2007). A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly. Genome Res 17: 960-964.
Cannone JJ, Subramanian S, Schnare MN, Collett JR, D'Souza LM, Du Y et al (2002). The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2.
Chang JM, Di Tommaso P, Notredame C (2014). TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction. Mol Biol Evol 31: 1625-1637.
Collingridge PW, Kelly S (2012). MergeAlign: improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments. BMC Bioinformatics 13: 117.
Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005). ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 15: 330-340.
Edgar RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.
Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59: 307-321.
Käll L, Krogh A, Sonnhammer EL (2007). Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic Acids Res 35: W429-432.
Katoh K, Standley DM (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30: 772-780.
Kemena C, Notredame C (2009). Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics 25: 2455-2465.
Lassmann T, Sonnhammer EL (2005). Automatic assessment of alignment quality. Nucleic Acids Res 33: 7120-7128.
Le SQ, Gascuel O (2008). An improved general amino acid replacement matrix. Mol Biol Evol 25: 1307-1320.
Liu Y, Schmidt B, Maskell DL (2010). MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26: 1958-1964.
Minh BQ, Nguyen MA, von Haeseler A (2013). Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol 30: 1188-1195.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ (2015). IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32: 268-274.
Notredame C, Higgins DG, Heringa J (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 302: 205-217.
Pei J, Tang M, Grishin NV (2008). PROMALS3D web server for accurate multiple protein sequence and structure alignments. Nucleic Acids Res 36: W30-34.
Rice P, Longden I, Bleasby A (2000). EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16: 276-277.
Roshan U, Livesay DR (2006). Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22: 2715-2721.
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W et al (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7: 539.
Tavaré S (1986). Some probabilistic and statistical problems in the analysis of DNA sequences. American Mathematical Society: Lectures on Mathematics in the Life Sciences 17: 57-86.
Teira E, Reinthaler T, Pernthaler A, Pernthaler J, Herndl GJ (2004). Combining catalyzed reporter deposition-fluorescence in situ hybridization and microautoradiography to detect substrate utilization by bacteria and Archaea in the deep ocean. Appl Environ Microbiol 70: 4411-4414.

Supplementary figures

Figure S1. CARD-FISH images of strains NF5 (left) and D3C (right) using Archaea-specific probes (a, b) and Bacteria-specific probes (c, d). Specific probes = green, DAPI counterstaining = blue.

Figure S2. Phylogenetic tree of amoA genes from cultivated Thaumarchaeota, including strains D3C and NF5.
Gene fragments (594 bp) were aligned on protein level with EXPRESSO, based on structural information and several alignment methods combined, followed by reverse-translation with tranalign and filtering of unreliable positions with TCS. The tree was calculated by maximum likelihood with IQ-Tree based on the GTR+I+Γ4 model, with ultrafast bootstrap (UFBoot) and SH-aLRT support values each inferred from 1000 replicates (see SI materials and methods for details). Support values ≥85% are represented on the respective branches by semi-circles color-coded as indicated in the figure.

Figure S3. Temperature (A) and pH (B) optima of strains NF5 (white circles) and D3C (black circles) in medium containing 1 mM NH4Cl. The pH was adjusted to 7.1 with HEPES buffer and did not change during incubations. Error bars represent standard deviations of measurements from triplicate cultures.

Figure S4. Genome plots of “Ca. N. adriaticus” NF5 (left) and “Ca. N. piranensis” D3C (right). Unique genomic regions are calculated using an amino acid identity cut-off of <40%.

Figure S5. Alignment of the AmoB2 from “Ca. N. piranensis” D3C with other available thaumarchaeal AmoB sequences, including the available partial protein sequence of “Ca. Nitrosocaldus yellowstonii” HL72 (EU239961). Predicted signal peptides are not shown. Boxes indicate the conserved histidine residues that coordinate the copper active site. Residues conserved in all thaumarchaeal AmoB protein sequences are shaded in grey; motifs and single amino acid residues unique to the AmoB2 of “Ca. N. piranensis” D3C are highlighted in black.

Figure S6. Phylogenetic tree of amoB genes from Thaumarchaeota, including strains D3C and NF5, organisms with sequenced genomes, single-cell amplified genomes and metagenomic sequences.
Full-length protein sequences were aligned on protein level with EXPRESSO based on structural information and several alignment methods combined, followed by reverse-translation with tranalign and low-stringency filtering of unreliable positions with TCS (564 nucleotide positions in the final alignment). The tree was calculated by maximum likelihood with IQ-Tree based on the GTR+Γ4 model, with ultrafast bootstrap (UFBoot) and SH-aLRT support values each inferred from 1000 replicates (see SI materials and methods for details). Support values ≥85% are represented on the respective branches by semi-circles color-coded as indicated in the figure.

Figure S7. Urea cluster (A) and motility cluster (B) of “Ca. N. piranensis” D3C and “Ca. N. adriaticus” NF5, respectively, in comparison with other enrichment cultures. Csym: “Ca. Cenarchaeum symbiosum” A, Nsed: “Ca. Nitrosopumilus sediminis” AR2, Nlim: “Ca. Nitrosoarchaeum limnia” SFB1, Ngar: “Ca. Nitrososphaera gargensis” Ga9.2.

Figure S8. Phylogenetic tree of the cryptochrome/photolyase protein family (CPF) showing the putative photolyases from strains D3C and NF5. The phylogeny includes representative proteins from all phyla with photolyase homologs sharing >40% amino acid identity with those of strains D3C and NF5 (sequences named after organisms and empty collapsed clades). Shaded areas highlight the Thaumarchaeota-specific clade (dark grey) and the associated broader lineage (light grey) including the CPD I photolyase of S. tokodaii 7 (bold; see main text). Full-length protein sequences were aligned with several template-based and de novo methods combined with MergeAlign, followed by low-stringency filtering of unreliable positions with TCS (398 amino acid positions in the final alignment). The tree was calculated by maximum likelihood with IQ-Tree based on the LG+I+Γ4 model, with ultrafast bootstrap (UFBoot) and SH-aLRT support values each inferred from 1000 replicates (see SI materials and methods for details).
Support values ≥85% are represented on the respective branches by semi-circles color-coded as indicated in the figure.