Supplementary material

A metagenome of a full-scale microbial community carrying out Enhanced Biological Phosphorus Removal

Mads Albertsen, Lea Benedicte Skov Hansen, Aaron Marc Saunders, Per Halkjær Nielsen, and Kåre Lehmann Nielsen

Department of Biotechnology, Chemistry and Environmental Engineering, Aalborg University, Sohngaardsholmsvej 49, DK-9000 Aalborg, Denmark

Supplementary Text 1 of 1.

The reads mapping to the ppk1 gene of Accumulibacter were investigated to evaluate the sensitivity and specificity of the reference assembly against the Accumulibacter genome. A total of 87 ppk1 sequences were obtained from NCBI, and five ppk1 genes from closely related species were included. All ppk1 sequences were trimmed to the length of the shortest ppk1 fragment (1073 bp) and clustered using cd-hit-est v.4.2.1 (Li and Godzik, 2006) with the parameters -c 0.99 -r 1. A BLAST database was created from the resulting 68 non-redundant sequences. The ppk1 sequences were assigned to different nodes in the phylogenetic tree using MEGAN. As MEGAN assigns reads to nodes based on the species information in the BLAST hits, the headers of the individual ppk1 sequences were changed to reflect the topology of the phylogenetic tree.

The metagenomic reads that matched the extracted region of the ppk1 gene in the Accumulibacter genome in the original reference assembly were extracted to investigate the specificity of the reference mapping (inclusion of other bacteria in the mapping). These sequences were matched to the ppk1 database using BLASTn with default parameters except -word_size 7, -outfmt 5 and -evalue 1e-5. The output was analysed in MEGAN.

To investigate the sensitivity (inclusion of most Accumulibacter clades in the mapping), a reference assembly was conducted against the 68 Accumulibacter ppk1 genes using the reference mapping function of CLC, requiring a minimum of 85% identity over 70% of the read length.
Only reads with a minimum length of 60 bp were used. Otherwise, the analysis was conducted as for the specificity analysis.

The high resolution of the diversity within the genus provided by the ppk1 gene was used as a test case to validate the specificity (false positive matches) and the sensitivity (ability to recruit reads from other Accumulibacter species) of the reference mapping. A total of 138 ppk1 genes were used to construct a phylogenetic tree (Figure S5), and the phylogenetic position of each sequence was mimicked in MEGAN for the assignment of individual reads to different nodes on the tree. The specificity analysis showed that only 10 of the 268 ppk1 reads assigned to the Accumulibacter IIA str. UW-1 ppk1 gene had a better match to non-Accumulibacter ppk1 sequences (Figure S7A). However, the sensitivity analysis showed that, although we were able to recruit most clade IIA ppk1 reads using the clade IIA str. UW-1 ppk1 gene, we were not able to recruit more than approximately 30-50% of the reads from other Accumulibacter species (Figure S7C).

Supplementary Figure 1 of 7.

Supplementary Figure 1. Histogram of the length distribution of the de novo assembled contigs. Contigs ≥ 300 bp were used for further analysis (blue bars).

Supplementary Figure 2 of 7.

Supplementary Figure 2. Histogram of the length distribution of ORFs with a significant BLAST hit (e-value ≤ 1e-5) compared to ORFs for which no significant hit could be found. The “double curved” plots are due to the minimum contig size of 300 bp (100 amino acids).

Supplementary Figure 3 of 7.

Supplementary Figure 3. A) Species abundance curve. “Best hit” represents species assignment based on the best BLASTP hit. “10% Bitscore filter” represents species assignment only if the best BLASTP hit had a bitscore >10% higher than that of the second-best BLASTP hit. The graph only shows species with more than 100 ORFs assigned (100 ORFs ≈ 0.05% of all ORFs). B) Species abundance chart.
The 20 most abundant species are shown in the legend in order of decreasing abundance. ORFs were assigned based on the best BLASTP hit. C) Rarefaction curves. The rarefaction function of MEGAN was used to create rarefaction curves at different phylogenetic levels. The assignment is based on a 10% bitscore filter and a minimum of 5 ORFs assigned.

Supplementary Figure 4 of 7.

Supplementary Figure 4. Annotation of ORFs in the largest contig (32884 bp). A yellow ORF denotes a significant BLAST hit (e-value ≤ 1e-5), whereas brown denotes no significant hit.

Supplementary Figure 5 of 7.

Supplementary Figure 5. Phylogenetic tree of ppk1 sequences. Sequences from Aalborg East are marked in red, and the ppk1 sequence from “Candidatus Accumulibacter phosphatis” clade IIA str. UW-1 is marked in blue. Clade assignments have also been added. A putative new clade is marked as IIx. The tree was first created on the basis of 87 general ppk1 genes, and only selected representative sequences are shown in the final tree. The outgroup sequences (not shown on the tree) were Ralstonia eutropha YP_300029, Ralstonia eutropha YP_729175 and Stenotrophomonas maltophilia K279a CAQ44540.

Supplementary Figure 6 of 7.

Supplementary Figure 6. Comparison of genes prevalent in the different read pools based on a reference mapping to the Accumulibacter genome. The percentage of the gene length covered by reads was used to compare the presence or absence of genes. High-identity reads (>95% identical at nt level, x-axis) were compared with the rest of the read pool (≤95% identical at nt level, y-axis). Each dot represents one gene. To compare which genes differed between the high-identity (>95%) and low-identity (≤95%) read pools, the low-identity read pool was normalized (by subsampling) to the same size (179 741 reads) as the high-identity read pool, so that the prevalent genes in both read pools could be compared directly.
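
The subsampling normalization described for Supplementary Figure 6 can be illustrated with a minimal Python sketch. This is not the code used in the study. The function names, the (read ID, percent identity) tuple layout and the fixed random seed are assumptions made for illustration only. The sketch splits reads at the 95% identity threshold and, if the low-identity pool is the larger one, draws a random subsample without replacement so that both pools have the same size.

```python
import random


def split_by_identity(reads, threshold=95.0):
    """Split (read_id, percent_identity) pairs into two pools.

    Reads with identity > threshold go to the high-identity pool;
    reads with identity <= threshold go to the low-identity pool.
    """
    high = [read_id for read_id, identity in reads if identity > threshold]
    low = [read_id for read_id, identity in reads if identity <= threshold]
    return high, low


def subsample_reads(read_ids, target_size, seed=0):
    """Draw target_size reads at random, without replacement."""
    if target_size > len(read_ids):
        raise ValueError("target_size exceeds the size of the read pool")
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    return rng.sample(list(read_ids), target_size)


def normalize_pools(reads, threshold=95.0, seed=0):
    """Return (high, low) pools, with low subsampled to the size of high.

    Assumption: if the low-identity pool is already smaller than or equal
    to the high-identity pool, it is returned unchanged.
    """
    high, low = split_by_identity(reads, threshold)
    if len(low) > len(high):
        low = subsample_reads(low, len(high), seed)
    return high, low
```

With pools of equal size, per-gene coverage computed from each pool is not biased by one pool simply containing more reads.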
Supplementary Figure 7 of 7.

Supplementary Figure 7. Investigation of the specificity and sensitivity of the mapping of metagenome reads to the genome of Accumulibacter clade IIA (NC_013194). MEGAN was used to visualize the BLASTn results. A 10% bitscore difference was used to assign reads to nodes. A) Investigation of the specificity of the mapping of the metagenome reads to the Accumulibacter clade IIA ppk1 gene. The metagenome reads mapping to the clade IIA ppk1 gene were extracted and mapped to 68 non-redundant Accumulibacter ppk1 genes and 5 ppk1 genes from closely related species. Few reads had their best match to species other than Accumulibacter. B) Investigation of the ability to include other Accumulibacter clades by the use of the Accumulibacter clade IIA genome. The metagenome reads were mapped to 68 non-redundant Accumulibacter ppk1 genes, and the extracted read pool was searched (BLASTn) against all 68+5 ppk1 genes and visualised using MEGAN. C) The combination of panels A and B reveals that most clade IIA reads are extractable using the clade IIA genome; however, only approximately 30% of reads matching other clades are extracted.

Supplementary Table 1 of 2.

Supplementary references for Table S1.

Crocetti GR, Hugenholtz P, Bond PL, Schuler A, Keller J, Jenkins D, Blackall LL (2000). Identification of polyphosphate-accumulating organisms and design of 16S rRNA-directed probes for their detection and quantitation. Appl Environ Microbiol 66:1175-1182.
Daims H, Nielsen JL, Nielsen PH, Schleifer KH, Wagner M (2001). In situ characterization of Nitrospira-like nitrite-oxidizing bacteria active in wastewater treatment plants. Appl Environ Microbiol 67:5273-5284.
Daims H, Bruhl A, Amann R, Schleifer K-H, Wagner M (1999). The domain-specific probe EUB338 is insufficient for the detection of all bacteria: development and evaluation of a more comprehensive probe set. Syst Appl Microbiol 22:434-444.
Erhart R, Bradford D, Seviour RJ, Amann R, Blackall LL (1997). Development and use of fluorescent in situ hybridization probes for the detection and identification of ‘Microthrix parvicella’ in activated sludge. Syst Appl Microbiol 20:310-318.
Flowers J, He S, Carvalho G, Peterson SB, Lopez C, Yilmaz S, Zilles JL, Morgenroth E, Lemos PC, Reis MAM, Crespo MTB, Noguera DR, McMahon KD (2008). Ecological differentiation of Accumulibacter in EBPR reactors. In: Proceedings of the Water Environment Federation, WEFTEC 2008 (12):31-42.
Gieseke A, Purkhold U, Wagner M, Amann R, Schramm A (2001). Community structure and activity dynamics of nitrifying bacteria in a phosphate-removing biofilm. Appl Environ Microbiol 67:1351-1362.
Giuliano L, De Domenico M, De Domenico E, Hofle MG, Yakimov MM (1999). Identification of culturable oligotrophic bacteria within naturally occurring bacterioplankton communities of the Ligurian sea by 16S rRNA sequencing and probing. Microb Ecol 37:77-85.
Hess A, Zarda B, Hahn D, Haner A, Stax D, Hohener P, Zeyer J (1997). In situ analysis of denitrifying toluene- and m-xylene-degrading bacteria in a diesel fuel-contaminated laboratory aquifer column. Appl Environ Microbiol 63:2136-2141.
Hugenholtz P, Tyson GW, Webb RI, Wagner AM, Blackall LL (2001). Investigation of Candidate division TM7, a recently recognized major lineage of the domain Bacteria with no known pure-culture representatives. Appl Environ Microbiol 67:411-419.
Kanagawa T, Kamagata Y, Aruga S, Kohno T, Horn M, Wagner M (2000). Phylogenetic analysis of and oligonucleotide probe development for Eikelboom type 021N filamentous bacteria isolated from bulking activated sludge. Appl Environ Microbiol 66:5043-5052.
Kong YH, Beer M, Rees GN, Seviour RJ (2002). Functional analysis of microbial communities in aerobic-anaerobic sequencing batch reactors fed with different phosphorus/carbon (P/C) ratios. Microbiology-Sgm 148:2299-2307.
Kong Y, Nielsen JL, Nielsen PH (2005). Identity and ecophysiology of uncultured actinobacterial polyphosphate-accumulating organisms in full-scale enhanced biological phosphorus removal plants. Appl Environ Microbiol 71:4076-4085.
Kragelund C, Levantesi C, Borger A, Thelen K, Eikelboom D, Tandoi V, Kong Y, Krooneman J, Larsen P, Thomsen TR, Nielsen PH (2008). Identity, abundance and ecophysiology of filamentous bacteria belonging to the Bacteroidetes present in activated sludge plants. Microbiology 154:886-894.
Lajoie CA, Layton AC, Gregory IR, Sayler GS, Taylor DE, Meyers AJ (2000). Zoogloeal clusters and sludge dewatering potential in an industrial activated-sludge wastewater treatment plant. Water Environ Research 72:56-64.
Levantesi C, Rossetti S, Thelen K, Kragelund C, Krooneman J, Eikelboom D, Nielsen PH, Tandoi V (2006). Phylogeny, physiology and distribution of ‘Candidatus Microthrix calida’, a new Microthrix species isolated from industrial activated sludge wastewater treatment plants. Environ Microbiol 8:1552-1563.
Maixner F, Noguera DR, Anneser B, Stoecker K, Wegl G, Wagner M, Daims H (2006). Nitrite concentration influences the population structure of Nitrospira-like bacteria. Environ Microbiol 8:1487-1495.
Mobarry BK, Wagner M, Urbain V, Rittmann BE, Stahl DA (1996). Phylogenetic probes for analyzing abundance and spatial organization of nitrifying bacteria. Appl Environ Microbiol 62:2156-2162.
Nguyen HTT, Le VQ, Hansen AA, Nielsen JL, Nielsen PH (2011). High diversity and abundance of putative polyphosphate-accumulating Tetrasphaera-related bacteria in activated sludge systems. FEMS Microbiol Ecol 76:256-267.
Rossello-Mora RA, Wagner M, Amann R, Schleifer KH (1995). The abundance of Zoogloea ramigera in sewage treatment plants. Appl Environ Microbiol 61:702-707.
Schauer M, Hahn MW (2005). Diversity and phylogenetic affiliations of morphologically conspicuous large filamentous bacteria occurring in the pelagic zones of a broad spectrum of freshwater habitats. Appl Environ Microbiol 71:1931-1940.
Thomsen TR, Nielsen JL, Ramsing NB, Nielsen PH (2004). Micromanipulation and further identification of FISH-labelled microcolonies of a dominant denitrifying bacterium in activated sludge. Environ Microbiol 6:470-479.
Trebesius K, Leitritz L, Adler K, Schubert S, Autenrieth IB, Heesemann J (2000). Culture independent and rapid identification of bacterial pathogens in necrotising fasciitis and streptococcal toxic shock syndrome by fluorescence in situ hybridisation. Medical Microbiol Immun 188:169-175.

Supplementary Table 2 of 2.

Table S2. Selected reference metagenomes from Dinsdale et al. (2008b) used for comparison with the metagenome obtained in the current study. In addition, a metagenome from a non-EBPR wastewater treatment plant was included (Sanapareddy et al., 2009).
Metagenome Name                           Environment    MG-RAST ID  Reference
Soudan Red Stuff                          Subterranean   4440281     Edwards et al., 2006
Soudan Black Stuff                        Subterranean   4440282     Edwards et al., 2006
Low Saltern microbes                      Hyper-Saline   4440437     Rodriguez-Brito et al., 2010
Medium Saltern Microbes (MB1110)          Hyper-Saline   4440435     Rodriguez-Brito et al., 2010
Medium saltern microbes (MB1111)          Hyper-Saline   4440434     Rodriguez-Brito et al., 2010
Low saltern pond plasmids (TT)            Hyper-Saline   4440090     Rodriguez-Brito et al., 2010
High saltern microbial (HB1128)           Hyper-Saline   4440419     Rodriguez-Brito et al., 2010
Salton Sea Bacteria 1                     Hyper-Saline   4440329     Swan et al., 2010
Medium salinity microbial (MB1116)        Hyper-Saline   4440425     Rodriguez-Brito et al., 2010
Low salinity microbial (LB1128)           Hyper-Saline   4440426     Rodriguez-Brito et al., 2010
Line Islands Kingman Reef B2 bacteria     Marine         4440037     Dinsdale et al., 2008a
Line Islands Christmas Reef B3 bacteria   Marine         4440041     Dinsdale et al., 2008a
Line Islands Palmyra F8 Bacteria          Marine         4440039     Dinsdale et al., 2008a
DMSP 1 (MAM.1)                            Marine         4440364     Mou et al., 2008
DMSP 2 (MAM.2)                            Marine         4440360     Mou et al., 2008
VAN 2 (MAM 4)                             Marine         4440363     Mou et al., 2008
Tilapia pond microbes                     Freshwater     4440440     Rodriguez-Brito et al., 2010
Healthy Tilapia pond microbes             Freshwater     4440413     Rodriguez-Brito et al., 2010
Healthy Prebead tank microbes             Freshwater     4440411     Rodriguez-Brito et al., 2010
Tpond microbe 3                           Freshwater     4440422     Rodriguez-Brito et al., 2010
Rios Mesquites Stromatolites bacteria     Microbialites  4440060     Breitbart et al., 2009
Pozas Azule II stromatolite microbes      Microbialites  4440067     Desnues et al., 2008
Healthy slime bacteria                    Fish           4440059     Angly et al., 2009
Morbid slime bacteria                     Fish           4440066     Angly et al., 2009
Healthy gut bacteria                      Fish           4440055     Angly et al., 2009
Morbid gut bacteria                       Fish           4440056     Angly et al., 2009
Non-EBPR wastewater treatment plant       WWTP           N/A         Sanapareddy et al., 2009

Supplementary references for Table S2.
Angly FE, Willner D, Prieto-Davó A, Edwards RA, Schmieder R, Vega-Thurber R, et al. (2009). The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput Biol 5:e1000593.
Breitbart M, Hoare A, Nitti A, Siefert J, Haynes M, Dinsdale E, et al. (2009). Metagenomic and stable isotopic analyses of modern freshwater microbialites in Cuatro Ciénegas, Mexico. Environ Microbiol 11:16-34.
Desnues C, Rodriguez-Brito B, Rayhawk S, Kelley S, Tran T, Haynes M, et al. (2008). Biodiversity and biogeography of phages in modern stromatolites and thrombolites. Nature 452:340-343.
Dinsdale EA, Pantos O, Smriga S, Edwards RA, Angly F, Wegley L, et al. (2008a). Microbial ecology of four coral atolls in the Northern Line Islands. PLoS ONE 3:e1584.
Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, Brulc JM, et al. (2008b). Functional metagenomic profiling of nine biomes. Nature 452:629-632.
Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, Peterson DM, et al. (2006). Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7:57.
Mou X, Sun S, Edwards RA, Hodson RE, Moran MA (2008). Bacterial carbon processing by generalist species in the coastal ocean. Nature 451:708-711.
Rodriguez-Brito B, Li L, Wegley L, Furlan M, Angly F, Breitbart M, et al. (2010). Viral and microbial community dynamics in four aquatic environments. ISME J 4:739-751.
Sanapareddy N, Hamp TJ, Gonzalez LC, Hilger HA, Fodor AA, Clinton SM (2009). Molecular diversity of a North Carolina wastewater treatment plant as revealed by pyrosequencing. Appl Environ Microbiol 75:1688-1696.
Swan BK, Ehrhardt CJ, Reifel KM, Moreno LI, Valentine DL (2010). Archaeal and bacterial communities respond differently to environmental gradients in anoxic sediments of a California hypersaline lake, the Salton Sea. Appl Environ Microbiol 76:757-768.