SUPPLEMENTARY MATERIAL

for

Metagenomic analysis on seasonal microbial variations of activated sludge from a full-scale wastewater treatment plant over four years

Feng Ju, Feng Guo, Lin Ye, Yu Xia, Tong Zhang*

Environmental Biotechnology Laboratory, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR

Submitted to Environmental Microbiology Report

*Corresponding author phone: +852-28578551; fax: +852-25595337; e-mail: zhangt@hkucc.hku.hk

Supporting Information S1: Variation of environmental and operational parameters

Tables
Table S1 Characteristics of the environmental and operational parameters in Shatin WWTP from July 2007 to January 2011.
Table S2 Illumina sequencing metadata of the eight activated sludge samples in this study.
Table S3 The percentage abundances (%) of the major phyla (top 15 in each sample) in the 8 AS samples.
Table S4 Abundances of the top 20 genera in each AS sample.
Table S5 Percentages of the shared genera and their corresponding sequences.
Table S6 The relative abundances of the 28 functional categories in four ecosystems based on SEED Subsystems implemented in MG-RAST.

Figures
Figure S1 Reproducibility of Illumina high-throughput sequencing at different sequencing depths.
Figure S2 Heat map of major genera (top 20 in each sample) of the 8 AS samples.

Supporting Information S1: Variation of environmental and operational parameters

During the sampling period, the water temperature of the aerobic tank ranged from 13.7 °C to 29.6 °C, with seasonal lows (15.4±1.3 °C) in January (winter) and highs (29.1±0.5 °C) in July (summer) (Table S1). In contrast, salinity (mg Cl⁻.L⁻¹; resulting from the seawater toilet-flushing practice in Hong Kong) fluctuated over the 4 years, with peak concentrations in January and minimum values in July.
However, the concentrations of MLSS and of nutrients such as COD, NH4-N and total phosphorus (TP) followed mixed patterns: their averaged concentrations in the last 2 years (Jul-09, Jan-10, Jul-10 and Jan-11) were higher than in the first 2 years (Jul-07, Jan-08, Jul-08 and Jan-09), while their concentrations in winter (January) were generally higher than in summer (July).

Tables

Table S1 Characteristics of the environmental and operational parameters in Shatin WWTP from July 2007 to January 2011

| Sampling time | Sample name | Temperature /(°C) | Salinity /(mg Cl⁻.L⁻¹) | SRT /(d⁻¹) | MCRT /(d⁻¹) | DO /(mg/L) | HRT /(hrs) | Flow rate /(m³.d⁻¹) | MLSS /(g.L⁻¹) | COD /(mg.L⁻¹) | TKN-N /(mg.L⁻¹) | TP /(mg.L⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jul-2007 | AS07-7 | 29.6 | 5038 | 5.7 | 12.7 | 3.3 | 8.9 | 231 | 1637 | 404 | 31 | 3.7 |
| Jan-2008 | AS08-1 | 15.9 | 5687 | 6.1 | 13.6 | 2.7 | 8.0 | 222 | 1963 | 389 | 37 | 4.2 |
| Jul-2008 | AS08-7 | 28.4 | 4115 | 4.7 | 11.2 | 3.3 | 6.6 | 281 | 1811 | 334 | 31 | 3.5 |
| Jan-2009 | AS09-1 | 15.3 | 5712 | 4.4 | 9.4 | 3.3 | 7.6 | 243 | 2061 | 347 | 33 | 4.6 |
| Jul-2009 | AS09-7 | 29.1 | 4584 | 5.2 | 10.9 | 3.9 | 7.7 | 232 | 1859 | 333 | 37 | 4.6 |
| Jan-2010 | AS10-1 | 16.8 | 6020 | 4.7 | 10.5 | 4.3 | 8.1 | 220 | 2216 | 478 | 42 | 5.2 |
| Jul-2010 | AS10-7 | 29.2 | 5229 | 11.3 | 25.2 | 3.9 | 7.8 | 233 | 2410 | 410 | 32 | 3 |
| Jan-2011 | AS11-1 | 13.7 | 5951 | 5.3 | 11.9 | 2.5 | 8.2 | 220 | 3281 | 648 | 50 | 8.2 |

DO: dissolved oxygen; SRT: sludge retention time; MCRT: mean cell retention time; HRT: hydraulic retention time; COD: chemical oxygen demand; MLSS: mixed liquor suspended solids; TKN: total Kjeldahl nitrogen. The monthly averaged values for each item were used for statistical analysis in the present study.
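As a worked check of the seasonal temperature statistics quoted in Supporting Information S1, the short sketch below recomputes the January and July means and standard deviations from the temperature values in Table S1. It assumes that the reported ± values are sample standard deviations (n − 1 denominator) over the four samples of each season, which the source does not state explicitly; the function and variable names are illustrative only.

```python
from statistics import mean, stdev

# Water temperature (°C) of the aerobic tank, copied from Table S1.
temperature = {
    "AS07-7": 29.6, "AS08-1": 15.9, "AS08-7": 28.4, "AS09-1": 15.3,
    "AS09-7": 29.1, "AS10-1": 16.8, "AS10-7": 29.2, "AS11-1": 13.7,
}

def seasonal_summary(values, month_suffix):
    """Return (mean, sample SD) for samples whose name ends with month_suffix."""
    selected = [t for name, t in values.items() if name.endswith(month_suffix)]
    return mean(selected), stdev(selected)

# Sample names end in "-1" for January and "-7" for July.
winter_mean, winter_sd = seasonal_summary(temperature, "-1")
summer_mean, summer_sd = seasonal_summary(temperature, "-7")

print(f"January: {winter_mean:.1f} ± {winter_sd:.1f} °C")
print(f"July:    {summer_mean:.1f} ± {summer_sd:.1f} °C")
```

Under this assumption the sketch gives 15.4 ± 1.3 °C for January and 29.1 ± 0.5 °C for July, which agrees with the values quoted in the text.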
Table S2 Illumina sequencing metadata of the 8 activated sludge samples in this study

| Sample ID | PE¹ reads after normalization: number | PE¹ reads after normalization: length (bps) | Number of de-replicated PE reads | Tags² obtained: number | Tags² obtained: yield ratio (%) | Tags² obtained: mean length (bps) | Number of 16/18S rRNA gene tags³ |
|---|---|---|---|---|---|---|---|
| AS07-7 | 25,426,000 | 100 | 24,608,407 | 18,608,516 | 75.6 | 169 | 12,624 |
| AS08-1 | 25,426,000 | 100 | 24,722,092 | 17,800,479 | 72.0 | 170 | 13,529 |
| AS08-7 | 25,426,000 | 100 | 24,318,551 | 18,428,094 | 75.8 | 168 | 14,599 |
| AS09-1 | 25,426,000 | 100 | 24,426,682 | 17,687,533 | 72.4 | 166 | 14,596 |
| AS09-7 | 25,426,000 | 100 | 24,909,848 | 18,345,157 | 73.6 | 172 | 14,786 |
| AS10-1 | 25,426,000 | 100 | 24,748,187 | 18,318,508 | 74.0 | 163 | 15,473 |
| AS10-7 | 25,426,000 | 100 | 25,082,442 | 18,601,386 | 74.2 | 163 | 13,689 |
| AS11-1 | 25,426,000 | 100 | 24,021,724 | 17,001,280 | 70.8 | 165 | 12,977 |

1) PE: paired-end; 2) Tags: the sequences obtained from overlapping the metagenomic paired-end reads, allowing a minimum overlap length of 10 bps; 3) 16/18S rRNA gene tags were identified by running BLAST against the SSU-Ref database at an e-value cutoff of 1e-5.

Table S3 The relative abundances (%) of the major phyla (top 15 in each sample) in the 8 AS samples. The abundance is presented in terms of percentages of the total sequences (in a sample) that were assigned to phylum level, sorted firstly by domain and then abundance of each phylum. P1-ratio represents the ratio of average abundances in summer to winter samples.
P2-ratio represents the ratio of average abundances in the first-2-years to last-2-years samples.

| Domain | Phylum | AS07-7 | AS08-1 | AS08-7 | AS09-1 | AS09-7 | AS10-1 | AS10-7 | AS11-1 | Average | STD | P1-ratio | P2-ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bacteria | Proteobacteria | 40.38 | 39.76 | 38.06 | 38.10 | 41.02 | 41.85 | 44.11 | 42.93 | 40.78 | 2.2 | 1.0 | 0.9 |
| Bacteria | Actinobacteria | 21.14 | 29.35 | 19.89 | 33.54 | 15.03 | 23.29 | 15.36 | 17.73 | 21.92 | 6.6 | 0.7 | 1.5 |
| Bacteria | Chloroflexi | 7.84 | 6.04 | 9.79 | 7.91 | 12.52 | 11.99 | 16.42 | 12.26 | 10.60 | 3.3 | 1.2 | 0.6 |
| Bacteria | Bacteroidetes | 10.12 | 6.55 | 7.90 | 7.46 | 9.93 | 7.01 | 6.81 | 13.53 | 8.66 | 2.4 | 1.0 | 0.9 |
| Bacteria | Firmicutes | 4.28 | 6.03 | 6.98 | 4.84 | 3.52 | 4.56 | 2.99 | 5.29 | 4.81 | 1.3 | 0.9 | 1.4 |
| Bacteria | Planctomycetes | 6.18 | 5.60 | 4.38 | 4.26 | 4.06 | 3.20 | 3.46 | 2.76 | 4.24 | 1.2 | 1.1 | 1.5 |
| Bacteria | Nitrospirae | 4.95 | 1.55 | 4.34 | 0.05 | 2.85 | 1.35 | 1.52 | 1.17 | 2.22 | 1.7 | 3.3 | 1.6 |
| Bacteria | Verrucomicrobia | 1.13 | 1.91 | 0.96 | 1.70 | 0.87 | 1.67 | 1.04 | 0.96 | 1.28 | 0.4 | 0.6 | 1.3 |
| Bacteria | Chlorobi | 0.74 | 0.81 | 0.42 | 0.34 | 1.07 | 0.89 | 0.93 | 0.24 | 0.68 | 0.3 | 1.4 | 0.7 |
| Bacteria | Cyanobacteria | 0.63 | 0.27 | 0.14 | 0.14 | 0.48 | 0.15 | 0.48 | 0.15 | 0.31 | 0.2 | 2.4 | 0.9 |
| Bacteria | Thermotogae | 0.02 | 0.00 | 0.01 | 0.08 | 0.04 | 0.33 | 0.03 | 0.59 | 0.14 | 0.2 | 0.1 | 0.1 |
| Bacteria | Acidobacteria | 0.22 | 0.04 | 0.14 | 0.09 | 0.37 | 0.14 | 0.27 | 0.24 | 0.19 | 0.1 | 2.0 | 0.5 |
| Bacteria | Chlamydiae | 0.20 | 0.82 | 0.65 | 0.14 | 0.46 | 0.31 | 0.85 | 0.20 | 0.45 | 0.3 | 1.5 | 1.0 |
| Bacteria | Spirochaetes | 0.22 | 0.19 | 0.47 | 0.10 | 0.34 | 0.17 | 0.22 | 0.17 | 0.23 | 0.1 | 2.0 | 1.1 |
| Bacteria | Lentisphaerae | 0.03 | 0.07 | 0.13 | 0.19 | 0.14 | 0.14 | 0.25 | 0.18 | 0.14 | 0.1 | 0.9 | 0.6 |
| Bacteria | Armatimonadetes | 0.08 | 0.15 | 0.01 | 0.01 | 0.08 | 0.00 | 0.03 | 0.02 | 0.05 | 0.1 | 1.1 | 1.9 |
| Eukaryota | Rotifera | 0.57 | 0.16 | 2.39 | 0.15 | 1.20 | 0.11 | 0.76 | 0.24 | 0.70 | 0.8 | 7.5 | 1.4 |
| Eukaryota | Nematoda | 0.57 | 0.05 | 2.40 | 0.18 | 4.65 | 1.44 | 1.00 | 0.08 | 1.30 | 1.6 | 4.9 | 0.4 |
| Eukaryota | Arthropoda | 0.08 | 0.12 | 0.06 | 0.16 | 0.03 | 0.06 | 0.01 | 0.05 | 0.07 | 0.0 | 0.5 | 2.8 |
| Eukaryota | Gastrotricha | 0.03 | 0.00 | 0.01 | 0.06 | 0.07 | 0.20 | 0.03 | 0.00 | 0.05 | 0.1 | 0.5 | 0.3 |
| Eukaryota | Streptophyta | 0.00 | 0.19 | 0.25 | 0.15 | 0.46 | 0.32 | 2.60 | 0.27 | 0.53 | 0.8 | 3.6 | 0.2 |
| Archaea | Euryarchaeota | 0.14 | 0.08 | 0.25 | 0.15 | 0.11 | 0.50 | 0.15 | 0.44 | 0.23 | 0.2 | 0.6 | 0.5 |
| Others | Other assigned | 0.46 | 0.23 | 0.36 | 0.19 | 0.71 | 0.33 | 0.67 | 0.49 | 0.43 | 0.2 | 1.8 | 0.6 |

Table S4 The relative abundances of the major genera (top 20 in each AS sample) in the 8 AS samples.
The abundance is presented in terms of percentages of the total sequences in a sample that were assigned to genus level, sorted alphabetically by phylum and then genus. Average represents the averaged abundances of each genus in eight samples. P1-ratio represents the ratio of average abundances in summer to winter samples. P2-ratio represents the ratio of average abundances in the first-2-years to last-2-years samples.

| Phylum | Genus | AS07-7 | AS08-1 | AS08-7 | AS09-1 | AS09-7 | AS10-1 | AS10-7 | AS11-1 | Average | STD | P1-ratio | P2-ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Actinobacteria | Mycobacterium | 10.6 | 13.3 | 10.3 | 8.28 | 8.82 | 8.00 | 6.00 | 6.16 | 8.94 | 2.4 | 1.0 | 1.5 |
| Actinobacteria | Candidatus Microthrix | 4.767 | 6.350 | 2.332 | 4.95 | 1.83 | 4.74 | 3.88 | 2.93 | 3.97 | 1.5 | 0.7 | 1.4 |
| Actinobacteria | Iamia | 2.16 | 2.82 | 2.59 | 2.68 | 2.26 | 1.99 | 1.15 | 1.61 | 2.16 | 0.6 | 0.9 | 1.5 |
| Actinobacteria | Nocardioides | 1.08 | 1.52 | 1.74 | 1.80 | 0.72 | 1.99 | 0.58 | 2.88 | 1.54 | 0.7 | 0.5 | 1.0 |
| Actinobacteria | Bifidobacterium | 0.88 | 1.69 | 0.79 | 1.17 | 0.72 | 1.17 | 0.53 | 1.46 | 1.05 | 0.4 | 0.5 | 1.2 |
| Actinobacteria | Tetrasphaera | 0.64 | 4.98 | 0.92 | 5.57 | 0.83 | 3.34 | 0.43 | 2.57 | 2.41 | 2.0 | 0.2 | 1.7 |
| Actinobacteria | Gordonia | 0.08 | 0.46 | 0.13 | 6.96 | 0.48 | 2.63 | 4.60 | 0.56 | 1.99 | 2.6 | 0.5 | 0.9 |
| Bacteroidetes | Lewinella | 2.40 | 1.20 | 2.29 | 0.88 | 1.75 | 1.39 | 2.73 | 0.96 | 1.70 | 0.7 | 2.1 | 1.0 |
| Bacteroidetes | Flexibacter | 0.08 | 0.14 | 0.49 | 0.73 | 1.59 | 1.32 | 0.82 | 1.77 | 0.87 | 0.6 | 0.8 | 0.3 |
| Chlamydiae | Waddlia | 0.04 | 0.04 | 0.07 | 0.07 | 0.00 | 0.08 | 1.15 | 0.10 | 0.19 | 0.4 | 4.4 | 0.2 |
| Chloroflexi | Caldilinea | 5.72 | 4.73 | 5.90 | 3.08 | 4.49 | 3.87 | 3.84 | 3.03 | 4.33 | 1.1 | 1.4 | 1.3 |
| Deferribacteres | Caldithrix | 1.60 | 0.74 | 0.82 | 0.40 | 1.79 | 1.50 | 4.51 | 1.41 | 1.60 | 1.3 | 2.2 | 0.4 |
| Firmicutes | Streptococcus | 1.44 | 1.62 | 2.03 | 0.95 | 1.67 | 0.53 | 0.96 | 0.91 | 1.26 | 0.5 | 1.5 | 1.5 |
| Firmicutes | Clostridium | 0.84 | 1.02 | 1.08 | 0.77 | 0.60 | 0.60 | 0.58 | 0.56 | 0.76 | 0.2 | 1.1 | 1.6 |
| Firmicutes | Blautia | 0.68 | 1.09 | 1.97 | 1.06 | 0.52 | 1.17 | 0.58 | 0.86 | 0.99 | 0.5 | 0.9 | 1.5 |
| Firmicutes | Enterococcus | 0.04 | 0.64 | 0.13 | 0.95 | 0.04 | 0.75 | 0.10 | 3.08 | 0.72 | 1.0 | 0.1 | 0.4 |
| Nitrospirae | Nitrospira | 12.7 | 3.99 | 10.9 | 0.15 | 8.34 | 4.13 | 4.89 | 3.89 | 6.14 | 4.2 | 3.0 | 1.3 |
| Opisthokonta | Diplolaimella | 0.009 | 0.00 | 3.607 | 0.00 | 9.73 | 2.78 | 2.40 | 0.00 | 2.31 | 3.3 | 5.7 | 0.2 |
| Planctomycetes | Planctomyces | 8.95 | 8.12 | 4.26 | 6.30 | 4.93 | 4.77 | 3.45 | 2.78 | 5.44 | 2.2 | 1.0 | 1.7 |
| Planctomycetes | Pirellula | 1.56 | 1.48 | 2.03 | 1.39 | 0.75 | 0.83 | 0.91 | 0.86 | 1.23 | 0.5 | 1.2 | 1.9 |
| Planctomycetes | Rhodopirellula | 0.52 | 0.64 | 0.43 | 0.59 | 1.11 | 0.45 | 0.82 | 0.86 | 0.68 | 0.2 | 1.1 | 0.7 |
| Proteobacteria | Amaricoccus | 2.12 | 3.03 | 2.36 | 2.05 | 1.15 | 1.39 | 1.06 | 1.01 | 1.77 | 0.7 | 0.9 | 2.1 |
| Proteobacteria | Rhodobacter | 2.00 | 2.05 | 2.29 | 3.22 | 2.07 | 1.77 | 1.87 | 1.61 | 2.11 | 0.5 | 1.0 | 1.3 |
| Proteobacteria | Nordella | 1.64 | 2.36 | 0.82 | 0.77 | 0.32 | 0.23 | 0.14 | 0.40 | 0.84 | 0.8 | 0.8 | 5.1 |
| Proteobacteria | Rhodobium | 1.40 | 1.41 | 1.31 | 0.81 | 1.15 | 0.41 | 0.67 | 1.06 | 1.03 | 0.4 | 1.2 | 1.5 |
| Proteobacteria | Paracoccus | 1.32 | 1.41 | 1.18 | 2.38 | 0.79 | 1.88 | 0.72 | 1.51 | 1.40 | 0.5 | 0.6 | 1.3 |
| Proteobacteria | Hyphomicrobium | 1.04 | 0.88 | 0.33 | 0.70 | 0.32 | 0.45 | 0.48 | 0.30 | 0.56 | 0.3 | 0.9 | 1.9 |
| Proteobacteria | Methylosinus | 0.84 | 1.02 | 0.62 | 0.84 | 0.56 | 0.71 | 0.34 | 0.81 | 0.72 | 0.2 | 0.7 | 1.4 |
| Proteobacteria | Rhodovulum | 0.76 | 0.67 | 0.79 | 1.32 | 0.75 | 0.83 | 0.77 | 0.45 | 0.79 | 0.2 | 0.9 | 1.3 |
| Proteobacteria | Denitromonas | 0.76 | 0.21 | 0.43 | 0.11 | 0.52 | 0.30 | 1.01 | 1.67 | 0.62 | 0.5 | 1.2 | 0.4 |
| Proteobacteria | Haliea | 0.76 | 0.14 | 0.33 | 0.33 | 0.72 | 1.62 | 1.53 | 3.18 | 1.08 | 1.0 | 0.6 | 0.2 |
| Proteobacteria | Mesorhizobium | 0.72 | 1.06 | 0.62 | 0.44 | 0.72 | 0.64 | 0.82 | 0.61 | 0.70 | 0.2 | 1.1 | 1.0 |
| Proteobacteria | Haliangium | 0.64 | 0.39 | 0.69 | 0.37 | 0.87 | 0.98 | 0.34 | 0.76 | 0.63 | 0.2 | 1.0 | 0.7 |
| Proteobacteria | Azoarcus | 0.64 | 0.18 | 0.49 | 0.22 | 1.47 | 0.26 | 2.30 | 1.56 | 0.89 | 0.8 | 2.2 | 0.3 |
| Proteobacteria | Zoogloea | 0.44 | 0.18 | 0.20 | 0.15 | 0.28 | 0.38 | 1.15 | 1.11 | 0.48 | 0.4 | 1.1 | 0.3 |
| Proteobacteria | Nannocystis | 0.36 | 0.14 | 1.11 | 0.18 | 0.68 | 0.94 | 1.01 | 0.76 | 0.65 | 0.4 | 1.6 | 0.5 |
| Proteobacteria | Candidatus Alysiosphaera | 0.36 | 0.35 | 0.26 | 1.39 | 0.72 | 1.73 | 0.19 | 0.56 | 0.69 | 0.6 | 0.4 | 0.7 |
| Proteobacteria | Acidovorax | 0.32 | 0.39 | 0.85 | 0.62 | 0.20 | 0.38 | 0.10 | 0.20 | 0.38 | 0.2 | 0.9 | 2.5 |
| Proteobacteria | Nitrosomonas | 0.16 | 0.00 | 0.23 | 0.07 | 1.11 | 0.41 | 1.15 | 0.35 | 0.44 | 0.4 | 3.2 | 0.2 |
| Proteobacteria | Thauera | 0.16 | 0.18 | 0.13 | 0.07 | 1.71 | 0.41 | 1.10 | 1.01 | 0.60 | 0.6 | 1.9 | 0.1 |
| Proteobacteria | Hyphomonas | 0.12 | 0.00 | 0.20 | 0.62 | 0.60 | 0.38 | 1.06 | 0.45 | 0.43 | 0.3 | 1.4 | 0.4 |
| Proteobacteria | Candidatus Thiobios | 0.04 | 0.04 | 0.10 | 0.07 | 0.52 | 0.15 | 1.10 | 1.72 | 0.47 | 0.6 | 0.9 | 0.1 |
| Thermotogae | Kosmotoga | 0.00 | 0.00 | 0.03 | 0.22 | 0.12 | 1.01 | 0.10 | 1.97 | 0.43 | 0.7 | 0.1 | 0.1 |

Table S5 Percentages of the shared genera and their corresponding sequences.
| Number of samples (a) | Number of shared classified genera | Percentage in classified genera (b) | Number of shared sequences | Percentage in classified sequences (c) |
|---|---|---|---|---|
| 8 | 100* | 15.6 | 16169 | 79.4 |
| 7 | 144 | 22.4 | 17457 | 85.7 |
| 6 | 188 | 29.2 | 18234 | 89.5 |
| 5 | 233 | 36.2 | 18755 | 92.1 |
| 4 | 285# | 44.3 | 19602 | 96.3 |
| 3 | 346 | 53.8 | 19865 | 97.6 |
| 2 | 441 | 68.6 | 20120 | 98.8 |
| 1 | 643 | 100.0 | 20363 | 100.0 |

* 100 (accounting for 79.4% of the classified sequences) out of the 643 assigned genera were shared by all 8 samples.
# A total of 285 genera were shared by at least 4 AS samples, accounting for 96.3% of all classified sequences at genus level. There were 202 rare genera that appeared in only one sample, accounting for only 1.2% of total sequences.
a Number of AS samples that share the genera in the second column.
b Percentage of the number of shared genera in the number of total classified genera.
c Percentage of sequences of the shared genera in the total classified sequences at genus level.

Table S6 The relative abundances of the 28 functional categories in four ecosystems based on SEED Subsystems implemented in MG-RAST. The results are presented using the mean value of all datasets in each ecosystem. RSD means relative standard deviation.
| Level 1 | AS Mean (%) | AS RSD (%) | Soil Mean (%) | Soil RSD (%) | Human faeces Mean (%) | Human faeces RSD (%) | Ocean Mean (%) | Ocean RSD (%) | Total Mean (%) | Total RSD (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Clustering-based subsystems | 15.63 | 0.54 | 15.48 | 1.89 | 15.43 | 6.93 | 15.39 | 3.25 | 15.48 | 3.85 |
| Carbohydrates | 10.16 | 1.58 | 10.95 | 4.16 | 12.9 | 11.18 | 7.94 | 7.98 | 10.49 | 18.71 |
| Protein Metabolism | 7.86 | 0.8 | 8.69 | 3.25 | 8.16 | 4.02 | 9.75 | 4.09 | 8.62 | 8.79 |
| Amino Acids and Derivatives | 8.57 | 1.64 | 8.23 | 7.3 | 7 | 6.51 | 7.54 | 9.89 | 7.84 | 10.24 |
| Miscellaneous | 8.83 | 1.72 | 7.53 | 7.07 | 7.81 | 11.31 | 9.42 | 2.33 | 8.40 | 11.41 |
| Cofactors, Vitamins, Prosthetic Groups, Pigments | 6.3 | 0.84 | 6.89 | 7.21 | 5.8 | 5.24 | 7.81 | 5.07 | 6.70 | 12.39 |
| DNA Metabolism | 3.41 | 3.00 | 3.55 | 13.18 | 4.48 | 5.94 | 3.58 | 7.32 | 3.76 | 13.91 |
| RNA Metabolism | 4.6 | 1.79 | 4.23 | 10.62 | 5.21 | 15.09 | 3.89 | 5.15 | 4.48 | 15.10 |
| Membrane Transport | 1.38 | 1.85 | 3.96 | 6.54 | 4.62 | 9.99 | 4.65 | 4.14 | 3.65 | 33.90 |
| Respiration | 3.65 | 2.2 | 3.39 | 7.28 | 3.69 | 12.6 | 2.59 | 6.75 | 3.33 | 15.74 |
| Cell Wall and Capsule | 4.02 | 3.31 | 3.13 | 13.93 | 2.03 | 21.18 | 2.9 | 13.44 | 3.02 | 25.88 |
| Nucleosides and Nucleotides | 2.87 | 4.18 | 2.75 | 24.57 | 2.72 | 9.08 | 1.58 | 5.75 | 2.48 | 26.25 |
| Virulence, Disease and Defense | 2.94 | 2.3 | 3.05 | 9.23 | 2.07 | 5.64 | 2.68 | 3.79 | 2.69 | 15.74 |
| Fatty Acids, Lipids, and Isoprenoids | 2.55 | 2.71 | 2.7 | 10.53 | 2.04 | 27.08 | 2.37 | 11.04 | 2.42 | 17.30 |
| Stress Response | 3.01 | 0.91 | 2.95 | 6.4 | 3.18 | 8.07 | 3.42 | 2.55 | 3.14 | 7.91 |
| Phages, Prophages, Transposable elements, Plasmids | 3.6 | 1.71 | 1.9 | 51.35 | 1.55 | 21.21 | 1.24 | 7.27 | 2.07 | 51.02 |
| Metabolism of Aromatic Compounds | 1.64 | 4.45 | 1.74 | 18.25 | 0.62 | 31.17 | 0.84 | 10.23 | 1.21 | 44.28 |
| Regulation and Cell signaling | 1.65 | 4.29 | 1.67 | 15.99 | 2.59 | 18.23 | 5.38 | 46.43 | 2.82 | 69.92 |
| Cell Division and Cell Cycle | 1.39 | 2.04 | 1.34 | 7.83 | 1.58 | 16.45 | 1.68 | 4.79 | 1.50 | 13.39 |
| Nitrogen Metabolism | 1.12 | 2.72 | 1.17 | 10.72 | 1.42 | 11.73 | 0.96 | 8.08 | 1.17 | 17.35 |
| Sulfur Metabolism | 1.38 | 5.16 | 1.04 | 14.74 | 0.83 | 29.33 | 0.78 | 17.15 | 1.01 | 27.60 |
| Phosphorus Metabolism | 0.79 | 9.66 | 0.87 | 19.08 | 0.59 | 30.24 | 0.53 | 10.23 | 0.70 | 27.98 |
| Motility and Chemotaxis | 0.9 | 2.02 | 0.84 | 8.92 | 0.82 | 7.31 | 0.67 | 7.57 | 0.81 | 12.42 |
| Iron acquisition and metabolism | 0.55 | 8.97 | 0.66 | 45.45 | 1.35 | 28.49 | 0.57 | 16.27 | 0.78 | 52.58 |
| Secondary Metabolism | 0.38 | 4.51 | 0.38 | 32.39 | 0.5 | 19.89 | 0.18 | 12.06 | 0.36 | 39.30 |
| Potassium metabolism | 0.47 | 2.67 | 0.41 | 12.2 | 0.34 | 29.88 | 0.41 | 7.93 | 0.41 | 18.09 |
| Dormancy and Sporulation | 0.22 | 2.76 | 0.26 | 17.36 | 0.64 | 25.73 | 0.22 | 27.04 | 0.34 | 59.09 |
| Photosynthesis | 0.12 | 8.05 | 0.23 | 24.52 | 0.05 | 39.3 | 1.02 | 43.08 | 0.36 | 124.9 |

Figures

[Figure S1: four scatter plots with linear fits. Panels a and b: x-axis AS08-7-1G, y-axis AS08-7-1G/5G; (a) Y=1.025X, R²=0.995; (b) Y=1.029X, R²=0.949. Panels c and d: x-axis AS08-7-5G, y-axis AS08-7-5G/30G; (c) Y=X, R²=0.985; (d) Y=1.018X, R²=0.989.]

Figure S1 Reproducibility of Illumina high-throughput sequencing at different sequencing depths. The results are based on taxonomic analysis of four data sets derived from three replicates of sample AS08-7. AS08-7-1G/5G and AS08-7-5G/30G represent random extraction of 1G and 5G data sets from the original 5G and 30G data sets of sample AS08-7, respectively. Panels a and b show linear fitting of class (a) and genus (b) abundances (with abundance >5) in the two 1G data sets; panels c and d show linear fitting of class (c) and genus (d) abundances (with abundance >5) in the two 5G data sets.

Figure S2 Heat map of major genera (top 20 in each sample) of the 8 AS samples. The top 20 abundant genera in each sample were selected (a total of 43 genera for all 8 samples) and compared with their abundances (percentages) in other samples. The genus names in bold blue, purple and green represent those major genera with significantly changed abundances across P1, P2 and P1P2 (both P1 and P2), respectively, based on two-way ANOVA. The color intensity in each column displays the percentage of a genus in a sample, referring to the color key on the left.
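P1 and P2 in the Figure S2 caption presumably denote the same season and period groupings that underlie the P1- and P2-ratios of Tables S3 and S4. As an illustration of how those ratios are formed, the sketch below recomputes them for two phyla using the abundances listed in Table S3. It assumes that each ratio is a quotient of plain arithmetic means over four samples and that "first 2 years" covers AS07-7 to AS09-1; the function and variable names are hypothetical.

```python
from statistics import mean

SAMPLES = ["AS07-7", "AS08-1", "AS08-7", "AS09-1",
           "AS09-7", "AS10-1", "AS10-7", "AS11-1"]

# Relative abundances (%) copied from Table S3, in SAMPLES order.
abundance = {
    "Proteobacteria": [40.38, 39.76, 38.06, 38.10, 41.02, 41.85, 44.11, 42.93],
    "Nitrospirae": [4.95, 1.55, 4.34, 0.05, 2.85, 1.35, 1.52, 1.17],
}

def p_ratios(values):
    """Return (P1-ratio, P2-ratio) for one taxon.

    P1: mean of summer (July) samples / mean of winter (January) samples.
    P2: mean of the first four samples / mean of the last four samples.
    """
    by_sample = dict(zip(SAMPLES, values))
    summer = [v for s, v in by_sample.items() if s.endswith("-7")]
    winter = [v for s, v in by_sample.items() if s.endswith("-1")]
    p1 = mean(summer) / mean(winter)
    p2 = mean(values[:4]) / mean(values[4:])
    return p1, p2

for taxon, values in abundance.items():
    p1, p2 = p_ratios(values)
    print(f"{taxon}: P1-ratio = {p1:.1f}, P2-ratio = {p2:.1f}")
```

With these inputs the sketch gives P1 = 1.0 and P2 = 0.9 for Proteobacteria and P1 = 3.3 and P2 = 1.6 for Nitrospirae, in agreement with the tabulated ratios. A P1-ratio above 1 indicates higher abundance in summer; a P2-ratio above 1 indicates higher abundance in the first two years.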