Supporting Information for

Shifts in microbial community composition and function in the acidification process of a lead/zinc mine tailings

Lin-xing Chen†, Jin-tian Li†, Ya-ting Chen, Li-nan Huang, Zheng-shuang Hua, Min Hu, Wen-sheng Shu

State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, PR China

†These authors contributed equally to this work.

Supplementary Methods

DNA extraction, PCR and 454 pyrosequencing

Genomic DNA was extracted from each tailings subsample with a modified indirect DNA extraction protocol as described previously (Tan et al., 2008). Briefly, cells were recovered from about 20 g of tailings by centrifugation at 900×g and 4 °C for 10 min, using 20 mL of sodium pyrophosphate (pH 3.0 or pH 7.0) as the dispersal reagent (Duarte et al., 1998), and the supernatant was collected. This recovery step was repeated twice. The collected supernatant was centrifuged at 10,000×g and 4 °C for 15 min to pellet the cells, and the supernatant was then removed. The cell pellets were treated with 20 mL of 0.3 M ammonium oxalate (pH 3.0 or pH 7.0) for 20 min to dissolve most of the iron precipitate (McKeague and Day, 1966) and centrifuged at 10,000×g and 4 °C to pellet the cells, after which the supernatant was removed; this step was repeated until the supernatant turned colorless. DNA was extracted from the cell pellets with a FastDNA Kit for soil (Qbiogene Inc., Carlsbad, CA) following the manufacturer's instructions. The universal primer set 515F/806R (Bates et al., 2010) was used to amplify the bacterial and archaeal 16S rRNA genes simultaneously, with an 8-bp barcode specific to each tailings subsample on the primer 806R.
The primer sequences were as follows: (i) forward primer, CGTATCGCCTCCCTCGCGCCATCAGCAGTGCCAGCMGCCGCGGTAA, which consists, from 5′ to 3′, of the Link Primer Sequence, the two-base protecting sequence 'CA' and the primer 515F (GTGCCAGCMGCCGCGGTAA); (ii) reverse primer, CTATGCGCCTTGCCAGCCCGCTCAGAACGAACGTCGGACTACVSGGGTATCTAAT, which consists, from 5′ to 3′, of the Link Primer Sequence, the 8-bp barcode specific to the tailings subsample (AACGAACG in this example; see Table S2 for all the barcodes), the two-base protecting sequence 'TC' and the primer 806R (GGACTACVSGGGTATCTAAT). PCR reactions (30 µL) contained 0.75 units of Ex Taq DNA polymerase (TaKaRa, Dalian, China), 1× Ex Taq loading buffer (TaKaRa, Dalian, China), 0.2 mM dNTP mix (TaKaRa, Dalian, China), 0.2 µM of each primer and about 100 ng of template DNA. PCR amplification was conducted as follows: initial denaturation at 95 °C for 3 min; 35 cycles of denaturation at 94 °C for 30 s, primer annealing at 50 °C for 1 min and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. For each tailings subsample, the PCR was conducted in triplicate and the products were pooled to mitigate PCR amplification biases. The composite sample for pyrosequencing was created by combining equimolar amounts of the amplification products from the individual subsamples as described by Fierer et al. (2008), followed by gel purification using a QIAquick Gel Extraction Kit (Qiagen, Chatsworth, CA). The purified composite DNA sample was sent to Macrogen Inc. (Seoul, Korea) for pyrosequencing on a 454 GS FLX Titanium pyrosequencer (Roche 454 Life Sciences, Branford, CT, USA).

Processing of 454 pyrosequencing data

Pyrosequencing data analysis was performed with version 1.26 of the mothur software package (Schloss et al., 2009) as described by Schloss et al. (2011).
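Because each subsample carries its own 8-bp barcode on the primer 806R, reads from the pooled run have to be assigned back to their subsamples before any further processing. The Python sketch below illustrates that idea only; it is not the mothur procedure used in the study. It assumes that a read starts with the barcode followed by the two-base protecting sequence 'TC' (that is, it was sequenced from the 806R end and the 454 key has already been removed). The function name `assign_subsample` and the example reads are hypothetical; the three barcodes are taken from Table S2.

```python
# Barcodes copied from Table S2 (first three subsamples only, for brevity).
BARCODES = {
    "AACGAACG": "T1-1",
    "AACGAAGC": "T1-2",
    "AACGATCC": "T1-3",
}
LINKER = "TC"       # two-base protecting sequence next to the primer 806R
BARCODE_LEN = 8


def assign_subsample(read):
    """Return (subsample, remainder), or (None, read) if no barcode matches.

    Assumption: the read begins with the 8-bp barcode and then the linker.
    """
    barcode = read[:BARCODE_LEN]
    rest = read[BARCODE_LEN:]
    if barcode in BARCODES and rest.startswith(LINKER):
        # Strip barcode and linker; the remainder starts with the 806R primer.
        return BARCODES[barcode], rest[len(LINKER):]
    return None, read


# Hypothetical reads; the bases after the primer are invented for illustration.
reads = [
    "AACGAACGTCGGACTACAGGGGTATCTAATCCTGTTTG",
    "AACGATCCTCGGACTACCGGGGTATCTAATCCGGTTCG",
    "TTTTTTTTTCGGACTACAGGGGTATCTAAT",
]
for r in reads:
    print(assign_subsample(r)[0])
```

Exact matching is used here for simplicity; a real pipeline would normally also check the primer and may tolerate a limited number of mismatches.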
Given that 454 pyrosequencing errors inflate biodiversity estimates (Kunin et al., 2010), the sequences were denoised using the commands 'shhh.flows' (an implementation of the PyroNoise algorithm; Quince et al., 2009) and 'pre.cluster' (Huse et al., 2010). Additionally, chimeric sequences were identified and removed using UCHIME (Edgar et al., 2011). We also removed sequences with: (i) a length < 280 bp; and/or (ii) a homopolymer of eight or more bases; and/or (iii) one or more ambiguous bases. OTUs were identified at the 97% sequence identity level using the 'cluster' command with the average clustering algorithm (Huse et al., 2010). Subsequently, a representative sequence was selected from each OTU and taxonomically assigned using the Ribosomal Database Project (RDP) Classifier (Wang et al., 2007) with a minimum confidence of 80%. The alpha diversity of the microbial communities in the 18 tailings subsamples was estimated with the abundance-based indices Chao1, Shannon and Simpson. For this purpose, 5,000 quality sequences were randomly sampled (iterations, 10) from each of the 18 tailings subsamples, and the value for each tailings sample was calculated as the average of its three subsamples.

Metagenomics sequencing and analysis

Library construction and random shotgun sequencing. For the T2 and T6 tailings samples, genomic DNA extracted from the three subsamples of each sample was pooled and purified by gel electrophoresis. The purified DNA samples were then sent to BGI Inc. (Shenzhen, China) for shotgun library construction and Illumina sequencing. For both samples, whole-genome shotgun sequencing libraries with an insert size of 180 bp were generated and paired-end sequenced (2 × 90 bp) on the Illumina HiSeq 2000 platform.

Artifact filtering and quality control.
The raw Illumina sequence data (2 GB for each metagenome) were passed through several filtering and quality control steps to obtain clean sequence data, as follows: (i) reads with adapter contamination were identified and removed; (ii) duplicate reads were identified and removed; (iii) among the non-duplicate reads, those containing more than 18 N were identified and removed; and (iv) the retained reads were trimmed at the 3′ end to remove bases with a quality score < 20, and reads in which more than 20% of the bases were of low quality (quality score < 20) were also removed. The resulting clean reads were used for further analysis.

Whole metagenome assembly. The clean reads were assembled de novo using Velvet (version 1.1.04) (Zerbino and Birney, 2008) with the options ins_length = 180 and exp_cov = auto. Both metagenomes were assembled with k-mer lengths from 21 to 55, and the best assembly was selected based on the N50 contig length and the length of the longest contig. The best k-mer length was 45 for the T2 metagenome (N50 contig: 522 bp; longest contig: 60233 bp) and 51 for the T6 metagenome (N50 contig: 955 bp; longest contig: 40620 bp).

Microbial community composition analysis.
Two strategies were employed to reveal the microbial composition of the T2 and T6 metagenomes: (i) 16S rRNA gene fragments were identified in all the contigs using BLASTn against the RDP database (release 10) (Cole et al., 2009) (e-value threshold = 10⁻⁵), and the identified 16S rRNA anchors ≥ 100 bp were taxonomically assigned using the RDP Classifier with a minimum confidence of 80%; and (ii) the contigs (≥ 300 bp) were compared against the National Center for Biotechnology Information (NCBI) non-redundant (nr) database (e-value threshold = 10⁻⁵) and then classified into taxonomic groups with the lowest common ancestor algorithm in MEGAN (Huson et al., 2011) with default parameters (minimum score, 35; minimum support, 1; top percent, 10%).

Gene prediction and functional annotation. The contigs that had reliable NCBI-nr hits, as indicated by MEGAN, were extracted for further analysis. Genes were predicted on these contigs using GeneMark with default parameters (Zhu et al., 2010), which yielded 51981 and 49538 putative protein-coding genes for the T2 and T6 metagenomes, respectively (Table S5). We then compared these putative protein-coding genes against the NCBI-nr database, and those with NCBI-nr hits were further compared against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the Clusters of Orthologous Groups of proteins (COG) database using BLASTx (e-value threshold = 10⁻⁵).

Genome binning. Based on the BLAST results for the contigs and the MEGAN analysis (minimum score, 35; minimum support, 1; top percent, 10%), the contigs of the dominant genera in the T2 and T6 metagenomes were binned. Information on the largest bins is shown in Table S6.

Contigs coverage estimate.
To estimate the coverage of the contigs, we aligned the clean reads used for assembly to the contigs using SOAPaligner (Li et al., 2009) in three steps: (i) an index was built from all the contigs in the assembly (2bwt-builder); (ii) the clean reads were aligned against the indexed contigs (soap); and (iii) soap.coverage (Li et al., 2009) was used to parse the output file of SOAPaligner. The coverage estimates of the contigs are shown in Fig. S7.

The functional abundance profile analysis of COG catalogues and COG categories

Based on the COG BLAST results, the predicted genes with reliable COG hits were assigned to COG catalogues and COG categories (if available). To determine whether a specific COG catalogue or COG category was enriched in our metagenomes, its odds ratio against all sequenced bacteria and archaea was calculated as follows:

    odds ratio of a COG catalogue (or COG category) = (A/B) / (C/D)

where:
A = No. of genes assigned to a specific COG catalogue (or COG category) in metagenome T2 (or T6)
B = No. of genes assigned to all other COG catalogues (or COG categories) in metagenome T2 (or T6)
C = No. of genes assigned to a specific COG catalogue (or COG category) in all sequenced bacteria and archaea
D = No. of genes assigned to all other COG catalogues (or COG categories) in all sequenced bacteria and archaea

The values of C and D were obtained from the Integrated Microbial Genomes (IMG) system (http://img.jgi.doe.gov/cgi-bin/w/main.cgi; Markowitz et al., 2012). The P-value of each odds ratio was calculated using a one-tailed Fisher's exact test within the R statistical computing environment (version 2.9.2) to identify significant deviations from equilibrium (odds ratio = 1), according to Hemme et al. (2010).
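The enrichment calculation described above can be sketched in a few lines. The Python example below is a minimal illustration and not the R script used in the study: the function names `odds_ratio` and `fisher_one_tailed` and the counts A to D are hypothetical, and the one-tailed P-value is computed directly from the hypergeometric distribution for the enrichment direction (odds ratio > 1).

```python
from math import comb, log


def odds_ratio(a, b, c, d):
    """Return (A/B) / (C/D) for the 2x2 table [[A, B], [C, D]]."""
    return (a / b) / (c / d)


def fisher_one_tailed(a, b, c, d):
    """One-tailed (enrichment) Fisher's exact test P-value.

    Probability, under the hypergeometric null with fixed margins, of
    observing at least `a` genes of the focal COG in the metagenome.
    """
    row1 = a + b              # all assigned genes in the metagenome
    col1 = a + c              # all genes of the focal COG in both sets
    total = a + b + c + d
    upper = min(row1, col1)   # largest possible value of the top-left cell
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, row1)


# Hypothetical counts, for illustration only.
A, B, C, D = 12, 88, 30, 870
ratio = odds_ratio(A, B, C, D)
p_value = fisher_one_tailed(A, B, C, D)
print(f"odds ratio = {ratio:.3f}, ln(odds ratio) = {log(ratio):.3f}, P = {p_value:.4g}")
```

With genome-scale values of C and D the exact sums involve very large integers, so in practice a statistics library (for example `fisher.test` in R, as used by the authors) is the more convenient choice.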
The odds ratios for COG categories were transformed to ln (odds ratio) and plotted to visualize positive or negative trends (Fig. S6). Detailed information on selected COGs is provided in Table S7.

Functional abundance profile analysis of KEGG enzymes

Based on the KEGG BLAST results, the putative protein-coding genes with KEGG hits were assigned to enzymes and KEGG pathways (if available), and the metabolic pathways were constructed for the T2 and T6 metagenomes. To better characterize the metabolic capabilities of the two metagenomes, the odds ratio of each enzyme was calculated as follows:

    odds ratio of an enzyme = (A/B) / (C/D)

where:
A = No. of genes corresponding to a specific enzyme of a KEGG pathway in metagenome T2 (or T6)
B = No. of genes of all other enzymes of KEGG pathways in metagenome T2 (or T6)
C = No. of genes corresponding to a specific enzyme of a KEGG pathway in all sequenced bacteria and archaea
D = No. of genes of all other enzymes of KEGG pathways in all sequenced bacteria and archaea

The values of C and D were obtained from the IMG system (http://img.jgi.doe.gov/cgi-bin/w/main.cgi) (Markowitz et al., 2012). The P-value of each odds ratio was calculated using a one-tailed Fisher's exact test within the R statistical computing environment (version 2.9.2) to identify significant deviations from equilibrium (odds ratio = 1).

Supplementary References

Bates, S., Berg-Lyons, D., Caporaso, J., Walters, W., Knight, R., and Fierer, N. (2010) Examining the global distribution of dominant archaeal populations in soil. ISME J 5: 908–917.
Cole, J.R., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R.J., et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141–D145.
Duarte, G.F., Rosado, A.S., Seldin, L., Keijzer-Wolters, A.C., and van Elsas, J.D. (1998) Extraction of ribosomal RNA and genomic DNA from soil for studying the diversity of the indigenous bacterial community. J Microbiol Methods 32: 21–29.
Edgar, R.C., Haas, B.J., Clemente, J.C., Quince, C., and Knight, R. (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194–2200.
Fierer, N., Hamady, M., Lauber, C.L., and Knight, R. (2008) The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci USA 105: 17994–17999.
Hemme, C.L., Deng, Y., Gentry, T.J., Fields, M.W., Wu, L., Barua, S., et al. (2010) Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4: 660–672.
Huse, S.M., Welch, D.M., Morrison, H.G., and Sogin, M.L. (2010) Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12: 1889–1898.
Huson, D.H., Mitra, S., Ruscheweyh, H.J., Weber, N., and Schuster, S.C. (2011) Integrative analysis of environmental sequences using MEGAN4. Genome Res 21: 1552–1560.
Kunin, V., Engelbrektson, A., Ochman, H., and Hugenholtz, P. (2010) Wrinkles in the rare biosphere: pyrosequencing errors lead to artificial inflation of diversity estimates. Environ Microbiol 12: 118–123.
Li, R., Yu, C., Li, Y., Lam, T.W., Yiu, S.M., Kristiansen, K., and Wang, J. (2009) SOAP2: An improved ultrafast tool for short read alignment. Bioinformatics 25: 1966–1967.
Markowitz, V.M., Chen, I.M., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., et al. (2012) IMG: the Integrated Microbial Genomes database and comparative analysis system. Nucleic Acids Res 40: D115–D122.
McKeague, J.A., and Day, J.H. (1966) Dithionite- and oxalate-extractable Fe and Al as aids in differentiating various classes of soils. Can J Soil Sci 46: 13–22.
Quince, C., Lanzen, A., Curtis, T.P., Davenport, R.J., Hall, N., Head, I.M., et al. (2009) Noise and the accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6: 639–641.
Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., et al. (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537–7541.
Tan, G.L., Shu, W.S., Hallberg, K.B., Li, F., Lan, C.Y., Zhou, W.H., et al. (2008) Culturable and molecular phylogenetic diversity of microorganisms in an open-dumped, extremely acidic Pb/Zn mine tailings. Extremophiles 12: 657–664.
Wang, Q., Garrity, G., Tiedje, J., and Cole, J. (2007) Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261–5267.
Zerbino, D.R., and Birney, E. (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18: 821–829.
Zhu, W., Lomsadze, A., and Borodovsky, M. (2010) Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 38: e132.

Supplementary Figures

Fig. S1. A map showing the tailings surface with the six sampling sites (T1-T6, see main text for more details).

Fig. S2. The relative content of inorganic sulfur compounds (A) and ferric iron (B) in the six tailings samples. The results for inorganic sulfur compounds are presented on a sulfur basis as revealed by XPS. The results for ferric iron are presented as the ratio of the ferric iron concentration to the total iron concentration in each sample. The bars show the standard errors of the relative abundance across the three subsamples of each tailings sample.
Different lower-case letters above the bars indicate that the values are significantly different (P < 0.05, LSD).

Fig. S3. Rarefaction curves showing the microbial biodiversity of the six tailings samples. OTUs (operational taxonomic units) were defined at the 97% sequence identity level. For each tailings subsample, 5000 quality sequences were randomly selected to calculate the number of OTUs (iterations, 10). The values of the three subsamples were then averaged to represent the corresponding tailings sample.

Fig. S4. Multivariate regression tree (MRT) showing the primary physicochemical characteristics affecting the microbial community composition of the six tailings samples. The physicochemical characteristics used for the analysis were moisture content, pH, EC, TOC, TN, T-Fe and T-S. Mois, moisture content; EC, electrical conductivity; TOC, total organic carbon; TN, total nitrogen; T-Fe, total iron; T-S, total sulfur.

Fig. S5. The microbial community composition at the phylum level as revealed by MEGAN (A) and 16S rRNA gene analysis (B). The 16S rRNA gene fragments in the metagenomes were identified using BLASTn against the RDP database (e-value threshold = 10⁻⁵). The identified 16S rRNA anchors ≥ 100 bp were taxonomically assigned using the RDP Classifier with a minimum confidence of 80%.

Fig. S6. The odds ratios of specific COG categories in metagenomes T2 (A) and T6 (B) compared to those in all sequenced bacteria and archaea. The odds ratios for COG categories were transformed to ln (odds ratio) and plotted to visualize positive and negative trends. Asterisks indicate significant deviation from the null hypothesis (odds ratio = 1) at the 95% confidence level by a one-tailed Fisher's exact test.
Fig. S7. The distribution of coverage for the contigs of the T2 (A) and T6 (B) metagenomes. The quality sequencing reads were first mapped to the contigs, and the average depth of each contig was then calculated.

Supplementary Tables

Table S1. Concentrations (mg kg⁻¹) of heavy metals in the six tailings samples.

Tailings  Zn               Pb             Mn           Cr        Cd           Hg          As          Cu
T1        52906 ± 4480b    12830 ± 313b   1376 ± 32b   75 ± 2a   13 ± 0.3b    10 ± 1b     1197 ± 30a  389 ± 25a
T2        122418 ± 29030a  6811 ± 898c    1896 ± 212a  38 ± 2bc  28 ± 5a      16 ± 4ab    1182 ± 80a  109 ± 19b
T3        11429 ± 2904c    16323 ± 2102a  143 ± 43d    36 ± 1c   3.4 ± 1.1c   19 ± 5a     459 ± 127b  20 ± 4c
T4        9235 ± 2309c     10936 ± 811b   112 ± 13d    35 ± 1c   2.4 ± 0.5c   10 ± 1ab    238 ± 4bc   36 ± 8c
T5        13035 ± 706c     5858 ± 582c    181 ± 5d     22 ± 1d   3.0 ± 0.2c   7.4 ± 0.4b  210 ± 6c    ND
T6        9461 ± 1293c     6813 ± 652c    586 ± 70c    40 ± 1b   7.9 ± 2.0bc  11 ± 1ab    1116 ± 85a  106 ± 3b

Values are mean ± SE. ND, not detected. In each column, values followed by different lower-case letters are significantly different (P < 0.05, LSD).

Table S2. No. of quality sequences in the six tailings samples (a).

Subsample  Barcode sequence  No. of quality sequences
T1-1       AACGAACG          11164
T1-2       AACGAAGC          8139
T1-3       AACGATCC          7634
T2-1       AAGCGCAA          6974
T2-2       AAGCGCTT          6517
T2-3       AAGCGGAT          5486
T3-1       AAGCATCC          8322
T3-2       AAGCATGG          9788
T3-3       AACGGCTT          6942
T4-1       AACGCGAA          6479
T4-2       AACGCGTT          6214
T4-3       AACGGCAA          6634
T5-1       AACGTAGG          8262
T5-2       AACGTTCG          8876
T5-3       AACGTTGC          8817
T6-1       AAGGATGC          6495
T6-2       AAGGCCAA          6379
T6-3       AAGGCCTT          7033

a. The quality sequences met the following criteria: minimal length, 280 bp; maximal homopolymer, 8; no ambiguous bases; and no chimeric sequences.

Table S3. Microbial biodiversity of the six tailings samples revealed by pyrosequencing.
Tailings samples  T1    T2    T3    T4    T5    T6
OTUs              227   238   481   435   499   101
Chao1             547   433   610   775   805   195
Simpson           0.76  0.78  0.97  0.77  0.80  0.84
Shannon           3.3   4.0   6.5   4.5   4.6   3.4

OTUs were defined at the 97% sequence identity level. For each tailings sample, 5000 sequences were randomly sampled from each of the three tailings subsamples (iterations, 10), and the values of the three subsamples were then averaged to represent the corresponding tailings sample.

Table S4. Relative abundance (%) of the sequences affiliated with the dominant genera in the six tailings samples.

Genus              T1    T2    T3    T4    T5    T6
Acidithiobacillus  0.30  0.02  1.0   0.47  0.25  19
Acinetobacter      0.22  0.12  7.1   4.8   3.0   0.01
Amycolatopsis      0.04  0.07  1.8   0.25  0.51  0.01
Brucella           0.22  0.12  6.5   1.5   1.8   0.01
Comamonas          5.1   0.08  0.87  0.34  0.44  0.01
Corynebacterium    0.03  0.00  5.4   0.23  1.6   0.00
Enhydrobacter      0.10  0.04  1.9   0.37  0.62  0.01
Ferroplasma        0.57  0.04  2.9   45    57    28
Gemmatimonas       0.03  2.0   0.03  0.18  0.01  0.00
Hydrogenophaga     32    0.00  0.08  0.03  0.08  0.00
Legionella         0.25  2.2   0.12  0.74  0.11  0.00
Leptospirillum     0.20  0.05  0.64  0.54  0.16  14
Methylobacterium   0.11  0.17  3.9   1.1   1.1   0.00
Peredibacter       0.17  1.0   0.07  0.54  0.01  0.01
Pseudomonas        0.15  0.00  5.1   0.95  1.7   0.00
Rubrobacter        0.27  1.5   0.07  0.48  0.05  0.00
Sphingomonas       0.22  1.9   1.6   1.0   0.65  0.00
Staphylococcus     0.08  0.00  11    0.34  2.8   0.00
Streptococcus      0.04  0.00  1.1   0.21  0.45  0.00
Sulfobacillus      0.12  0.03  0.27  0.07  0.09  13
Thermogymnomonas   0.01  0.00  0.06  0.01  0.02  4.3
Thiobacillus       12    3.3   0.08  1.5   0.07  0.00
Thiovirga          26    0.02  0.16  0.07  0.06  0.00

A genus was defined as dominant if its related sequences had a relative abundance > 1% in at least one tailings sample. The relative abundance of the sequences related to each genus was calculated as the average of the three subsamples of each tailings sample.

Table S5. Summary of the assembly, gene prediction and annotation of the T2 and T6 metagenomes.
Item                                      T2 value  T2 percentage  T6 value  T6 percentage
Contigs
  No. of total                            51071     100%           37765     100%
  Mean length (bp)                        549       –              795       –
  Mean GC%                                52        –              42        –
  N50 (bp)                                522       –              955       –
  Longest (bp)                            60233     –              40620     –
  No. with NCBI-nr hits (a)               38551     76%            30463     81%
Putative protein-coding genes (b)
  No. of total                            51981     100%           49538     100%
  Mean length (bp)                        386       –              451       –
  Mean GC%                                52.5      –              43.2      –
  No. with NCBI-nr hits (c)               44853     86%            43587     88%
  No. with KEGG hits                      42475     95%            36824     85%
  No. connected to KEGG Orthology (KO)    23336     52%            20966     48%
  No. connected to KEGG pathways          14166     32%            13034     30%
  No. with COG hits                       32522     73%            32128     74%
  No. with COGs                           31013     69%            29789     68%

a. All BLAST comparisons in this study used the same criterion: e-value threshold = 10⁻⁵.
b. The genes were predicted from the contigs with NCBI-nr hits, using MetaGene with default parameters.
c. Only the putative protein-coding genes with NCBI-nr hits were further compared against the KEGG and COG databases, so the percentages associated with KEGG and COG were calculated relative to the number of genes with NCBI-nr hits.

Table S6. Binning information of contigs based on the MEGAN results for the T2 and T6 metagenomes.

Metagenome  Domain    Phylum          Class                Order                Family                Genus              Contigs (No.)  Contigs (total %)  Base pairs (bp)  Base pairs (total %)
T2          Bacteria  Proteobacteria  Betaproteobacteria   Hydrogenophilales    Hydrogenophilaceae    Thiobacillus       1435           3.7                827660           3.7
T2          Bacteria  Proteobacteria  Betaproteobacteria   Burkholderiales      Burkholderiaceae      Limnobacter        991            2.6                410968           1.9
T2          Bacteria  Actinobacteria  Actinobacteria       Rubrobacteridae      Rubrobacteridae       Rubrobacter        985            2.6                478579           2.2
T6          Archaea   Euryarchaeota   Thermoplasmata       Thermoplasmatales    Ferroplasmaceae       Ferroplasma        9082           30                 7579534          30
T6          Bacteria  Nitrospirae     Nitrospira           Nitrospirales        Nitrospiraceae        Leptospirillum     5740           19                 4413841          17
T6          Bacteria  Proteobacteria  Gammaproteobacteria  Acidithiobacillales  Acidithiobacillaceae  Acidithiobacillus  4318           14                 2947378          12

Information on the three largest bins in each of the T2 and T6 metagenomes is shown.
This information was obtained from the MEGAN analysis based on the BLAST results of the contigs against the NCBI-nr database (e-value threshold = 10⁻⁵).

Table S7. Summary of the specific COGs associated with heavy metals stress and low pH stress in T2 and T6.

Stress        COG ID   COG category  Gene  COG information
Heavy metals  COG0598  [P]*          corA  Mg²⁺ and Co²⁺ transporters
Heavy metals  COG2217  [P]           cadA  Cation transport ATPase
Heavy metals  COG0672  [P]           FTR1  High-affinity Fe²⁺/Pb²⁺ permease
Heavy metals  COG0798  [P]           ACR3  Arsenite efflux pump ACR3 and related permeases
Heavy metals  COG3696  [P]           czcA  Putative silver efflux pump
Heavy metals  COG0841  [V]           czcA  Cation/multidrug efflux pump
Heavy metals  COG0845  [M]           czcB  Membrane-fusion protein
Heavy metals  COG1538  [MU]          czcC  Outer membrane protein
Heavy metals  COG1230  [P]           czcD  Co/Zn/Cd efflux system component
Heavy metals  COG0861  [P]           terC  Membrane protein TerC, possibly involved in tellurium resistance
Heavy metals  COG1275  [P]           tehA  Tellurite resistance protein and related permeases
Heavy metals  COG2059  [P]           chrA  Chromate transport protein ChrA
Heavy metals  COG0474  [P]           mgtA  Cation transport ATPase
Heavy metals  COG2239  [P]           mgtE  Mg/Co/Ni transporter MgtE (contains CBS domain)
Low pH        COG2216  [P]           KdpB  High-affinity K⁺ transport system, ATPase chain B
Low pH        COG2060  [P]           KdpA  K⁺-transporting ATPase, A chain
Low pH        COG2156  [P]           KdpC  K⁺-transporting ATPase, C chain
Low pH        COG1657  [I]           -     Squalene cyclase

COG hits (T2, T6): 21 7 132 40 9 4 11 0 194 50 141 112 107 45 46 32 27 9 41 0 0 20 6 0 25 45 15 0 2 3 2 0 32 29 13 11

*Information on COG categories: [P] Inorganic ion transport and metabolism; [V] Defense mechanisms; [M] Cell wall/membrane/envelope biogenesis; [U] Intracellular trafficking, secretion, and vesicular transport; [I] Lipid transport and metabolism.