emi12114-sup-0001-si

Supporting Information for
Shifts in microbial community composition and function in the acidification process of a lead/zinc mine tailings
Lin-xing Chen†, Jin-tian Li†, Ya-ting Chen, Li-nan Huang, Zheng-shuang Hua, Min Hu, Wen-sheng Shu
State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, PR China
†These authors contributed equally to this work.
Supplementary Methods
DNA extraction, PCR and 454 pyrosequencing
Genomic DNA was extracted from each tailings subsample with a modified indirect DNA extraction protocol as described previously (Tan et al., 2008). Briefly, cells were recovered from about 20 g of tailings by centrifugation at 900×g and 4 °C for 10 min, using 20 mL of sodium pyrophosphate (pH 3.0 or pH 7.0) as the dispersing reagent (Duarte et al., 1998), and the supernatant was collected. This recovery step was repeated twice. The collected supernatant was centrifuged at 10,000×g and 4 °C for 15 min to pellet the cells, and the supernatant was discarded. The cell pellets were treated with 20 mL of 0.3 M ammonium oxalate (pH 3.0 or pH 7.0) for 20 min to dissolve most of the iron precipitate (McKeague and Day, 1966) and then centrifuged at 10,000×g and 4 °C to pellet the cells; the supernatant was discarded, and this step was repeated until the supernatant was colorless. DNA was extracted from the cell pellets with the FastDNA Kit for soil (Qbiogene Inc., Carlsbad, CA) following the manufacturer’s instructions.

The universal primer set 515F/806R (Bates et al., 2010) was used to amplify bacterial and archaeal 16S rRNA genes simultaneously, with an 8-bp subsample-specific barcode on primer 806R. The primer sequences were as follows (segments separated by hyphens): (i) forward primer CGTATCGCCTCCCTCGCGCCATCAG-CA-GTGCCAGCMGCCGCGGTAA, consisting of the Link Primer Sequence, the two-base protecting sequence ‘CA’, and primer 515F; (ii) reverse primer CTATGCGCCTTGCCAGCCCGCTCAG-AACGAACG-TC-GGACTACVSGGGTATCTAAT, consisting of the Link Primer Sequence, the 8-bp subsample-specific barcode (AACGAACG in this example; see Table S2 for all barcodes), the two-base protecting sequence ‘TC’, and primer 806R. PCR reactions (30 µL) contained 0.75 units of Ex Taq DNA polymerase (TaKaRa, Dalian, China), 1× Ex Taq loading buffer (TaKaRa, Dalian, China), 0.2 mM dNTP mix (TaKaRa, Dalian, China), 0.2 µM of each primer and about 100 ng of template DNA. The PCR program was as follows: initial denaturation at 95 °C for 3 min; 35 cycles of denaturation at 94 °C for 30 s, primer annealing at 50 °C for 1 min and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. For each tailings subsample, PCR was performed in triplicate and the products were pooled to mitigate amplification bias. The composite sample for pyrosequencing was prepared by combining equimolar amounts of the amplification products from the individual subsamples as described by Fierer et al. (2008), followed by gel purification with the QIAquick Gel Extraction Kit (Qiagen, Chatsworth, CA). The purified composite DNA sample was sent to Macrogen Inc. (Seoul, Korea) for pyrosequencing on a 454 GS FLX Titanium pyrosequencer (Roche 454 Life Sciences, Branford, CT, USA).
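Each fusion primer described above is a concatenation of fixed segments, plus a per-subsample barcode on the reverse primer. The minimal Python sketch below assumes only the segment sequences and barcodes given in this document (the constant and function names are hypothetical). It assembles the barcoded reverse primer for every subsample so that the result can be checked against the printed sequences:

```python
"""Assemble the barcoded 454 fusion primers from their segments (illustrative sketch)."""

# Segments copied from the primer sequences in the text (IUPAC codes M, V, S kept as-is).
LINK_FORWARD = "CGTATCGCCTCCCTCGCGCCATCAG"  # Link Primer Sequence, forward primer
LINK_REVERSE = "CTATGCGCCTTGCCAGCCCGCTCAG"  # Link Primer Sequence, reverse primer
PROTECT_FORWARD = "CA"                      # two-base protecting sequence, forward
PROTECT_REVERSE = "TC"                      # two-base protecting sequence, reverse
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACVSGGGTATCTAAT"

# Subsample barcodes as listed in Table S2.
BARCODES = {
    "T1-1": "AACGAACG", "T1-2": "AACGAAGC", "T1-3": "AACGATCC",
    "T2-1": "AAGCGCAA", "T2-2": "AAGCGCTT", "T2-3": "AAGCGGAT",
    "T3-1": "AAGCATCC", "T3-2": "AAGCATGG", "T3-3": "AACGGCTT",
    "T4-1": "AACGCGAA", "T4-2": "AACGCGTT", "T4-3": "AACGGCAA",
    "T5-1": "AACGTAGG", "T5-2": "AACGTTCG", "T5-3": "AACGTTGC",
    "T6-1": "AAGGATGC", "T6-2": "AAGGCCAA", "T6-3": "AAGGCCTT",
}


def forward_primer() -> str:
    """Forward fusion primer: link sequence + protecting bases + 515F."""
    return LINK_FORWARD + PROTECT_FORWARD + PRIMER_515F


def reverse_primer(barcode: str) -> str:
    """Reverse fusion primer: link sequence + 8-bp barcode + protecting bases + 806R."""
    if len(barcode) != 8 or set(barcode) - set("ACGT"):
        raise ValueError(f"expected an 8-bp A/C/G/T barcode, got {barcode!r}")
    return LINK_REVERSE + barcode + PROTECT_REVERSE + PRIMER_806R


if __name__ == "__main__":
    print("forward", forward_primer())
    for subsample, barcode in BARCODES.items():
        print(subsample, reverse_primer(barcode))
```

Keeping the segments separate makes it easy to confirm that every barcode is unique and has the expected length before ordering primers.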
Processing of 454 pyrosequencing data
Pyrosequencing data were analyzed with version 1.26 of the mothur software package (Schloss et al., 2009) as described by Schloss et al. (2011). Because sequencing errors in 454 pyrosequencing inflate biodiversity estimates (Kunin et al., 2010), sequences were denoised with the ‘shhh.flows’ command (a translation of the PyroNoise algorithm; Quince et al., 2009) and the ‘pre.cluster’ command (Huse et al., 2010). Chimeric sequences were then identified and removed with UCHIME (Edgar et al., 2011). We also removed sequences that (i) were shorter than 280 bp, (ii) contained homopolymer runs of eight or more bases, or (iii) contained one or more ambiguous bases. OTUs were defined at 97% sequence identity using the ‘cluster’ command with the average-neighbor clustering algorithm (Huse et al., 2010). A representative sequence was then selected from each OTU and assigned taxonomically with the Ribosomal Database Project (RDP) Classifier (Wang et al., 2007) at a minimum confidence of 80%. The alpha diversity of the 18 tailings subsamples was estimated with the abundance-based Chao1, Shannon and Simpson indices. For each of the 18 subsamples, 5,000 quality sequences were randomly subsampled (10 iterations), and the value for each tailings sample was calculated as the average of its three subsamples.
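The length, homopolymer and ambiguous-base criteria above amount to a simple per-sequence filter. The sketch below is plain Python, not mothur's implementation, and the function names are hypothetical. The homopolymer threshold is a parameter because the text (runs of eight or more bases removed) and the Table S2 footnote (maximum homopolymer of 8) can be read as slightly different cutoffs:

```python
def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    best = run = 0
    previous = ""
    for base in seq:
        run = run + 1 if base == previous else 1
        previous = base
        best = max(best, run)
    return best


def passes_quality_filters(seq: str, min_length: int = 280, max_homopolymer: int = 7) -> bool:
    """Keep a read only if it is long enough, has no long homopolymer and no ambiguous base.

    max_homopolymer=7 follows the text (runs of eight or more removed); set it to 8
    to follow the Table S2 footnote instead (assumed interpretation).
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return False
    if longest_homopolymer(seq) > max_homopolymer:
        return False
    # Any base other than A, C, G or T (e.g. N, R, Y) counts as ambiguous.
    return all(base in "ACGT" for base in seq)


if __name__ == "__main__":
    reads = ["ACGT" * 70, "ACGT" * 69, "ACGT" * 70 + "A" * 8, "ACGT" * 70 + "N"]
    print([passes_quality_filters(r) for r in reads])  # [True, False, False, False]
```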
Metagenomics sequencing and analysis
Library construction and random shotgun sequencing. For the T2 and T6 tailings samples, genomic DNA extracted from the three subsamples of each sample was pooled and purified by gel electrophoresis. The purified DNA samples were sent to BGI Inc. (Shenzhen, China) for shotgun library construction and Illumina sequencing. For each sample, a whole-genome shotgun sequencing library with an insert size of 180 bp was constructed and paired-end sequenced (2 × 90 bp) on the Illumina HiSeq 2000 platform.
Artifact filtering and quality control. The raw Illumina sequence data (2 GB per metagenome) were processed through several filtering and quality-control steps to obtain clean sequence data: (i) reads with adapter contamination were removed; (ii) duplicate reads were removed; (iii) among the non-duplicate reads, those containing more than 18 Ns were removed; and (iv) the retained reads were trimmed at the 3′ end to remove bases with a quality score < 20, and reads in which more than 20% of the bases had a quality score < 20 were also removed. The resulting clean reads were used for further analysis.
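Steps (ii) to (iv) can be sketched as follows. This is an illustrative Python version, not the pipeline actually run by the sequencing provider. It makes four assumptions: Phred+33 quality encoding (the encoding of the raw data is not stated, so the offset is a parameter); reads handled singly rather than as pairs; the 20% low-quality rule applied after 3′ trimming; and no adapter screening, because the adapter sequences are not given:

```python
from typing import Iterable, List, Optional, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Read:
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq: str, qual: str, max_n: int = 18, min_q: int = 20,
               max_low_fraction: float = 0.20, offset: int = 33) -> Optional[Read]:
    """Apply the N filter, 3' trimming and low-quality-fraction filter to one read."""
    if seq.upper().count("N") > max_n:                  # step (iii): more than 18 Ns
        return None
    seq, qual = trim_3prime(seq, qual, min_q, offset)   # step (iv): 3' trimming
    if not seq:
        return None
    low = sum(1 for c in qual if ord(c) - offset < min_q)
    if low / len(seq) > max_low_fraction:               # step (iv): >20% low-quality bases
        return None
    return seq, qual


def remove_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Step (ii): keep the first occurrence of each identical sequence."""
    seen, unique = set(), []
    for seq, qual in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((seq, qual))
    return unique


def clean_reads(reads: Iterable[Read], **kwargs) -> List[Read]:
    """Deduplicate, then filter and trim each remaining read."""
    cleaned = (clean_read(s, q, **kwargs) for s, q in remove_duplicates(reads))
    return [r for r in cleaned if r is not None]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(clean_read("ACGTACGTAC", "IIIIIIII##"))  # ('ACGTACGT', 'IIIIIIII')
```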
Whole metagenome assembly. The clean reads were assembled de novo with Velvet version 1.1.04 (Zerbino and Birney, 2008) using the options ins_length = 180 and exp_cov = auto. Each metagenome was assembled with k-mer values from 21 to 55, and the best assembly was selected based on the N50 contig length and the length of the longest contig. The best k-mer value was 45 for the T2 metagenome (N50: 522 bp; longest contig: 60,233 bp) and 51 for the T6 metagenome (N50: 955 bp; longest contig: 40,620 bp).
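The k-mer sweep amounts to computing N50 and the longest contig for each assembly and keeping the best one. The text does not say how the two criteria were combined, so the sketch below assumes N50 is ranked first and the longest contig breaks ties. The contig lengths in the example are made up:

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def pick_best_k(assemblies: Dict[int, List[int]]) -> int:
    """Return the k whose assembly has the largest (N50, longest contig) pair."""
    return max(assemblies, key=lambda k: (n50(assemblies[k]), max(assemblies[k], default=0)))


if __name__ == "__main__":
    # Hypothetical contig lengths for three k values (not the study's data).
    sweep = {21: [300, 200, 100], 31: [400, 100], 41: [350, 350, 50]}
    for k, lengths in sweep.items():
        print(k, n50(lengths), max(lengths))
    print("best k:", pick_best_k(sweep))  # best k: 31
```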
Microbial community composition analysis. Two strategies were used to characterize the microbial composition of the T2 and T6 metagenomes: (i) 16S rRNA gene fragments were identified among all contigs by BLASTn against the RDP database (release 10; Cole et al., 2009; e-value threshold = 10⁻⁵), and the identified 16S rRNA gene fragments (anchors) ≥ 100 bp were assigned taxonomically with the RDP Classifier at a minimum confidence of 80%; and (ii) contigs ≥ 300 bp were compared against the National Center for Biotechnology Information (NCBI) non-redundant (nr) database (e-value threshold = 10⁻⁵) and then assigned to taxonomic groups with the lowest common ancestor (LCA) algorithm in MEGAN (Huson et al., 2011) using default parameters (minimum score, 35; minimum support, 1; top percent, 10%).
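The MEGAN parameters above control a lowest common ancestor (LCA) assignment. The Python sketch below shows the idea in simplified form; it is not MEGAN's code, and it assumes each hit is given as a bit score plus a taxonomic path from domain downwards. Hits below the minimum score are discarded, only hits within the top 10% of the best score are kept, and the contig is assigned to the deepest taxon shared by all kept hits. The minimum-support filter (1) acts on taxa after all contigs are assigned and is not modelled here:

```python
from typing import Optional, Sequence, Tuple

Hit = Tuple[float, Sequence[str]]  # (bit score, taxonomic path from domain downwards)


def lca_assign(hits: Sequence[Hit], min_score: float = 35.0,
               top_percent: float = 10.0) -> Optional[Tuple[str, ...]]:
    """Assign a contig to the lowest common ancestor of its best-scoring hits."""
    kept = [(score, path) for score, path in hits if score >= min_score]
    if not kept:
        return None  # unassigned
    best = max(score for score, _ in kept)
    cutoff = best * (1 - top_percent / 100)
    paths = [path for score, path in kept if score >= cutoff]
    shared = []
    # Walk down the ranks while every retained hit agrees on the taxon.
    for taxa_at_rank in zip(*paths):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break
    return tuple(shared)


if __name__ == "__main__":
    lepto = ("Bacteria", "Nitrospirae", "Nitrospira", "Nitrospirales",
             "Nitrospiraceae", "Leptospirillum")
    acidi = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Acidithiobacillales", "Acidithiobacillaceae", "Acidithiobacillus")
    print(lca_assign([(200.0, lepto), (150.0, acidi)]))  # second hit outside top 10%
    print(lca_assign([(200.0, lepto), (190.0, acidi)]))  # ('Bacteria',)
```

A wider top-percent window keeps more hits and therefore pushes assignments toward higher (less specific) ranks.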
Gene prediction and functional annotation. Contigs with reliable NCBI-nr hits, as determined by MEGAN, were extracted for further analysis. Genes were predicted on these contigs with MetaGeneMark using default parameters (Zhu et al., 2010), yielding 51,981 and 49,538 putative protein-coding genes for the T2 and T6 metagenomes, respectively (Table S5). These putative protein-coding genes were compared against the NCBI-nr database, and those with NCBI-nr hits were further compared against the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups of proteins (COG) databases using BLASTx (e-value threshold = 10⁻⁵).
Genome binning. Based on the BLAST results of the contigs and the MEGAN analysis (minimum score, 35; minimum support, 1; top percent, 10%), contigs assigned to the dominant genera in the T2 and T6 metagenomes were binned. Information on the largest bins is shown in Table S6.
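Summarizing a bin reduces to grouping contigs by their genus-level assignment and totalling contigs and base pairs. This minimal sketch assumes that percentages are taken over the contigs passed in (for example, the contigs with NCBI-nr hits). The function name and input format are hypothetical:

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple


def summarize_bins(assignments: Iterable[Tuple[str, Optional[str], int]]) -> List[tuple]:
    """Group contigs by assigned genus and report counts, bp and percentages.

    assignments: (contig_id, genus or None if unassigned at genus level, length_bp).
    Percentages are relative to all contigs passed in (assumed denominator).
    """
    contigs = list(assignments)
    total_contigs = len(contigs)
    total_bp = sum(length for _, _, length in contigs)
    n_contigs = defaultdict(int)
    n_bp = defaultdict(int)
    for _, genus, length in contigs:
        if genus is not None:
            n_contigs[genus] += 1
            n_bp[genus] += length
    rows = [
        (genus, n_contigs[genus], 100 * n_contigs[genus] / total_contigs,
         n_bp[genus], 100 * n_bp[genus] / total_bp)
        for genus in n_contigs
    ]
    # Largest bins first, ranked by number of contigs.
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    toy = [("c1", "Ferroplasma", 1000), ("c2", "Ferroplasma", 500),
           ("c3", "Leptospirillum", 500), ("c4", None, 2000)]  # hypothetical contigs
    for row in summarize_bins(toy):
        print(row)
```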
Contig coverage estimation. To estimate contig coverage, the clean reads used for assembly were aligned to the contigs with SOAPAligner (Li et al., 2009) in three steps: (i) an index was built from all assembled contigs (2bwt-builder); (ii) the clean reads were aligned against the contig index (soap); and (iii) the SOAPAligner output was parsed with SOAP.COVERAGE (Li et al., 2009). The coverage estimates of the contigs are shown in Fig. S7.
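The average depth plotted in Fig. S7 is the number of aligned read bases on a contig divided by the contig length. The sketch below computes it from simplified alignment records (contig, 0-based start, aligned length). It does not parse real SOAPAligner or SOAP.COVERAGE output, whose formats are not reproduced here:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_depth(contig_lengths: Dict[str, int],
               alignments: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Average per-base depth of each contig.

    alignments: (contig_id, 0-based start, aligned length) for each mapped read.
    Summing the aligned bases and dividing by the contig length equals the mean
    of the per-base depths along the contig.
    """
    aligned_bases = defaultdict(int)
    for contig, start, length in alignments:
        # Clip alignments that run past either end of the contig.
        begin = max(start, 0)
        end = min(start + length, contig_lengths[contig])
        aligned_bases[contig] += max(0, end - begin)
    return {contig: aligned_bases[contig] / size for contig, size in contig_lengths.items()}


if __name__ == "__main__":
    lengths = {"c1": 100, "c2": 50}  # hypothetical contigs
    reads = [("c1", 0, 90), ("c1", 50, 90), ("c2", 0, 25)]
    print(mean_depth(lengths, reads))  # {'c1': 1.4, 'c2': 0.5}
```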
Functional abundance profile analysis of COG catalogues and COG categories
Based on the COG BLAST results, predicted genes with reliable COG hits were assigned to COG catalogues and, where available, COG categories. To determine whether a specific COG catalogue or COG category was enriched in our metagenomes relative to all sequenced bacteria and archaea, its odds ratio was calculated as follows:
$$\text{COG catalogue (or COG category) odds ratio} = \frac{A/B}{C/D}$$

where:
A = No. of genes assigned to a specific COG catalogue (or COG category) in metagenome T2 (or T6)
B = No. of genes assigned to all other COG catalogues (or COG categories) in metagenome T2 (or T6)
C = No. of genes assigned to a specific COG catalogue (or COG category) in all sequenced bacteria and archaea
D = No. of genes assigned to all other COG catalogues (or COG categories) in all sequenced bacteria and archaea
The values of C and D were obtained from the Integrated Microbial Genomes (IMG) system (http://img.jgi.doe.gov/cgi-bin/w/main.cgi; Markowitz et al., 2012). A P-value was calculated for each odds ratio with a one-tailed Fisher’s exact test in the R statistical computing environment (version 2.9.2) to identify significant deviations from equilibrium (odds ratio = 1), following Hemme et al. (2010). The odds ratios for COG categories were log-transformed (ln odds ratio) and plotted to visualize positive or negative trends (Fig. S6). Detailed information on selected COGs is provided in Table S7.
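The odds ratio and its one-tailed P-value can be reproduced outside R. The sketch below computes the hypergeometric tail directly with log-gamma functions, and it should agree with R's fisher.test for the 2 × 2 table [[A, B], [C, D]] with alternative "greater" or "less". Choosing the tail from the direction of the odds ratio is an assumption, since the text does not state which tail was tested. The counts in the example are hypothetical, not the IMG values:

```python
from math import exp, lgamma, log


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """(A/B) / (C/D) as defined above."""
    return (a / b) / (c / d)


def fisher_one_tailed(a: int, b: int, c: int, d: int, alternative: str = "greater") -> float:
    """One-tailed Fisher's exact test P-value for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, A follows a hypergeometric distribution: a draw of
    (a + b) genes from (a + b + c + d) genes, of which (a + c) belong to the focal COG.
    """
    draws, successes, total = a + b, a + c, a + b + c + d
    lowest = max(0, draws + successes - total)
    highest = min(draws, successes)
    log_denominator = _log_comb(total, draws)

    def pmf(x: int) -> float:
        return exp(_log_comb(successes, x) + _log_comb(total - successes, draws - x)
                   - log_denominator)

    if alternative == "greater":    # enrichment: P(X >= a)
        support = range(a, highest + 1)
    elif alternative == "less":     # depletion: P(X <= a)
        support = range(lowest, a + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return min(1.0, sum(pmf(x) for x in support))


def enrichment_test(a: int, b: int, c: int, d: int):
    """Return (ln odds ratio, one-tailed P), testing in the direction of the odds ratio."""
    ratio = odds_ratio(a, b, c, d)
    tail = "greater" if ratio >= 1 else "less"
    ln_ratio = log(ratio) if ratio > 0 else float("-inf")
    return ln_ratio, fisher_one_tailed(a, b, c, d, tail)


if __name__ == "__main__":
    # Hypothetical counts: A, B from a metagenome; C, D from a reference genome set.
    print(enrichment_test(120, 30000, 9000, 4000000))
```

Working in log space avoids overflow when C and D are in the millions, as they are for genome-wide reference counts.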
Functional abundance profile analysis of KEGG enzymes
Based on the KEGG BLAST results, putative protein-coding genes with KEGG hits were assigned to enzymes and, where available, KEGG pathways, and metabolic pathways were reconstructed for the T2 and T6 metagenomes. To better characterize the metabolic capabilities of the two metagenomes, the odds ratio of each enzyme was calculated as follows:
$$\text{Enzyme odds ratio} = \frac{A/B}{C/D}$$

where:
A = No. of genes corresponding to a specific enzyme of a KEGG pathway in metagenome T2 (or T6)
B = No. of genes of all other enzymes of KEGG pathways in metagenome T2 (or T6)
C = No. of genes corresponding to a specific enzyme of a KEGG pathway in all sequenced bacteria and archaea
D = No. of genes of all other enzymes of KEGG pathways in all sequenced bacteria and archaea
The values of C and D were obtained from the IMG system (http://img.jgi.doe.gov/cgi-bin/w/main.cgi; Markowitz et al., 2012). A P-value was calculated for each odds ratio with a one-tailed Fisher’s exact test in the R statistical computing environment (version 2.9.2) to identify significant deviations from equilibrium (odds ratio = 1).
Supplementary References
Bates, S., Berg-Lyons, D., Caporaso, J., Walters, W., Knight, R., and Fierer, N. (2010) Examining the global distribution of dominant archaeal populations in soil. ISME J 5: 908–917.
Cole, J.R., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R.J., et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141–D145.
Duarte, G.F., Rosado, A.S., Seldin, L., Keijzer-Wolters, A.C., and van Elsas, J.D. (1998) Extraction of ribosomal RNA and genomic DNA from soil for studying the diversity of the indigenous bacterial community. J Microbiol Methods 32: 21–29.
Edgar, R.C., Haas, B.J., Clemente, J.C., Quince, C., and Knight, R. (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194–2200.
Fierer, N., Hamady, M., Lauber, C.L., and Knight, R. (2008) The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci USA 105: 17994–17999.
Hemme, C.L., Deng, Y., Gentry, T.J., Fields, M.W., Wu, L., Barua, S., et al. (2010) Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4: 660–672.
Huse, S.M., Welch, D.M., Morrison, H.G., and Sogin, M.L. (2010) Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12: 1889–1898.
Huson, D.H., Mitra, S., Ruscheweyh, H.J., Weber, N., and Schuster, S.C. (2011) Integrative analysis of environmental sequences using MEGAN4. Genome Res 21: 1552–1560.
Kunin, V., Engelbrektson, A., Ochman, H., and Hugenholtz, P. (2010) Wrinkles in the rare biosphere: pyrosequencing errors lead to artificial inflation of diversity estimates. Environ Microbiol 12: 118–123.
Li, R., Yu, C., Li, Y., Lam, T.W., Yiu, S.M., Kristiansen, K., and Wang, J. (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25: 1966–1967.
Markowitz, V.M., Chen, I.M., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., et al. (2012) IMG: the Integrated Microbial Genomes database and comparative analysis system. Nucleic Acids Res 40: D115–D122.
McKeague, J.A., and Day, J.H. (1966) Dithionite- and oxalate-extractable Fe and Al as aids in differentiating various classes of soils. Can J Soil Sci 46: 13–22.
Quince, C., Lanzen, A., Curtis, T.P., Davenport, R.J., Hall, N., Head, I.M., et al. (2009) Noise and the accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6: 639–641.
Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., et al. (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537–7541.
Tan, G.L., Shu, W.S., Hallberg, K.B., Li, F., Lan, C.Y., Zhou, W.H., et al. (2008) Culturable and molecular phylogenetic diversity of microorganisms in an open-dumped, extremely acidic Pb/Zn mine tailings. Extremophiles 12: 657–664.
Wang, Q., Garrity, G., Tiedje, J., and Cole, J. (2007) Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261–5267.
Zerbino, D.R., and Birney, E. (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18: 821–829.
Zhu, W., Lomsadze, A., and Borodovsky, M. (2010) Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 38: e132.
Supplementary Figures
Fig. S1. A map of the tailings surface showing the six sampling sites (T1–T6; see main text for more details).
Fig. S2. Relative contents of inorganic sulfur compounds (A) and ferric iron (B) in the six tailings samples. Inorganic sulfur compounds are presented on a sulfur basis, as determined by X-ray photoelectron spectroscopy (XPS). Ferric iron is presented as the ratio of ferric iron concentration to total iron concentration in each sample. Error bars show the standard errors of three subsamples per tailings sample. Different lower-case letters above the bars indicate significantly different values (P < 0.05, LSD).
Fig. S3. Rarefaction curves showing the microbial biodiversity of the six tailings samples. OTUs (operational taxonomic units) were defined at 97% sequence identity. For each tailings subsample, 5000 quality sequences were randomly selected to calculate the number of OTUs (10 iterations), and the average of the three subsamples was used as the value for the corresponding tailings sample.
Fig. S4. Multivariate regression tree (MRT) showing the primary physicochemical characteristics affecting the microbial community composition of the six tailings samples. The physicochemical characteristics used for the analysis included moisture content, pH, EC, TOC, TN, T-Fe and T-S. Mois, moisture content; EC, electrical conductivity; TOC, total organic carbon; TN, total nitrogen; T-Fe, total iron; T-S, total sulfur.
Fig. S5. Microbial community composition at the phylum level as revealed by MEGAN (A) and 16S rRNA gene analysis (B). 16S rRNA gene fragments in the metagenomes were identified by BLASTn against the RDP database (e-value threshold = 10⁻⁵), and the identified 16S rRNA gene anchors ≥ 100 bp were assigned taxonomically with the RDP Classifier at a minimum confidence of 80%.
Fig. S6. Odds ratios of specific COG categories in metagenomes T2 (A) and T6 (B) relative to all sequenced bacteria and archaea. Odds ratios were log-transformed (ln odds ratio) before plotting to show positive and negative trends. Asterisks indicate significant deviation from the null hypothesis (odds ratio = 1) at the 95% confidence level (one-tailed Fisher’s exact test).
Fig. S7. Distribution of contig coverage in the T2 (A) and T6 (B) metagenomes. Quality-filtered sequencing reads were first mapped to the contigs, and the average depth of each contig was then calculated.
Supplementary Tables
Table S1. Concentrations (mg kg⁻¹) of heavy metals in the six tailings samples.

| Tailings | Zn | Pb | Mn | Cr | Cd | Hg | As | Cu |
|---|---|---|---|---|---|---|---|---|
| T1 | 52906 ± 4480b | 12830 ± 313b | 1376 ± 32b | 75 ± 2a | 13 ± 0.3b | 10 ± 1b | 1197 ± 30a | 389 ± 25a |
| T2 | 122418 ± 29030a | 6811 ± 898c | 1896 ± 212a | 38 ± 2bc | 28 ± 5a | 16 ± 4ab | 1182 ± 80a | 109 ± 19b |
| T3 | 11429 ± 2904c | 16323 ± 2102a | 143 ± 43d | 36 ± 1c | 3.4 ± 1.1c | 19 ± 5a | 459 ± 127b | 20 ± 4c |
| T4 | 9235 ± 2309c | 10936 ± 811b | 112 ± 13d | 35 ± 1c | 2.4 ± 0.5c | 10 ± 1ab | 238 ± 4bc | 36 ± 8c |
| T5 | 13035 ± 706c | 5858 ± 582c | 181 ± 5d | 22 ± 1d | 3.0 ± 0.2c | 7.4 ± 0.4b | 210 ± 6c | ND |
| T6 | 9461 ± 1293c | 6813 ± 652c | 586 ± 70c | 40 ± 1b | 7.9 ± 2.0bc | 11 ± 1ab | 1116 ± 85a | 106 ± 3b |

Mean ± SE are shown. ND, not detected. In each column, values followed by different lower-case letters are significantly different (P < 0.05, LSD).
Table S2. Number of quality sequences in the six tailings samples^a.

| Subsample | Barcode sequence | No. of quality sequences |
|---|---|---|
| T1-1 | AACGAACG | 11164 |
| T1-2 | AACGAAGC | 8139 |
| T1-3 | AACGATCC | 7634 |
| T2-1 | AAGCGCAA | 6974 |
| T2-2 | AAGCGCTT | 6517 |
| T2-3 | AAGCGGAT | 5486 |
| T3-1 | AAGCATCC | 8322 |
| T3-2 | AAGCATGG | 9788 |
| T3-3 | AACGGCTT | 6942 |
| T4-1 | AACGCGAA | 6479 |
| T4-2 | AACGCGTT | 6214 |
| T4-3 | AACGGCAA | 6634 |
| T5-1 | AACGTAGG | 8262 |
| T5-2 | AACGTTCG | 8876 |
| T5-3 | AACGTTGC | 8817 |
| T6-1 | AAGGATGC | 6495 |
| T6-2 | AAGGCCAA | 6379 |
| T6-3 | AAGGCCTT | 7033 |

a. Quality sequences met the following criteria: minimum length, 280 bp; maximum homopolymer length, 8; no ambiguous bases; and no chimeric sequences.
Table S3. Microbial biodiversity of the six tailings samples revealed by pyrosequencing.

| Tailings samples | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---|---|---|---|---|---|
| OTUs | 227 | 238 | 481 | 435 | 499 | 101 |
| Chao1 | 547 | 433 | 610 | 775 | 805 | 195 |
| Simpson | 0.76 | 0.78 | 0.97 | 0.77 | 0.80 | 0.84 |
| Shannon | 3.3 | 4.0 | 6.5 | 4.5 | 4.6 | 3.4 |

OTUs were defined at 97% sequence identity. For each tailings sample, 5000 sequences were randomly subsampled from each of its three subsamples (10 iterations), and the average of the three subsamples is reported as the value for the corresponding tailings sample.
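The subsampling scheme in the note above can be sketched as follows. This plain-Python illustration is not mothur's implementation, and the exact estimator forms used for this table are not stated. The sketch therefore assumes three forms: classic Chao1, switching to the bias-corrected form when there are no doubletons; Shannon's index with a configurable logarithm base (natural log by default); and the Gini-Simpson form 1 − Σp² for Simpson. All function names are hypothetical:

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence


def chao1(counts: Sequence[int]) -> float:
    """Chao1 richness; uses the bias-corrected form when there are no doubletons."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def shannon(counts: Sequence[int], base: float = math.e) -> float:
    """Shannon index -sum(p_i log p_i); the log base is an assumption."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n, base) for c in counts if c > 0)


def simpson(counts: Sequence[int]) -> float:
    """Gini-Simpson index, 1 - sum(p_i^2) (assumed form)."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


def rarefied_indices(otu_labels: List[str], depth: int = 5000,
                     iterations: int = 10, seed: int = 0) -> Dict[str, float]:
    """Average indices over repeated random subsamples of `depth` sequences.

    otu_labels holds one OTU label per quality sequence of a subsample.
    """
    rng = random.Random(seed)
    totals = {"OTUs": 0.0, "Chao1": 0.0, "Shannon": 0.0, "Simpson": 0.0}
    for _ in range(iterations):
        counts = list(Counter(rng.sample(otu_labels, depth)).values())
        totals["OTUs"] += len(counts)
        totals["Chao1"] += chao1(counts)
        totals["Shannon"] += shannon(counts)
        totals["Simpson"] += simpson(counts)
    return {name: value / iterations for name, value in totals.items()}


def sample_average(subsamples: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-subsample results (three per tailings sample)."""
    return {name: sum(r[name] for r in subsamples) / len(subsamples) for name in subsamples[0]}
```

Subsampling without replacement to a fixed depth requires every subsample to hold at least 5000 quality sequences, which Table S2 shows is the case (minimum 5486).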
Table S4. Relative abundance (%) of sequences affiliated with the dominant genera in the six tailings samples.

| Genus | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---|---|---|---|---|---|
| Acidithiobacillus | 0.30 | 0.02 | 1.0 | 0.47 | 0.25 | 19 |
| Acinetobacter | 0.22 | 0.12 | 7.1 | 4.8 | 3.0 | 0.01 |
| Amycolatopsis | 0.04 | 0.07 | 1.8 | 0.25 | 0.51 | 0.01 |
| Brucella | 0.22 | 0.12 | 6.5 | 1.5 | 1.8 | 0.01 |
| Comamonas | 5.1 | 0.08 | 0.87 | 0.34 | 0.44 | 0.01 |
| Corynebacterium | 0.03 | 0.00 | 5.4 | 0.23 | 1.6 | 0.00 |
| Enhydrobacter | 0.10 | 0.04 | 1.9 | 0.37 | 0.62 | 0.01 |
| Ferroplasma | 0.57 | 0.04 | 2.9 | 45 | 57 | 28 |
| Gemmatimonas | 0.03 | 2.0 | 0.03 | 0.18 | 0.01 | 0.00 |
| Hydrogenophaga | 32 | 0.00 | 0.08 | 0.03 | 0.08 | 0.00 |
| Legionella | 0.25 | 2.2 | 0.12 | 0.74 | 0.11 | 0.00 |
| Leptospirillum | 0.20 | 0.05 | 0.64 | 0.54 | 0.16 | 14 |
| Methylobacterium | 0.11 | 0.17 | 3.9 | 1.1 | 1.1 | 0.00 |
| Peredibacter | 0.17 | 1.0 | 0.07 | 0.54 | 0.01 | 0.01 |
| Pseudomonas | 0.15 | 0.00 | 5.1 | 0.95 | 1.7 | 0.00 |
| Rubrobacter | 0.27 | 1.5 | 0.07 | 0.48 | 0.05 | 0.00 |
| Sphingomonas | 0.22 | 1.9 | 1.6 | 1.0 | 0.65 | 0.00 |
| Staphylococcus | 0.08 | 0.00 | 11 | 0.34 | 2.8 | 0.00 |
| Streptococcus | 0.04 | 0.00 | 1.1 | 0.21 | 0.45 | 0.00 |
| Sulfobacillus | 0.12 | 0.03 | 0.27 | 0.07 | 0.09 | 13 |
| Thermogymnomonas | 0.01 | 0.00 | 0.06 | 0.01 | 0.02 | 4.3 |
| Thiobacillus | 12 | 3.3 | 0.08 | 1.5 | 0.07 | 0.00 |
| Thiovirga | 26 | 0.02 | 0.16 | 0.07 | 0.06 | 0.00 |

A genus was defined as dominant if the relative abundance of its sequences exceeded 1% in at least one tailings sample. The relative abundance of each genus was calculated as the average of the three subsamples of each tailings sample.
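The dominance rule in the note above can be expressed as a short filter. In this minimal sketch with hypothetical input, relative abundances (%) per subsample are averaged for each tailings sample, and a genus is kept if any sample average exceeds the 1% threshold:

```python
from statistics import mean
from typing import Dict, List


def dominant_genera(abundance: Dict[str, Dict[str, List[float]]],
                    threshold: float = 1.0) -> List[str]:
    """Genera whose subsample-averaged relative abundance (%) exceeds `threshold`
    in at least one tailings sample.

    abundance maps genus -> tailings sample -> relative abundances in its subsamples.
    """
    return sorted(
        genus
        for genus, per_sample in abundance.items()
        if any(mean(values) > threshold for values in per_sample.values())
    )


if __name__ == "__main__":
    toy = {  # hypothetical subsample values, not the study's data
        "Ferroplasma": {"T5": [56.0, 57.0, 58.0]},
        "GenusX": {"T1": [0.5, 1.2, 1.0]},
    }
    print(dominant_genera(toy))  # ['Ferroplasma']
```

Averaging before thresholding means a single subsample above 1% (as for the hypothetical GenusX) is not enough to make a genus dominant.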
Table S5. Summary of assembly, gene prediction and annotation of the T2 and T6 metagenomes.

| Item | T2 value | T2 percentage | T6 value | T6 percentage |
|---|---|---|---|---|
| Contigs | | | | |
| No. of total | 51071 | 100% | 37765 | 100% |
| Mean length (bp) | 549 | – | 795 | – |
| Mean GC% | 52 | – | 42 | – |
| N50 (bp) | 522 | – | 955 | – |
| Longest (bp) | 60233 | – | 40620 | – |
| No. with NCBI-nr hits^a | 38551 | 76% | 30463 | 81% |
| Putative protein-coding genes^b | | | | |
| No. of total | 51981 | 100% | 49538 | 100% |
| Mean length (bp) | 386 | – | 451 | – |
| Mean GC% | 52.5 | – | 43.2 | – |
| No. with NCBI-nr hits^c | 44853 | 86% | 43587 | 88% |
| No. with KEGG hits | 42475 | 95% | 36824 | 85% |
| No. connected to KEGG Orthology (KO) | 23336 | 52% | 20966 | 48% |
| No. connected to KEGG pathways | 14166 | 32% | 13034 | 30% |
| No. with COG hits | 32522 | 73% | 32128 | 74% |
| No. with COGs | 31013 | 69% | 29789 | 68% |

a. All BLAST comparisons in this study used the same criterion: e-value threshold = 10⁻⁵.
b. Genes were predicted from the contigs with NCBI-nr hits using MetaGeneMark with default parameters.
c. Only putative protein-coding genes with NCBI-nr hits were further compared against the KEGG and COG databases, so the percentages for KEGG and COG are relative to the number of genes with NCBI-nr hits.
Table S6. Binning information for contigs of the T2 and T6 metagenomes based on MEGAN results.

| Metagenome | Domain | Phylum | Class | Order | Family | Genus | Contigs (No.) | Contigs (% of total) | Base pairs (bp) | Base pairs (% of total) |
|---|---|---|---|---|---|---|---|---|---|---|
| T2 | Bacteria | Proteobacteria | Betaproteobacteria | Hydrogenophilales | Hydrogenophilaceae | Thiobacillus | 1435 | 3.7 | 827660 | 3.7 |
| T2 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderiales | Burkholderiaceae | Limnobacter | 991 | 2.6 | 410968 | 1.9 |
| T2 | Bacteria | Actinobacteria | Actinobacteria | Rubrobacteridae | Rubrobacteridae | Rubrobacter | 985 | 2.6 | 478579 | 2.2 |
| T6 | Archaea | Euryarchaeota | Thermoplasmata | Thermoplasmatales | Ferroplasmaceae | Ferroplasma | 9082 | 30 | 7579534 | 30 |
| T6 | Bacteria | Nitrospirae | Nitrospira | Nitrospirales | Nitrospiraceae | Leptospirillum | 5740 | 19 | 4413841 | 17 |
| T6 | Bacteria | Proteobacteria | Gammaproteobacteria | Acidithiobacillales | Acidithiobacillaceae | Acidithiobacillus | 4318 | 14 | 2947378 | 12 |

The three largest bins in each of the T2 and T6 metagenomes are shown. Bins were obtained from the MEGAN analysis based on BLAST comparisons of the contigs against the NCBI-nr database (e-value threshold = 10⁻⁵).
Table S7. Summary of specific COGs associated with heavy metal stress and low-pH stress in T2 and T6.

| Stress | COG ID | COG category | Gene | COG information |
|---|---|---|---|---|
| Heavy metals | COG0598 | [P]* | corA | Mg²⁺ and Co²⁺ transporters |
| Heavy metals | COG2217 | [P] | cadA | Cation transport ATPase |
| Heavy metals | COG0672 | [P] | FTR1 | High-affinity Fe²⁺/Pb²⁺ permease |
| Heavy metals | COG0798 | [P] | ACR3 | Arsenite efflux pump ACR3 and related permeases |
| Heavy metals | COG3696 | [P] | czcA | Putative silver efflux pump |
| Heavy metals | COG0841 | [V] | czcA | Cation/multidrug efflux pump |
| Heavy metals | COG0845 | [M] | czcB | Membrane-fusion protein |
| Heavy metals | COG1538 | [MU] | czcC | Outer membrane protein |
| Heavy metals | COG1230 | [P] | czcD | Co/Zn/Cd efflux system component |
| Heavy metals | COG0861 | [P] | terC | Membrane protein TerC, possibly involved in tellurium resistance |
| Heavy metals | COG1275 | [P] | tehA | Tellurite resistance protein and related permeases |
| Heavy metals | COG2059 | [P] | chrA | Chromate transport protein ChrA |
| Heavy metals | COG0474 | [P] | mgtA | Cation transport ATPase |
| Heavy metals | COG2239 | [P] | mgtE | Mg/Co/Ni transporter MgtE (contains CBS domain) |
| Low pH | COG2216 | [P] | KdpB | High-affinity K⁺ transport system, ATPase chain B |
| Low pH | COG2060 | [P] | KdpA | K⁺-transporting ATPase, A chain |
| Low pH | COG2156 | [P] | KdpC | K⁺-transporting ATPase, C chain |
| Low pH | COG1657 | [I] | - | Squalene cyclase |
COG hits
T2
T6
21
7
132
40
9
4
11
0
194
50
141
112
107
45
46
32
27
9
41
0
0
20
6
0
25
45
15
0
2
3
2
0
32
29
13
11
*Information on COG categories: [P] Inorganic ion transport and metabolism; [V] Defense mechanisms; [M] Cell wall/membrane/envelope biogenesis; [U] Intracellular trafficking, secretion, and vesicular transport; [I] Lipid transport and metabolism.