SUPPLEMENTARY MATERIAL for
Novel Microbial Populations in Ambient and Mesophilic Biogas-producing and
Phenol-degrading Consortia Unraveled by High-throughput Sequencing
Feng Ju, Tong Zhang*
Environmental Biotechnology Laboratory, Department of Civil Engineering, The
University of Hong Kong, Hong Kong SAR
Submitted to Microbial Ecology
*Corresponding author phone: +852-28578551; fax: +852-25595337; e-mail:
zhangt@hku.hk
Supporting Information S1: Specific methanogenic activity tests
Supporting Information S2: Bioinformatics analysis
Tables
Table S1 454 pyrosequencing datasets of the seed sludge, AT and MT enrichments
Table S2 Illumina sequencing data of the seed sludge, AT and MT enrichments
Table S3 Total bacterial classes shared among the seed sludge, AT and MT
enrichments.
Table S4 Total bacterial genera shared among the seed sludge, AT and MT
enrichments
Figures
Figure S1 The flowchart of data processing in this study.
Figure S2 Variation of phenol, VFAs and alcohols concentrations with time in the AT
(a) and MT (b) reactors during Batch 18
Figure S3 Phenol-degrading profiles at different initial concentrations in SMA tests
Figure S4 Rarefaction curves of the seed sludge, ambient and mesophilic enrichments
at dissimilarity cutoffs of 3% (a) and 6% (b)
Figure S5 Shift in Phylum Proteobacteria before and after enrichment under ambient
and mesophilic conditions
Figure S6 Phylogenetic trees of 16S rRNA gene sequences constructed for the most
abundant species detected in the AT (a) and MT (b) phenol-degrading
enrichments
Figure S7 Rank-abundance curves for bacterial genera in the seed sludge, AT and MT
reactors, respectively
Supporting Information S1: Specific methanogenic activity tests
The SMA tests of the phenol-degrading sludge were conducted in 166 ml batch serum
bottles (working volume 50 ml) at 20 and 37 °C, respectively. The sludge for the batch
tests was sampled from the AT and MT reactors on Day 193, when the AT and MT sludge
could tolerate phenol concentrations as high as 875 and 1000 mg·L⁻¹ (Figure S2),
corresponding to phenol loadings of 365 and 417 mg·L⁻¹·d⁻¹, respectively, while removing
almost 100% of the phenol. Initial phenol concentrations in the SMA tests varied from 100 to
1000 mg·L⁻¹, and phenol depletion was monitored until the phenol concentration
fell below the detection limit. The sludge concentrations were determined at the end
of the test: the volatile suspended solids concentration in each batch was
0.55 g·L⁻¹ for the AT sludge and 0.73 g·L⁻¹ for the MT sludge.
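The two reported loadings are consistent with dividing each phenol concentration by a common time span of roughly 2.4 days. The source does not state this relation, so the short Python check below is only an inference from the four reported numbers, and the function name is hypothetical.

```python
def implied_span_days(concentration_mg_per_l, loading_mg_per_l_per_day):
    """Time span (days) implied if loading = concentration / span (an assumption)."""
    return concentration_mg_per_l / loading_mg_per_l_per_day


# Concentrations and loadings reported for the AT and MT sludge on Day 193.
at_span = implied_span_days(875, 365)
mt_span = implied_span_days(1000, 417)
print(f"AT: {at_span:.2f} d, MT: {mt_span:.2f} d")
```

Both ratios agree to within 0.01 d, which is why a shared span seems a plausible reading of the reported values.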
Supporting Information S2: Bioinformatics analysis
(1) Processing of high-throughput sequencing datasets
The sequencing data in this study were analyzed using the procedures
shown in Figure S1. The 454 pyrosequencing datasets were analyzed with the
QIIME (Quantitative Insights Into Microbial Ecology, v1.5.0) pipeline [1]. First, the
454 reads (sequences) were separated into samples based on their nucleotide
barcodes. Then, the sequences in each sample were denoised by AmpliconNoise using the
default parameters, except that the Perseus algorithm for chimera removal was disabled.
After that, chimera checking was performed using ChimeraSlayer [2]. The reads
remaining after denoising and chimera removal are referred to as “effective reads”.
Although bacteria-specific primers were used, a very small number of undesired
archaeal reads were still obtained. To exclude them, the effective reads of each
sample were submitted to the online RDP Classifier [3] to identify the archaeal and
bacterial reads, and the archaeal reads were discarded. To compare all samples fairly at
the same sequencing depth, the number of bacterial sequences was normalized
by randomly extracting 8150 sequences from each 454 dataset. For the
metagenomic datasets, reads containing one or more uncalled bases, or containing
bases with a quality score < 30, were removed. Then, the reads were de-replicated to
remove the duplicate and near-duplicate reads derived from Illumina sequencing [4],
following the MG-RAST guideline [5]. After that, all paired-end (PE) reads were merged,
allowing a minimum overlap region of 10 bp. The sequences obtained after read
overlapping are referred to as “tags” and were used for downstream analysis.
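As a rough illustration of two steps described above, the quality filter for metagenomic reads and the normalization of each 454 dataset to a common depth, here is a minimal Python sketch. The function names, the list-based read representation and the fixed random seed are assumptions made for illustration; the study used QIIME and the cited tools, not this code.

```python
import random


def passes_quality(sequence, qualities, min_quality=30):
    """Keep a read only if it has no uncalled base (N) and no base with Q < min_quality."""
    has_uncalled = "N" in sequence.upper()
    has_low_quality = any(q < min_quality for q in qualities)
    return not has_uncalled and not has_low_quality


def normalize_depth(reads, depth=8150, seed=0):
    """Randomly draw `depth` reads without replacement (seed is a hypothetical choice)."""
    if len(reads) < depth:
        raise ValueError("dataset has fewer reads than the requested depth")
    return random.Random(seed).sample(reads, depth)


# Toy usage: 14197 placeholder read IDs, the bacterial read count of the seed sludge.
seed_sludge_reads = [f"read_{i}" for i in range(14197)]
normalized = normalize_depth(seed_sludge_reads)
print(len(normalized))
```

Sampling without replacement keeps every retained read unique, which matches the idea of extracting a subset of an existing dataset.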
(2) Taxonomic analysis
The bacterial composition was analyzed based on the bacterial 16S rRNA gene sequences
obtained from 454 pyrosequencing. The bacterial sequences were searched against the
GreenGenes database using NCBI’s BLASTN tool at an e-value cutoff of 1e-20 to
identify the 16S rRNA gene fragments. All bacterial sequences with hits of e-value <
1e-20 were used for taxonomic analysis. The top 100 hits of all qualified bacterial
sequences in each sample were imported into MEGAN and then annotated by the Lowest
Common Ancestor (LCA) algorithm using the default parameters, except that the
Percent Identity Filter was activated. This filter is specifically designed to filter 16S
rRNA gene sequences by similarity based on the following principle: the percent
identity of a match must exceed the given value to be assigned at
the given rank: Species 99%, Genus 97%, Family 95%, Order 90%, Class 85%, and
Phylum 80%. The archaeal populations were analyzed using the 16S rRNA gene tags
identified from the metagenomic tags by performing BLASTN against two 16S rRNA
gene databases, GreenGenes and SILVA SSU, at an e-value
cutoff of 1e-20. To minimize short random similarities, only 16S rRNA gene tags with a
read length of 150 to 190 bp and an alignment length of >100 bp were used for
taxonomic analysis using the MEGAN-LCA strategy described above [6].
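The rank thresholds and the tag filter described above can be sketched in a few lines of Python. This is not MEGAN's implementation: the function names are hypothetical, and the strict comparison follows the wording "must exceed", which is an assumption about how boundary values are treated.

```python
# Thresholds quoted in the text, ordered from the most to the least specific rank.
RANK_THRESHOLDS = [
    ("Species", 99.0),
    ("Genus", 97.0),
    ("Family", 95.0),
    ("Order", 90.0),
    ("Class", 85.0),
    ("Phylum", 80.0),
]


def deepest_rank(percent_identity):
    """Return the most specific rank whose threshold the match exceeds, or None."""
    for rank, threshold in RANK_THRESHOLDS:
        if percent_identity > threshold:
            return rank
    return None


def keep_tag(read_length, alignment_length):
    """Filter for metagenomic 16S tags: 150 to 190 bp long, alignment longer than 100 bp."""
    return 150 <= read_length <= 190 and alignment_length > 100


print(deepest_rank(98.2), deepest_rank(86.0), keep_tag(180, 120))
```

Under this reading, a hit with 98.2% identity can be placed no deeper than genus, while the LCA step may still assign it to a higher rank if its hits disagree.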
(3) Construction of rarefaction curves and phylogenetic trees
The 8150 normalized bacterial sequences of the seed sludge, AT and MT
enrichments were individually submitted to the RDP Pyrosequencing Pipeline [7]. There,
they were first aligned by Infernal according to the bacteria-alignment model in the
Align module of the RDP [8], then assigned to phylotype clusters at
dissimilarity cutoffs of 3% and 6% by Complete Linkage Clustering, and finally
the rarefaction curves were computed from the clusters. The phylogenetic trees
were constructed for the most abundant species detected in the AT (a) and MT (b)
phenol-degrading enrichments, together with reference sequences retrieved from GenBank
and SILVA SSU 111, using the MEGA5 package [9]. In brief, the relevant sequences
were extracted from the 454 datasets and aligned using the ClustalW program provided
in MEGA5, and the trees were constructed using the neighbor-joining algorithm with the
Jukes-Cantor model (bootstrapping number = 1000).
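For readers unfamiliar with the Jukes-Cantor model, the pairwise distance it assigns to two aligned sequences is d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites. The Python sketch below computes it for a pair of pre-aligned sequences; it only illustrates the standard formula, is not MEGA5 code, and its handling of gaps (sites with a non-ACGT character are skipped) is an assumption.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    sites = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in sites) / len(sites)
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One mismatch in ten sites: p = 0.1, so the corrected distance is slightly above 0.1.
print(round(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"), 4))
```

A neighbor-joining tree is then built from the matrix of such pairwise distances.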
References
1. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK,
Fierer N, Peña AG, Goodrich JK, Gordon JI (2010) QIIME allows analysis of
high-throughput community sequencing data. Nat Methods 7: 335-336.
2. Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, Ciulla D,
Tabbaa D, Highlander SK, Sodergren E (2011) Chimeric 16S rRNA sequence
formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome
Res 21: 494-504.
3. Wang Q, Garrity GM, Tiedje JM, Cole JR (2007) Naive Bayesian classifier for
rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ
Microbiol 73: 5261-5267.
4. Burriesci MS, Lehnert EM, Pringle JR (2012) Fulcrum: condensing redundant
reads from high-throughput sequencing studies. Bioinformatics 28: 1324-1327.
5. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T,
Rodriguez A, Stevens R, Wilke A (2008) The metagenomics RAST server–a public
resource for the automatic phylogenetic and functional analysis of metagenomes.
BMC Bioinformatics 9: 386.
6. Ghai R, Rodríguez-Valera F, McMahon KD, Toyama D, Rinke R, de Oliveira
TCS, Garcia JW, de Miranda FP, Henrique-Silva F (2011) Metagenomics of the water
column in the pristine upper course of the Amazon river. PLoS One 6: e23785.
7. Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen
A, McGarrell D, Marsh T, Garrity GM (2009) The Ribosomal Database Project:
improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37:
D141-D145.
8. Nawrocki EP, Eddy SR (2007) Query-dependent banding (QDB) for faster RNA
similarity searches. PLoS Comput Biol 3: e56.
9. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5:
molecular evolutionary genetics analysis using maximum likelihood, evolutionary
distance, and maximum parsimony methods. Mol Biol Evol 28: 2731-2739.
Table S1 454 pyrosequencing datasets of the seed sludge, AT and MT enrichments

| Item | SEED | AT | MT |
|---|---|---|---|
| Raw reads | 19170 | 24331 | 18419 |
| Denoised reads | 14250 | 11364 | 8305 |
| Effective reads¹ | 14238 | 11345 | 8289 |
| Bacterial reads | 14197 | 11292 | 8165 |
| Archaeal reads | 41 | 53 | 15 |
| Normalized reads | 8150 | 8150 | 8150 |
| OTUs² at 3% | 381 | 150 | 106 |
| OTUs at 6% | 315 | 125 | 91 |

1 Effective reads refers to denoised reads after removing chimeric sequences.
2 The diversity measurement (OTUs) was performed at a rarefaction of 8150 sequences using
the RDP Pyrosequencing Pipeline Rarefaction Tool.
Table S2 Illumina metagenomic sequencing datasets of the seed sludge, AT and MT enrichments

| Category | Item | SEED1* | SEED2* | AT | MT1* | MT2* |
|---|---|---|---|---|---|---|
| Tags¹ | Assembled gene tags (~190 bp) | 13,192,974 | 15,060,113 | 13,113,118 | 13,119,590 | 13,106,796 |
| Tags¹ | Normalized gene tags (~190 bp) | 13,106,796 | 13,106,796 | 13,106,796 | 13,106,796 | 13,106,796 |
| Tags¹ | 16S rRNA gene tags² (150~190 bp) | 9,469 | 9,913 | 9,614 | 8,699 | 8,516 |

Reads, raw (100 bp) and after de-replication. The dataset to which each value belongs
is not recoverable from the source layout, so the values are listed in source order:
17,655,000x2; 14,444,685x2; 17,560,000x2; 14,394,059x2; 21,475,060x2; 19,282,148x2;
20,313,343x2; 18,335,191; 17,665,000x2; 15,285,922x2

1: Tags are the sequences obtained by overlapping the PE metagenomic reads.
2: 16S rRNA gene tags were identified by BLASTN against GreenGenes at an e-value cutoff of 1e-20.
*SEED1 & SEED2 and MT1 & MT2 are replicate metagenomic datasets derived from DNA replicates
extracted from the seed sludge and the MT enrichment, respectively.
Table S3 Total bacterial classes shared among the seed sludge, AT and MT enrichments. The
results are based on taxonomic analysis of the 454 datasets. The number inside the
bracket of the first column indicates the number of classes shared.

| Shared by¹ | Class | Reads (SEED) | Reads (AT) | Reads (MT) | % of total bacterial reads (SEED) | % (AT) | % (MT) |
|---|---|---|---|---|---|---|---|
| AT&MT&SEED (10) | Bacteroidia | 331 | 1341 | 120 | 4.06 | 16.45 | 1.47 |
| | Deltaproteobacteria | 183 | 1262 | 1652 | 2.25 | 15.48 | 20.27 |
| | Clostridia | 1276 | 730 | 681 | 15.66 | 8.96 | 8.36 |
| | Synergistia | 172 | 391 | 415 | 2.11 | 4.80 | 5.09 |
| | Actinobacteria | 452 | 369 | 78 | 5.55 | 4.53 | 0.96 |
| | Alphaproteobacteria | 835 | 203 | 29 | 10.25 | 2.49 | 0.36 |
| | Gammaproteobacteria | 308 | 157 | 47 | 3.78 | 1.93 | 0.58 |
| | Anaerolineae | 1464 | 54 | 305 | 17.96 | 0.66 | 3.74 |
| | Betaproteobacteria | 256 | 52 | 1051 | 3.14 | 0.64 | 12.90 |
| | Spirochaetia | 189 | 32 | 27 | 2.32 | 0.39 | 0.33 |
| AT&MT (4) | Epsilonproteobacteria | 0 | 2640 | 795 | 0.00 | 32.39 | 9.75 |
| | WWE1 | 0 | 121 | 2040 | 0.00 | 1.48 | 25.03 |
| | Thermomicrobia | 0 | 67 | 17 | 0.00 | 0.82 | 0.21 |
| | Erysipelotrichi | 0 | 7 | 56 | 0.00 | 0.09 | 0.69 |
| AT&SEED (2) | Sphingobacteriia | 103 | 20 | 0 | 1.26 | 0.25 | 0.00 |
| | Negativicutes | 37 | 8 | 0 | 0.45 | 0.10 | 0.00 |
| MT&SEED (1) | Bacilli | 115 | 0 | 12 | 1.41 | 0.00 | 0.15 |
| AT (3) | Chlorobia | 0 | 363 | 0 | 0.00 | 4.45 | 0.00 |
| | Solibacteres | 0 | 9 | 0 | 0.00 | 0.11 | 0.00 |
| | Elusimicrobia | 0 | 6 | 0 | 0.00 | 0.07 | 0.00 |
| MT (1) | Thermotogae | 0 | 0 | 13 | 0.00 | 0.00 | 0.16 |
| SEED (23) | Caldilineae | 239 | 0 | 0 | 2.93 | 0.00 | 0.00 |
| | Opitutae | 93 | 0 | 0 | 1.14 | 0.00 | 0.00 |
| | Planctomycetia | 89 | 0 | 0 | 1.09 | 0.00 | 0.00 |
| | Aquificae | 53 | 0 | 0 | 0.65 | 0.00 | 0.00 |
| | Verrucomicrobiae | 47 | 0 | 0 | 0.58 | 0.00 | 0.00 |
| | Cytophagia | 41 | 0 | 0 | 0.50 | 0.00 | 0.00 |
| | Deinococci | 36 | 0 | 0 | 0.44 | 0.00 | 0.00 |
| | Acidobacteriia | 22 | 0 | 0 | 0.27 | 0.00 | 0.00 |
| | Chlamydiia | 8 | 0 | 0 | 0.10 | 0.00 | 0.00 |
| | Nitrospira | 5 | 0 | 0 | 0.06 | 0.00 | 0.00 |

1. The number inside the bracket indicates the number of classes shared.
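The percentages in Table S3 appear to be read counts divided by the normalized depth of 8150 sequences, and the "Shared by" label lists the samples in which a class has at least one read. The Python sketch below reproduces both for a few rows; the function names are hypothetical, and the reading of the table is an inference from its numbers rather than something the source states.

```python
NORMALIZED_DEPTH = 8150  # reads per sample after normalization (Table S1)


def percent_of_reads(reads, depth=NORMALIZED_DEPTH):
    """Relative abundance in percent, rounded to two decimals as in the table."""
    return round(100.0 * reads / depth, 2)


def shared_by(counts):
    """Join the names of the samples in which the taxon has at least one read."""
    present = [name for name in ("AT", "MT", "SEED") if counts.get(name, 0) > 0]
    return "&".join(present)


bacteroidia = {"SEED": 331, "AT": 1341, "MT": 120}
epsilon = {"SEED": 0, "AT": 2640, "MT": 795}
print(shared_by(bacteroidia), [percent_of_reads(bacteroidia[s]) for s in ("SEED", "AT", "MT")])
print(shared_by(epsilon), [percent_of_reads(epsilon[s]) for s in ("SEED", "AT", "MT")])
```

The same two helpers apply unchanged to the genus-level counts.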
Table S4 Total bacterial genera shared among the seed sludge, AT and MT enrichments. The
results are based on taxonomic analysis of the 454 datasets. The number inside the
bracket of the first column indicates the number of genera shared.

| Shared by¹ | Genus | Reads (SEED) | Reads (AT) | Reads (MT) | % of total bacterial reads (SEED) | % (AT) | % (MT) |
|---|---|---|---|---|---|---|---|
| AT&MT&SEED (7) | Syntrophorhabdus | 7 | 669 | 1433 | 0.09 | 8.21 | 17.58 |
| | Synergistes | 23 | 122 | 29 | 0.28 | 1.50 | 0.36 |
| | Mycobacterium | 139 | 119 | 23 | 1.71 | 1.46 | 0.28 |
| | Aminobacterium | 5 | 111 | 133 | 0.06 | 1.36 | 1.63 |
| | Acinetobacter | 35 | 58 | 38 | 0.43 | 0.71 | 0.47 |
| | T78 | 834 | 24 | 158 | 10.23 | 0.29 | 1.94 |
| | Brevundimonas | 10 | 12 | 11 | 0.12 | 0.15 | 0.13 |
| AT&MT (7) | Campylobacterales-related genus | 0 | 2608 | 795 | 0.00 | 32.00 | 9.75 |
| | Pelotomaculum | 0 | 514 | 89 | 0.00 | 6.31 | 1.09 |
| | Desulfovibrio | 0 | 329 | 90 | 0.00 | 4.04 | 1.10 |
| | Syntrophus | 0 | 162 | 99 | 0.00 | 1.99 | 1.21 |
| | W22 | 0 | 121 | 2040 | 0.00 | 1.48 | 25.03 |
| | Desulfomicrobium | 0 | 20 | 8 | 0.00 | 0.25 | 0.10 |
| | Bellilinea | 0 | 11 | 30 | 0.00 | 0.13 | 0.37 |
| AT&SEED (3) | Rhodoplanes | 22 | 9 | 0 | 0.27 | 0.11 | 0.00 |
| | Levilinea | 122 | 8 | 0 | 1.50 | 0.10 | 0.00 |
| | Iamia | 14 | 12 | 0 | 0.17 | 0.15 | 0.00 |
| MT&SEED (3) | Thermovirga | 20 | 0 | 232 | 0.25 | 0.00 | 2.85 |
| | Clostridium | 15 | 0 | 11 | 0.18 | 0.00 | 0.13 |
| | Syntrophobacter | 50 | 0 | 9 | 0.61 | 0.00 | 0.11 |
| AT (10) | Rhodopseudomonas | 0 | 143 | 0 | 0.00 | 1.75 | 0.00 |
| | Chlorobaculum | 0 | 58 | 0 | 0.00 | 0.71 | 0.00 |
| | Rhodococcus | 0 | 43 | 0 | 0.00 | 0.53 | 0.00 |
| | Geobacter | 0 | 41 | 0 | 0.00 | 0.50 | 0.00 |
| | Sulfuricurvum | 0 | 23 | 0 | 0.00 | 0.28 | 0.00 |
| | Treponema | 0 | 19 | 0 | 0.00 | 0.23 | 0.00 |
| | Alishewanella | 0 | 11 | 0 | 0.00 | 0.13 | 0.00 |
| | Candidatus Solibacter | 0 | 9 | 0 | 0.00 | 0.11 | 0.00 |
| | Arcobacter | 0 | 9 | 0 | 0.00 | 0.11 | 0.00 |
| | Thauera | 0 | 5 | 0 | 0.00 | 0.06 | 0.00 |
| MT (11) | Brachymonas | 0 | 0 | 584 | 0.00 | 0.00 | 7.17 |
| | Moorella | 0 | 0 | 489 | 0.00 | 0.00 | 6.00 |
| | Thermonema | 0 | 0 | 95 | 0.00 | 0.00 | 1.17 |
| | Turicibacter | 0 | 0 | 56 | 0.00 | 0.00 | 0.69 |
| | Alcaligenes | 0 | 0 | 23 | 0.00 | 0.00 | 0.28 |
| | Corynebacterium | 0 | 0 | 11 | 0.00 | 0.00 | 0.13 |
| | Rhodobacter | 0 | 0 | 10 | 0.00 | 0.00 | 0.12 |
| | Fervidobacterium | 0 | 0 | 10 | 0.00 | 0.00 | 0.12 |
| | Bacillus | 0 | 0 | 9 | 0.00 | 0.00 | 0.11 |
| | Arthrobacter | 0 | 0 | 7 | 0.00 | 0.00 | 0.09 |
| | Pseudomonas | 0 | 0 | 7 | 0.00 | 0.00 | 0.09 |
| SEED (29) | Sedimentibacter | 524 | 0 | 0 | 6.43 | 0.00 | 0.00 |
| | Caldilinea | 239 | 0 | 0 | 2.93 | 0.00 | 0.00 |
| | Spirochaeta | 160 | 0 | 0 | 1.96 | 0.00 | 0.00 |
| | Lysobacter | 82 | 0 | 0 | 1.01 | 0.00 | 0.00 |
| | Syntrophomonas | 78 | 0 | 0 | 0.96 | 0.00 | 0.00 |
| | Streptococcus | 61 | 0 | 0 | 0.75 | 0.00 | 0.00 |
| | Bradyrhizobium | 56 | 0 | 0 | 0.69 | 0.00 | 0.00 |
| | Novosphingobium | 51 | 0 | 0 | 0.63 | 0.00 | 0.00 |
| | Parabacteroides | 50 | 0 | 0 | 0.61 | 0.00 | 0.00 |
| | Butyrivibrio | 47 | 0 | 0 | 0.58 | 0.00 | 0.00 |
| | Planctomyces | 47 | 0 | 0 | 0.58 | 0.00 | 0.00 |
| | Adhaeribacter | 41 | 0 | 0 | 0.50 | 0.00 | 0.00 |
| | Weissella | 37 | 0 | 0 | 0.45 | 0.00 | 0.00 |
| | Deinococcus | 36 | 0 | 0 | 0.44 | 0.00 | 0.00 |
| | Sphingomonas | 35 | 0 | 0 | 0.43 | 0.00 | 0.00 |
| | Zoogloea | 28 | 0 | 0 | 0.34 | 0.00 | 0.00 |
| | Aminiphilus | 23 | 0 | 0 | 0.28 | 0.00 | 0.00 |
| | Candidatus Microthrix | 20 | 0 | 0 | 0.25 | 0.00 | 0.00 |
| | Pedomicrobium | 16 | 0 | 0 | 0.20 | 0.00 | 0.00 |
| | Gemmata | 15 | 0 | 0 | 0.18 | 0.00 | 0.00 |
| | Longilinea | 14 | 0 | 0 | 0.17 | 0.00 | 0.00 |
| | Flavisolibacter | 13 | 0 | 0 | 0.16 | 0.00 | 0.00 |
| | Dehalobacterium | 11 | 0 | 0 | 0.13 | 0.00 | 0.00 |
| | Prosthecobacter | 10 | 0 | 0 | 0.12 | 0.00 | 0.00 |
| | Propionibacterium | 8 | 0 | 0 | 0.10 | 0.00 | 0.00 |
| | Selenomonas | 8 | 0 | 0 | 0.10 | 0.00 | 0.00 |
| | Thermomonas | 6 | 0 | 0 | 0.07 | 0.00 | 0.00 |
| | Nitrospira | 5 | 0 | 0 | 0.06 | 0.00 | 0.00 |
| | Thermosinus | 5 | 0 | 0 | 0.06 | 0.00 | 0.00 |

1. The number inside the bracket indicates the number of genera shared.
Figure S1 The flowchart of data processing in this study. The green and reddish-brown
squares show the analysis procedures for the 454 datasets and metagenomic datasets,
respectively. The purple square shows the method used for handling both 454 and
metagenomic datasets.
[Figure S2 plot: panels a and b; x-axis: Time (days), 0 to 5; y-axis: Concentration (mg/L);
series: Phenol, Benzoate, Ethanol, Butanol, Acetic acid]
Figure S2 Variation of phenol, VFAs and alcohols concentrations with time in the AT
(a) and MT (b) reactors during Batch 18
[Figure S3 plot: panels a and b; x-axis: Time (days), 0 to 20; y-axis: Phenol concentration
(mg/L), 0 to 1200; series by initial phenol concentration: 100, 200, 400, 600 and 1000 mg/L]
Figure S3 Phenol-degrading profiles at different initial concentrations in SMA tests using sludge
collected from the AT (a) and MT (b) reactors, respectively.
[Figure S4 plot: panels a and b; x-axis: Number of sequences, 0 to 8000; y-axis: OTUs, 0 to 400;
curves: Seed sludge, AT, MT; slope labels on the curves: 0.88%, 0.44%, 0.66%, 0.22%, 0.22%, 0.11%]
Figure S4 Rarefaction curves of the seed sludge, ambient and mesophilic enrichments at
dissimilarity cutoffs of 3% (a) and 6% (b). The rarefaction curves were computed using the RDP
Pyrosequencing Pipeline Rarefaction Tool. The samples are arranged in descending order of
the number of OTUs. The value above each curve is the slope at its end point,
which indicates the number of novel OTUs gained for every additional
100 sequences.
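To make the end-point slope concrete, the Python sketch below computes a rarefaction curve analytically from OTU sizes and reports the OTUs gained over the last 100 sequences as a percentage. It uses the classical hypergeometric expectation, which is an assumption: the RDP Rarefaction Tool may compute its curves differently, and the OTU sizes in the example are invented toy values, not data from this study.

```python
from math import comb


def rarefied_otus(otu_sizes, n):
    """Expected number of OTUs observed when n sequences are drawn without replacement."""
    total = sum(otu_sizes)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of sequences")
    all_draws = comb(total, n)
    # An OTU of a given size is missed when all n draws come from the other sequences.
    return sum(1.0 - comb(total - size, n) / all_draws for size in otu_sizes)


def end_slope_percent(otu_sizes, step=100):
    """OTUs gained over the last `step` sequences, per sequence, expressed in percent."""
    total = sum(otu_sizes)
    if total <= step:
        raise ValueError("dataset must contain more than `step` sequences")
    gained = rarefied_otus(otu_sizes, total) - rarefied_otus(otu_sizes, total - step)
    return 100.0 * gained / step


# Toy community (hypothetical): 1000 sequences in 9 OTUs, two of them singletons.
toy_sizes = [500, 300, 150, 30, 10, 5, 3, 1, 1]
print(f"{end_slope_percent(toy_sizes):.2f}%")
```

A smaller end-point slope means fewer new OTUs would be expected from further sequencing, which is how the labels on the curves can be read.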
Figure S5 Shift in Phylum Proteobacteria before and after enrichment at ambient and
mesophilic temperatures. The percentage in brackets indicates the relative
abundance of Proteobacteria in the total bacterial population of each sample.
Figure S6 Phylogenetic trees of 16S rRNA gene sequences constructed for the most
abundant species detected in the AT (a) and MT (b) phenol-degrading
enrichments. The relevant sequences were aligned using the ClustalW program
provided in the MEGA5 package. The trees were constructed using the neighbor-joining
algorithm with the Jukes-Cantor model (bootstrapping number = 1000).
[Figure S7 plot: x-axis: Genus rank, 0 to 44; y-axis: Sequence abundance, logarithmic scale
from 4 to 2048; series: AT (20 °C), MT (37 °C), Seed sludge]
Figure S7 Rank-abundance curves for bacterial genera in the seed sludge, AT and MT
reactors, respectively. Abundance is the number of sequences
assigned to each genus in the 454 datasets of 16S rRNA gene sequences.