Supplementary Information

Supplementary Methods
Sampling, DNA extraction, and sequencing
Over 80 peat cores were collected in July 2012 from the Bog Lake fen (Fen, 47° 30' 22.62",
-93° 29' 20.46") and S1 bog (47° 30' 22.3914", -93° 27' 11.772") sites in the Marcell
Experimental Forest (MEF) in northern Minnesota, USA. Biogeochemical characterization
(microbial community composition, functional potential, organic matter properties, etc.) of
these samples has been described elsewhere (Lin et al 2014a, Lin et al 2014b, Tfaily et al 2014).
Two 75-100 cm depth intervals from the Bog Lake Fen and S1 bog T3M (the mid-point of transect
#3) sites were selected for metagenomic sequencing because of the high proportion (up to 60%
of the total community) of Archaea detected in these soils based on quantitative real-time PCR
and amplicon sequencing data (Lin et al 2014b). The bog and fen sites differed in their
vegetation cover, porosity of the peat column, organic matter properties, and thus microbial
community composition (Lin et al 2014b, Tfaily et al 2014). The bog is an acidic (pH 3.5-4.0)
and nutrient-deficient environment that receives water inputs primarily from precipitation. In
contrast, Bog Lake Fen (pH≈4.5-4.8) is a poor fen, which contains a higher coverage of vascular
plants along with Sphagnum. Core sections were homogenized and subsampled in sterile bags.
Samples for DNA extractions were frozen at -20 °C within 2 hours of sampling and then
transferred to a -80 °C freezer at the end of the sampling day.
Genomic DNA was extracted from triplicate peat soil subsamples using a MoBio PowerSoil
DNA extraction kit (MoBio, Carlsbad, CA) following the manufacturer’s protocol and using 0.5
g of peat per extraction. Extracted DNA samples were pooled. Libraries for metagenomic
sequencing were generated using the Nextera DNA sample preparation kit (Illumina, Inc., San
Diego, CA). Libraries were size-selected using E-Gels (Life Technologies, Inc.) for an insert size
range of 400-800 bp. Libraries were then quantified and quality checked using the Invitrogen
Qubit and Agilent Bioanalyzer. All libraries were pooled in an equimolar ratio and sequenced
using an Illumina HiSeq2000 instrument, generating paired-end reads of 150 bases in length.
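Equimolar pooling depends on converting each library's mass concentration to a molar concentration. The sketch below illustrates the usual arithmetic, assuming a concentration in ng/µL (as a fluorometer such as the Qubit reports), a mean fragment length in bp (as estimated from a Bioanalyzer trace), and the standard approximation of 660 g/mol per base pair of double-stranded DNA. The library names, concentrations, and fragment sizes are hypothetical and are not values from this study.

```python
# Approximate molar mass of one base pair of double-stranded DNA (g/mol).
BP_MASS = 660.0


def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration (ng/uL) into a molar concentration (nM)."""
    return conc_ng_per_ul / (BP_MASS * mean_fragment_bp) * 1e6


def equimolar_volumes(libraries, fmol_each):
    """Volume (uL) of each library that contains `fmol_each` femtomoles.

    `libraries` maps a library name to (ng/uL, mean fragment length in bp).
    1 nM equals 1 fmol/uL, so volume = amount / molarity.
    """
    return {
        name: fmol_each / library_molarity_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Hypothetical measurements for two libraries (not data from this study).
example = {"fen": (12.0, 600), "bog": (8.0, 550)}
volumes = equimolar_volumes(example, fmol_each=100)
for name, volume in volumes.items():
    print(f"{name}: {volume:.2f} uL")
```

Libraries with a lower molarity contribute a proportionally larger volume, so each library contributes the same number of molecules to the pool.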
Metagenome assembly, binning, and annotation
Low-quality reads were trimmed using Sickle (v. 1.29) with a quality score threshold of Q=3
(Joshi NA and Fass JN, unpublished, https://github.com/najoshi/sickle), and then assembled
using IDBA_ud v. 1.0.9 (Peng et al 2012), with minimum and maximum k-mer sizes of 40 and 70,
respectively, and a k-mer step size of 5. Contigs greater than 2 kb were retained for further
analysis and taxonomic binning. The coverage of each contig was determined by mapping reads
using Bowtie v. 1.0.0 with default settings (Langmead et al 2009). The clustering of contigs into
genome bins was based on agreement among contig tetranucleotide frequency composition,
read coverage, G+C content, and predicted protein UBLAST (usearch v. 7.0.1001, Edgar RC,
unpublished, http://drive5.com/usearch/) hits to the UniProt UniRef90 database, following gene
prediction with Prodigal in metagenomics mode (Hyatt et al 2012). To aid binning, Emergent
Self Organizing Maps (ESOM) were created using Databionics ESOM tools
(http://databionic-esom.sourceforge.net/) and tetranucleotide frequencies calculated after
fragmenting contigs into 5 kb lengths (Dick et al 2009). Contigs fragmented into 2 kb lengths
were then projected onto these maps. Genome completeness was determined by searching for 31
bacterial single-copy marker genes or 104 archaeal single-copy marker genes, using the
‘MarkerScanner.pl’ script of AMPHORA2 (Wu and Scott 2012).
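Two of the computations above can be illustrated with a short sketch: the tetranucleotide frequency signature of fixed-length contig fragments, and a completeness estimate from single-copy marker genes. This is not the code used in this study. It assumes that each tetranucleotide is merged with its reverse complement (giving 136 features), that trailing pieces shorter than the window are discarded, and that completeness is the percentage of expected markers detected at least once. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


# 136 strand-independent tetranucleotides (256 k-mers merged with their reverse complements).
CANONICAL_TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def fragment(seq, size):
    """Split a contig into consecutive windows of `size` bases, dropping a short tail."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def tetra_freqs(seq):
    """Relative frequency of each canonical tetranucleotide in `seq`."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL_TETRAMERS]


def completeness(markers_found, n_markers):
    """Percentage of the expected single-copy markers detected at least once."""
    return 100.0 * len(set(markers_found)) / n_markers


def signatures(contigs, window=5000, min_length=2000):
    """Signature vectors for every window of every contig longer than `min_length`."""
    rows = {}
    for name, seq in contigs.items():
        if len(seq) <= min_length:
            continue
        for n, piece in enumerate(fragment(seq, window)):
            rows[f"{name}_{n}"] = tetra_freqs(piece)
    return rows
```

With `window=5000` the resulting vectors correspond to the fragments used to build the map, and with `window=2000` to the fragments projected onto it. A bin in which 98 of the 104 archaeal markers are detected would score `completeness(found, 104)`, roughly 94%.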
Each assembled genome was uploaded to the Integrated Microbial Genomes (IMG) system at
DOE's Joint Genome Institute (JGI) for annotation (Markowitz et al 2014). For comparison,
genome sequences were also uploaded to the RAST server and annotated by RAST v4.0
(Overbeek et al 2014). Metabolic reconstruction and pathway mapping of each assembled
genome were completed using the KEGG Automatic Annotation Server (KAAS) (Moriya et al
2007). Identification of glycoside hydrolases (GH) was performed by searching against the Pfam
database using HMMER v.3.0 (Finn et al 2014), including targets of GH families, 66
carbohydrate active enzymes, 34 carbohydrate binding modules, 3 polysaccharide lyases, and 5
carbohydrate esterases as listed by Tveit et al (2013). Representative genes in fermentation
pathways were counted to determine the genomic potential for producing lactate, H2, ethanol,
propionate, CO2, acetate and butyrate by fermentation (Kirchman et al 2014).
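Counting hits per protein family from a profile search can be sketched as follows. The example assumes the search results were saved in HMMER's whitespace-delimited per-sequence table (the `--tblout` option of `hmmsearch`), in which the first, third, and fifth columns hold the target name, the query model name, and the full-sequence E-value. The E-value cutoff of 1e-5, the gene names, the model names, and the table rows are all hypothetical; the text above does not state the thresholds that were applied.

```python
from collections import defaultdict


def count_hits(tblout_lines, max_evalue=1e-5):
    """Count distinct target proteins per query model in hmmsearch --tblout output."""
    hits = defaultdict(set)
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, model, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits[model].add(target)
    return {model: len(proteins) for model, proteins in hits.items()}


# Made-up rows, truncated after the first seven columns, for illustration only.
SAMPLE = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "gene_0001      -          model_A     -          1.2e-80  270.1  0.0",
    "gene_0042      -          model_A     -          3.0e-12   45.3  0.1",
    "gene_0107      -          model_A     -          0.02       9.8  0.0",
    "gene_0042      -          model_B     -          4.5e-30  101.7  0.2",
]

print(count_hits(SAMPLE))
```

The same tally, applied to marker genes of each fermentation pathway instead of GH families, gives a per-genome count of the kind described above.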
Phylogenetic analysis of 16S rRNA genes and concatenated marker genes
Expectation maximization iterative reconstruction of genes from the environment (EMIRGE)
was used to reconstruct full-length 16S rRNA gene sequences from the metagenomic data
(Miller et al 2011), with sequence clustering at 97% identity. Taxonomy was determined by
searching representative sequences against a dereplicated version of the SILVA 108 database
using SINA v1.2.11 (Miller et al 2011, Pruesse et al 2012). The 16S rRNA gene fragments
present in contigs were detected and retrieved using Metaxa (Bengtsson et al 2011). 16S
rRNA gene sequences reconstructed by EMIRGE were matched to genomes by aligning 16S
rRNA gene sequences from both EMIRGE and Metaxa using SINA v1.2.11 (Pruesse et al 2012),
along with sequences from their closest relatives and sequences previously found in amplicon
sequencing from the same sample (Lin et al 2014b). MEGA6 (Tamura et al 2013) was used to
construct the Maximum Likelihood tree using the Kimura 2-parameter model for nucleotide
evolution, with a bootstrap test of 500 replicates. The tree branches of Crenarchaeota and
Thaumarchaeota were clustered and collapsed into taxonomic groups according to the
crenarchaeotal taxonomic framework used by Ochsenreiter et al (2003) and Kubo et al (2012).
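The Kimura 2-parameter model treats transitions (A<->G, C<->T) and transversions as separate substitution classes. As an illustration of the model only, not of MEGA6's likelihood calculation, the corresponding pairwise distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The sketch below assumes that sites with a gap or ambiguous base in either sequence are ignored; the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the distance to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two sequences of ten comparable sites that differ by a single transition, P = 0.1 and Q = 0, so d = -0.5 ln(0.8), approximately 0.112, slightly more than the uncorrected difference of 0.1.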
To construct a concatenated core gene tree for the two near-complete genome bins, a distance
tree was first created based on 16S rRNA gene sequences from genomes of all Crenarchaeota,
Thaumarchaeota, and Thermoplasmata available in the IMG database (Markowitz et al 2014).
This 16S tree was used to identify the closest neighbors of the two near-complete genomes.
Genome sequences of the two assembled genomes and their closest neighbors were then
downloaded and searched against 104 archaeal marker proteins using AMPHORA2. Marker
proteins identified in each downloaded genome were aligned and concatenated after removing
unaligned fragments. All aligned and concatenated protein sequences of each genome were then
used to construct the concatenated core gene tree using FastTree v2.1.3 with default parameters
(Price et al 2010).
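The concatenation step can be sketched as below. This is a minimal illustration, assuming each marker has already been aligned and trimmed so that all sequences for a given marker have the same length, and assuming that a genome lacking a marker is padded with gap characters. That padding convention is an assumption, not a detail stated above, and the function and variable names are hypothetical.

```python
def concatenate_alignments(alignments, genomes):
    """Join per-marker protein alignments into one supermatrix row per genome.

    `alignments` maps marker name -> {genome name -> aligned sequence}.
    Markers are concatenated in sorted name order; a genome missing a
    marker receives a run of '-' of that marker's alignment width.
    """
    parts = {genome: [] for genome in genomes}
    for marker in sorted(alignments):
        aligned = alignments[marker]
        widths = {len(seq) for seq in aligned.values()}
        if len(widths) != 1:
            raise ValueError(f"marker {marker!r} is not aligned to a single width")
        width = widths.pop()
        for genome in genomes:
            parts[genome].append(aligned.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


# Toy input with two markers and two genomes (hypothetical sequences).
toy = {
    "marker1": {"bin_A": "MK-L", "ref_1": "MKAL"},
    "marker2": {"bin_A": "VV"},
}
print(concatenate_alignments(toy, ["bin_A", "ref_1"]))
```

Written out in FASTA format, the resulting rows form the kind of concatenated protein alignment that a tree-building program such as FastTree takes as input.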
Supplementary references
Bengtsson J, Eriksson KM, Hartmann M, Wang Z, Shenoy BD, Grelet GA et al (2011). Metaxa:
a software tool for automated detection and discrimination among ribosomal small subunit
(12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in
metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek 100: 471-475.
Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP et al (2009).
Community-wide analysis of microbial genome sequence signatures. Genome Biol 10: R85.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR et al (2014). Pfam: the
protein families database. Nucleic Acids Res 42: D222-230.
Hyatt D, LoCascio PF, Hauser LJ, Uberbacher EC (2012). Gene and translation initiation site
prediction in metagenomic sequences. Bioinformatics 28: 2223-2230.
Kirchman DL, Hanson TE, Cottrell MT, Hamdan LJ (2014). Metagenomic analysis of organic
matter degradation in methane-rich Arctic Ocean sediments. Limnol Oceanogr 59: 548-559.
Kubo K, Lloyd KG, Biddle JF, Amann R, Teske A, Knittel K (2012). Archaea of the Miscellaneous
Crenarchaeotal Group are abundant, diverse and widespread in marine sediments. ISME J 6:
1949-1965.
Langmead B, Trapnell C, Pop M, Salzberg SL (2009). Ultrafast and memory-efficient alignment
of short DNA sequences to the human genome. Genome Biol 10: R25.
Lin X, Tfaily MM, Green SJ, Steinweg JM, Chanton P, Imvittaya A et al (2014a). Microbial
metabolic potential for carbon degradation and nutrient (nitrogen and phosphorus) acquisition in
an ombrotrophic peatland. Appl Environ Microbiol 80: 3531-3540.
Lin X, Tfaily MM, Steinweg JM, Chanton P, Esson K, Yang ZK et al (2014b). Microbial
community stratification linked to utilization of carbohydrates and phosphorus limitation in a
boreal peatland at Marcell Experimental Forest, Minnesota, USA. Appl Environ Microbiol 80:
3518-3530.
Markowitz VM, Chen IM, Palaniappan K, Chu K, Szeto E, Pillay M et al (2014). IMG 4 version
of the integrated microbial genomes comparative analysis system. Nucleic Acids Res 42:
D560-567.
Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011). EMIRGE: reconstruction of
full-length ribosomal genes from microbial community short read sequencing data. Genome Biol
12: R44.
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M (2007). KAAS: an automatic genome
annotation and pathway reconstruction server. Nucleic Acids Res 35: W182-185.
Ochsenreiter T, Selezi D, Quaiser A, Bonch-Osmolovskaya L, Schleper C (2003). Diversity and
abundance of Crenarchaeota in terrestrial habitats studied by 16S RNA surveys and real time
PCR. Environmental Microbiology 5: 787-797.
Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T et al (2014). The SEED and the
Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids
Res 42: D206-214.
Peng Y, Leung HCM, Yiu SM, Chin FYL (2012). IDBA-UD: a de novo assembler for single-cell
and metagenomic sequencing data with highly uneven depth. Bioinformatics 28: 1420-1428.
Price MN, Dehal PS, Arkin AP (2010). FastTree 2--approximately maximum-likelihood trees for
large alignments. PLoS One 5: e9490.
Pruesse E, Peplies J, Glöckner FO (2012). SINA: accurate high-throughput multiple sequence
alignment of ribosomal RNA genes. Bioinformatics 28: 1823-1829.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013). MEGA6: Molecular
Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30: 2725-2729.
Tfaily MM, Cooper WT, Kostka JE, Chanton PR, Schadt CW, Hanson PJ et al (2014). Organic
matter transformation in the peat column at Marcell Experimental Forest: Humification and
vertical stratification. J Geophys Res Biogeosci 119: 661-675.
Tveit A, Schwacke R, Svenning MM, Urich T (2013). Organic carbon transformations in
high-Arctic peat soils: key functions and microorganisms. ISME J 7: 299-311.
Wu M, Scott AJ (2012). Phylogenomic analysis of bacterial and archaeal sequences with
AMPHORA2. Bioinformatics 28: 1033-1034.