Genomic and phenotypic differentiation among Methanosarcina mazei populations from Columbia River sediment

Nicholas D. Youngblut1,2, Joseph S. Wirth1,3, James R. Henriksen1,4, Maria Smith5, Holly Simon5, William W. Metcalf1, Rachel J. Whitaker1,*

1. Department of Microbiology, University of Illinois at Urbana-Champaign, 601 South Goodwin Avenue, Urbana, IL 61801, USA
2. Currently at: Department of Crop and Soil Sciences, Cornell University, 306 Tower Road, Ithaca, NY 14853, USA
3. Currently at: Department of Microbiology, University of Georgia, 120 Cedar Street, Athens, GA 30602, USA
4. Currently at: AgBiome, PO Box 14069, Research Triangle Park, NC 27709, USA
5. Institute of Environmental Health; Division of Environmental and Biomolecular Systems, Oregon Health and Science University, 3251 S.W. Jackson Park Rd, Portland, OR 97239, USA

Supplemental Materials and Methods
Description of sites and sediments
Adjacent to the main channel of the Columbia River Estuary, Baker Bay and Youngs Bay are believed to be important sites for nutrient transformations by sediment microbial communities. The sediment microbiota is thought to contribute to net ecosystem metabolism, in part, by producing metabolites that feed into the ‘microbial loop’ in the mainstem estuary. Alternatively, these metabolites may also be transported to coastal waters in the river plume (Gilbert et al., 2013). Sediment microbial communities develop and evolve in the context of continuous material exchanges with the water column, including particles, organic matter, reduced substrates, electron acceptors, and respiratory gases (Cai et al., 1999; Turner and Millward, 2002). Water and sediments in the Columbia River Estuary are routinely exposed to dynamic shifts in end-member forcing with changes in season and the tidal cycle. A number of resources are therefore used to analyze the properties of the estuarine water column and/or sediments, including: i) historical data from previous Columbia River monitoring projects (Simenstad et al., 1990; Sherwood et al., 1990; Simenstad et al., 1984; Smith et al., 2010); ii) SATURN endurance stations (Gilbert et al., 2013); and iii) measurements obtained during several shore sampling campaigns in 2012-2014 (Smith et al., in prep; Lydie Herfort, personal communication).

The lower Columbia River Estuary adjacent to the lateral bays displays the full range of water salinities between 0 and 32 PSU (Smith et al., 2010). Youngs Bay is located at the confluence of three rivers, with salinities at the mouth highly dependent on river flow (which varies seasonally). Salinities observed in Youngs Bay at the time of sampling (late summer) range from 0 to >20 PSU (reviewed in Simenstad et al., 1984). Our near-shore salinity measurements were between 2 and 6 PSU at the YBM site and between 0 and 5 PSU at the YBB site (Smith et al., in prep; Lydie Herfort, personal communication). Although the YBM site was about 4 km closer than YBB to the mouth of Youngs Bay, sediments from the two sites were similar with respect to class (silt loam), total nutrients (NH4+, ~160 ppm; NO3-, ~3 ppm), and major metal content (Fe and Mn were each in the 200 ppm range). Differences were observed, however, in pH (6.6 vs. 6.3 for YBM and YBB, respectively) and total phosphorus (6 ppm for YBM vs. 15 ppm for YBB). Organic matter content was also somewhat higher at the YBB site, at 8.8%, while YBM sediments contained 5.4%.

Based on historical data, Baker Bay is described as highly influenced by tidal forcing, with salinities at mid-depth during low river flow in the summer/fall seasons ranging from 6.4 to more than 32 PSU (Simenstad et al., 1984). Our near-shore measurements of salinity at the sampling site produced values ranging from 4 to 16 PSU (Smith et al., in prep; Lydie Herfort, personal communication). The pH was higher (7.5) in the sandy loam sediments from the Baker Bay site compared to the Youngs Bay sites. Ammonium was about 10-fold lower (15.5 ppm), and total Fe was about half (130 ppm) that measured at the Youngs Bay sites. Organic matter was also lowest (1.8%) at the Baker Bay site, while total phosphorus (17 ppm) was similar to that in YBM sediments.
Sample collection
On August 22, 2011, during low tide, sediment samples were collected near the shore at three locations in Youngs Bay and Baker Bay in the Columbia River Estuary, as described in Smith et al. (in prep). Samples were collected in sterile 50 ml Corning tubes and stored on ice until processed.
Nucleic acid extraction and mcrA amplicon 454-pyrosequencing
DNA was extracted from approximately 1 g of each sediment sample with the PowerSoil DNA Isolation Kit using the standard protocol (MoBio, Carlsbad, CA). Adapters and barcodes were added to the methyl coenzyme M reductase subunit A (mcrA) specific primers mcrF and mcrR (Luton et al., 2002) for multiplexed 454 pyrosequencing. The gene fragment was amplified by PCR in a final volume of 30 μl containing final concentrations of 0.2 mM dNTPs (each), 0.5 μM primers (each), and 0.03 U of Phusion DNA Polymerase F-530 (Finnzymes, MA, USA). Thermocycler conditions consisted of an initial denaturation for 3 minutes at 98°C, followed by 30 cycles of 30 seconds at 98°C, 15 seconds at 59°C, and 15 seconds at 72°C, with a final extension of 10 minutes at 72°C. Triplicate PCR reactions were pooled, and gel bands of the expected amplicon size were excised and purified with the Wizard DNA Purification Kit (Promega, Madison, WI). Purified amplicons were submitted to the WM Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign for 454 pyrosequencing on a 454 GSFLX+ Sequencer (Roche, Branford, CT).
mcrA sequence analysis
Mothur v1.24.0 (Schloss et al., 2009) was used to remove barcodes and primers from the 454 pyrosequencing reads and to quality-filter the sequences. Sequences that were <200 bp in length, contained homopolymers >6 bp long, had >1 error in the barcode, or had >1 error in the primer were discarded. Chimeric sequences were identified and removed with the Mothur implementation of UCHIME (Edgar et al., 2011). Combined, quality filtering removed 12.5% (8898 of 71097) of the sequences. Sequences were clustered into 295 operational taxonomic units (OTUs) at a 95% sequence identity cutoff with CD-HIT-454 v4.6 (Fu et al., 2012). A reference mcrA dataset was constructed from select mcrA sequence fragments of cultured methanogens in the Functional Gene pipeline and repository (FunGene; http://fungene.cme.msu.edu), mcrA gene sequences from each sequenced isolate in this study, and all sequenced Methanosarcina genomes. Amino acid sequences of the reference mcrA dataset were aligned with mafft v7.037b (Katoh and Standley, 2013) and then reverse-translated with PAL2NAL v14 (Suyama et al., 2006). A maximum likelihood phylogeny was inferred from the nucleotide alignment with RAxML v7.2.6 (GTR-Γ model; 100 bootstrap replicates) (Stamatakis, 2006). Representative sequences for each environmental mcrA OTU were inserted into the reference phylogeny using RAxML. The number of sequences within each OTU from each sample of origin was mapped onto the tree with iTOL v2 (Letunic and Bork, 2011).
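
The read discard criteria stated in this section can be illustrated with a minimal Python sketch. It is not the Mothur implementation: the function names are hypothetical, and the sketch assumes that the number of barcode and primer mismatches has already been determined for each read. The last lines recompute the reported percentage of removed sequences from the counts given in the text.

```python
import re

MIN_LENGTH = 200          # reads shorter than this are discarded
MAX_HOMOPOLYMER = 6       # longest homopolymer run that is still accepted
MAX_BARCODE_ERRORS = 1    # reads with more barcode errors are discarded
MAX_PRIMER_ERRORS = 1     # reads with more primer errors are discarded


def longest_homopolymer(seq):
    """Return the length of the longest run of a single repeated base."""
    runs = re.findall(r"(A+|C+|G+|T+|N+)", seq.upper())
    return max((len(run) for run in runs), default=0)


def passes_filter(seq, barcode_errors, primer_errors):
    """Apply the four discard criteria (hypothetical helper, not Mothur code)."""
    if len(seq) < MIN_LENGTH:
        return False
    if longest_homopolymer(seq) > MAX_HOMOPOLYMER:
        return False
    if barcode_errors > MAX_BARCODE_ERRORS:
        return False
    if primer_errors > MAX_PRIMER_ERRORS:
        return False
    return True


# Percentage of reads removed, using the counts reported in the text
removed, total = 8898, 71097
percent_removed = round(100 * removed / total, 1)
```

With the reported counts, `percent_removed` evaluates to 12.5, in agreement with the value stated in the text.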
Culture isolation
Direct plating with agar overlays under strictly anaerobic conditions was used for initial Methanosarcina strain cultivation. Three sediment dilutions (10⁰, 10⁻¹, and 10⁻²) were plated on bicarbonate-buffered marine media (Metcalf et al., 1996) or freshwater PIPES-buffered media consisting of 1 μM KPO4, 10 μM NH4Cl, 4 μM resazurin, 40 mM PIPES buffer, 1:100 trace elements solution, 1:100 vitamin solution, and 1X base salts. The trace element solution consisted of 5.8 mM N(CH2CO2H)3, 2 mM Fe(NH4)2(SO4)2, 1.1 mM Na2SeO3, 0.4 mM CoCl2·6H2O, 0.6 mM MnSO4·H2O, 0.4 mM Na2MoO4·2H2O, 0.3 mM Na2WO4·2H2O, 0.3 mM ZnSO4·7H2O, 0.4 mM NiCl2·6H2O, 0.16 mM H3BO3, and 40 μM CuSO4·5H2O. The vitamin solution consisted of 73 μM p-aminobenzoic acid, 81 μM nicotinic acid, 42 μM calcium pantothenate, 49 μM pyridoxine HCl, 27 μM riboflavin, 30 μM thiamine HCl, 20 μM biotin, 11 μM folic acid, 24 μM α-lipoic acid, and 3.7 μM vitamin B12. The base salts consisted of 342 mM NaCl, 14.8 mM MgCl2·6H2O, 1 mM CaCl2·2H2O, and 6.71 mM KCl. Isolated colonies picked from the direct plating were subjected to 1-3 rounds of restreaking.
Genomic sequencing and assembly
Genomic DNA was extracted from each culture using the UltraClean Microbial DNA Isolation Kit (MoBio, Carlsbad, CA). Multiplexed libraries were prepared using the Nextera XT DNA Sample Prep Kit (Illumina, San Diego, CA) without performing the bead normalization step. Instead, the libraries were quantified with a Qubit fluorometer (Life Technologies, Carlsbad, CA) and normalized by dilution with molecular grade water. Normalized libraries were pooled and submitted to the WM Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign for paired-end sequencing with a HiSeq2000 sequencer (Illumina, San Diego, CA).

Our genome assembly pipeline was optimized through extensive testing by comparing draft assemblies produced solely from Illumina HiSeq 2000 paired-end reads from the reference strains M. barkeri Fusaro (Maeder et al., 2006), M. mazei WWM610, M. mazei C16 (Blotevogel et al., 1986), and M. mazei LYC (Liu et al., 1985) with the closed versions of these genomes, which had been assembled with multiple sequencing methods including paired-end 454 pyrosequencing data, cosmid paired-end reads, and Sanger sequencing to fill gaps. Through this benchmarking, we selected parameters that foremost increased assembly accuracy and secondarily increased assembly contiguity.

We also used this benchmark dataset to assess likely causes of the draft assembly breakpoints. To this end, we mapped each draft genome to the closed version with ABACAS v1.3.1 (Assefa et al., 2009) to identify alignment gaps (i.e., assembly breakpoints). We identified genomic elements that were located at the gap edges (≤100 bp from a gap start or end) and may have caused the assembly breakpoint.
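
The gap-edge criterion (≤100 bp from a gap start or end) can be illustrated with a short Python sketch. The coordinate representation, the function names, and the example features are assumptions made for illustration; they do not reflect ABACAS output formats or the scripts used in this study.

```python
def distance_to_interval(position, start, end):
    """Distance in bp from a position to an interval (0 if it falls inside)."""
    if start <= position <= end:
        return 0
    return min(abs(position - start), abs(position - end))


def elements_near_gaps(gaps, elements, max_distance=100):
    """Return names of elements within max_distance bp of any gap edge.

    gaps:     list of (start, end) alignment gaps on the closed genome
    elements: list of (name, start, end) annotated genomic elements
    """
    hits = []
    for name, el_start, el_end in elements:
        near = any(
            distance_to_interval(edge, el_start, el_end) <= max_distance
            for gap in gaps
            for edge in gap          # each gap contributes a start and an end edge
        )
        if near:
            hits.append(name)
    return hits


# Hypothetical coordinates, for illustration only
example_gaps = [(5000, 5200)]
example_elements = [
    ("IS_element", 4950, 5050),   # overlaps the gap start
    ("rRNA_operon", 5250, 6800),  # begins 50 bp after the gap end
    ("distant_gene", 9000, 9500), # far from either gap edge
]
near_gap = elements_near_gaps(example_gaps, example_elements)
```

In this toy example, `near_gap` contains the first two elements only.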

The paired-end reads were quality-filtered with the FASTX Toolkit v0.0.13 using a q-value cutoff of 30 over 95% of the read length. Filtered reads were randomly subsampled to one million read pairs per sample (~40-50X coverage), which we found to provide optimal assembly accuracy and contiguity based on our benchmark dataset. Genomic assembly and scaffolding were performed with a modified version of the A5 assembly pipeline (Tritt et al., 2012), in which IDBA-UD v1.1.0 (Peng et al., 2012) was used instead of IDBA for the actual assembly. This assembly method consistently increased assembly contiguity. BLASTn was used to identify and remove scaffolds potentially containing contamination (e.g., regions with a high number of hits to E. coli; E-value <1e-20). The percentage of total scaffold length of any assembly that was identified as contamination and removed varied from 0 to 3%. Gaps in scaffolds were filled in silico with GapFiller (Boetzer and Pirovano, 2012), with an average of 68% of gaps closed per assembly. SEQuel v1.0.1 (Ronen et al., 2012) was used to correct on average 39 base miscalls and/or erroneous indels in each assembly. OASIS (Robinson et al., 2012) was used to identify putative insertion sequences, and IslandViewer (Langille and Brinkman, 2009) was used to identify putative genomic islands based on multiple sequence composition criteria. CRISPRs were identified with CRISPRFinder (Grissa et al., 2007) and classified in accordance with Vestergaard and colleagues by assessing homology (BLASTp, E-value <1e-20) to all annotated cas genes in the study’s dataset (Vestergaard et al., 2014). cas genes were manually annotated by searching the NCBI non-redundant protein database via BLASTp and searching the protein family databases CDD (Marchler-Bauer et al., 2005), Pfam (Finn et al., 2008), and COG (Tatusov et al., 2000) with HHsearch (Söding, 2005). CRISPR spacer content conservation was assessed by pairwise alignments of all CRISPRs classified as the same subtype. For the alignments, each CRISPR was represented as a string, with each unique spacer nucleotide sequence represented by a unique character in the string. The CRISPR strings were oriented by the putative leader regions of each CRISPR and aligned pairwise with a Levenshtein distance algorithm implemented in Perl. Matched spacers in the alignment (i.e., the same character in the CRISPR string representation) received a score of 1, while mismatches were scored as 0. Leader regions were identified by sequence conservation of the leader region and direct repeat sequence conservation.
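
The CRISPR string representation and pairwise alignment can be sketched as follows. The original analysis was implemented in Perl; this Python version is an illustrative re-expression under stated assumptions, not the study's script. It assumes unit costs for substitutions, insertions, and deletions, that spacer lists are already ordered leader end first, and that alignment columns containing a gap are scored as 0. All function names are hypothetical.

```python
def encode_arrays(arrays):
    """Encode each CRISPR (a list of spacer sequences) as a string in which
    every unique spacer sequence is represented by one unique character."""
    codes = {}
    encoded = []
    for spacers in arrays:
        chars = []
        for spacer in spacers:
            if spacer not in codes:
                codes[spacer] = chr(0x100 + len(codes))  # arbitrary distinct symbols
            chars.append(codes[spacer])
        encoded.append("".join(chars))
    return encoded


def levenshtein_alignment(a, b):
    """Align two strings by Levenshtein distance (unit costs assumed).
    Returns alignment columns as (char_a, char_b); None marks a gap."""
    n, m = len(a), len(b)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or mismatch
                             dist[i - 1][j] + 1,         # gap in b
                             dist[i][j - 1] + 1)         # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    columns = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                columns.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            columns.append((a[i - 1], None))
            i -= 1
        else:
            columns.append((None, b[j - 1]))
            j -= 1
    columns.reverse()
    return columns


def match_scores(columns):
    """Score 1 for a matched spacer, 0 for a mismatch or a gap column."""
    return [1 if x is not None and x == y else 0 for x, y in columns]


# Toy spacer lists (leader end first); sequences are placeholders
crispr_1 = ["ACGTAC", "GGATCC", "TTGACA", "CCATGG"]
crispr_2 = ["TGCATG", "GGATCC", "TTGACA", "CCATGG"]
s1, s2 = encode_arrays([crispr_1, crispr_2])
scores = match_scores(levenshtein_alignment(s1, s2))
```

In this toy case the two arrays differ only in the leader-proximal spacer, so `scores` is `[0, 1, 1, 1]`.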

Whole genome alignment
A whole genome alignment (WGA) of isolate genomes identified as M. mazei and all reference M. mazei genomes was created with mugsy v1.2.3 (Angiuoli and Salzberg, 2011). RAxML was used to infer a ‘species’ tree (GTR-Γ model; 100 bootstrap replicates) from all ‘core’ (found in all taxa) local collinear blocks (LCBs) in the WGA. The M. mazei reference strain genomes were aligned with progressiveMauve v2.3.1 (Darling et al., 2010). Mauve v2.3.1 (Darling et al., 2004) was used to visualize the alignment and calculate double-cut-and-join (DCJ) distances.
Core and variable gene analysis
Genes were called and annotated using the Rapid Annotation using Subsystem Technology (RAST) server (Aziz et al., 2008). The ITEP toolkit (Benedict et al., 2014) was used to group genes from all isolates (or isolates and type strains) into putative homologs through Markov Chain Clustering (via the MCL program) of BLASTp maximum bitscore ratios (0.4 cutoff, 2.0 inflation parameter) (Enright et al., 2002).
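
As a rough illustration of the edge weighting that precedes clustering, the sketch below computes a maximum bitscore ratio for each BLASTp hit and keeps pairs at or above the 0.4 cutoff. The definition used here (pairwise bit score divided by the larger of the two self-hit bit scores) is an assumption, as is treating the cutoff as inclusive; the function names and scores are hypothetical and are not taken from ITEP.

```python
def max_bitscore_ratio(pair_score, self_score_a, self_score_b):
    """Pairwise bit score normalized by the larger self-hit bit score
    (assumed definition of the maximum bitscore ratio)."""
    return pair_score / max(self_score_a, self_score_b)


def filter_edges(hits, self_scores, cutoff=0.4):
    """Keep gene pairs whose ratio meets the cutoff.

    hits:        list of (gene_a, gene_b, bit_score) BLASTp hits
    self_scores: dict mapping each gene to its self-hit bit score
    Returns (gene_a, gene_b, ratio) edges that could be passed to a
    graph-clustering program such as MCL.
    """
    edges = []
    for gene_a, gene_b, score in hits:
        if gene_a == gene_b:
            continue  # self-hits are used only for normalization
        ratio = max_bitscore_ratio(score, self_scores[gene_a], self_scores[gene_b])
        if ratio >= cutoff:
            edges.append((gene_a, gene_b, round(ratio, 3)))
    return edges


# Hypothetical bit scores, for illustration only
example_self = {"g1": 500.0, "g2": 400.0, "g3": 450.0}
example_hits = [("g1", "g1", 500.0), ("g1", "g2", 300.0), ("g1", "g3", 100.0)]
example_edges = filter_edges(example_hits, example_self)
```

Here only the g1/g2 pair (ratio 0.6) is retained; the g1/g3 pair (ratio 0.2) falls below the cutoff.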

Amino acid sequences of gene clusters were aligned with mafft v7.037b (Katoh and Standley, 2013) and then reverse-translated with PAL2NAL v14 (Suyama et al., 2006). Poor alignments caused by artificial gene truncations due to incomplete genome assembly were identified and removed using a custom Perl script that identified sequences in alignments where half of the aligned sequence was an outlier (±1 standard deviation) in terms of mean sequence identity and number of gaps and was also within 300 bp of a contig end. Maximum likelihood phylogenies were inferred from the nucleotide alignments with RAxML v7.2.6 (GTR-Γ model; 100 bootstrap replicates) (Stamatakis, 2006).

To estimate the number of gene clusters missing from any particular draft genome, we assessed the number of gene clusters missing from the draft assemblies compared to the complete assemblies of our genomes (M. barkeri Fusaro, M. mazei WWM610, M. mazei C16, M. mazei LYC; see above). We found that <2% of coding sequences were missing when comparing the draft assemblies to their corresponding closed genomes, indicating a limited impact of artificial gene absence on our analysis. Still, to account for this possibility and to mitigate errors caused by artificial gene loss, we defined genes specific to the mazei-WC or mazei-T clade as those found in the majority of strains in one clade but absent from the other.
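
The clade-specific gene definition can be expressed as a simple presence/absence test. The sketch below is illustrative only: it assumes that "majority" means more than half of the strains in a clade, and the data structure, function name, and strain labels are hypothetical.

```python
def clade_specific_clusters(presence, focal_clade, other_clade):
    """Return gene clusters found in the majority of focal-clade strains
    and in none of the other clade's strains.

    presence:    dict mapping cluster ID to the set of strains carrying it
    focal_clade: set of strain names in the clade of interest
    other_clade: set of strain names in the comparison clade
    """
    specific = []
    for cluster, strains in presence.items():
        n_focal = len(strains & focal_clade)
        n_other = len(strains & other_clade)
        # "majority" is assumed to mean strictly more than half of the strains
        if n_focal > len(focal_clade) / 2 and n_other == 0:
            specific.append(cluster)
    return specific


# Hypothetical strains and clusters, for illustration only
mazei_t = {"t1", "t2", "t3"}
mazei_wc = {"w1", "w2"}
example_presence = {
    "cluster_1": {"t1", "t2", "t3"},  # all mazei-T strains, no mazei-WC
    "cluster_2": {"t1", "w1"},        # present in both clades
    "cluster_3": {"t1"},              # minority of mazei-T strains
    "cluster_4": {"w1", "w2"},        # all mazei-WC strains, no mazei-T
}
t_specific = clade_specific_clusters(example_presence, mazei_t, mazei_wc)
wc_specific = clade_specific_clusters(example_presence, mazei_wc, mazei_t)
```

Requiring a majority rather than presence in every strain tolerates the occasional gene that is missing from a draft assembly for technical reasons.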

Quantification of dN/dS, mean sequence identity, and FST values for the core genes was performed with SNAP, Mothur v1.24.0, and Arlecore v3.5.1.3, respectively (Korber, 2000; Schloss et al., 2009; Excoffier and Lischer, 2010).

We assessed inter-clade recombination of core genes using two general methods: tree-reconciliation and quartet decomposition. The former was performed with Mowgli (Nguyen et al., 2013), which infers recombination through identifying incongruences between the ‘species’ tree (the WGA phylogeny) and a gene tree (inferred for each core gene), while accounting for unsupported nodes by performing nearest neighbor interchange (NNI) operations to minimize false inferences of gene transfer and duplication. We found that poorly supported nodes in either the species or gene trees, as often occur among highly similar sequences, greatly inflated the number of inferred gene transfers for both methods (data not shown). To mitigate this artifact, we only used gene trees with a bootstrap support of >50 and a clone-corrected species tree, where all approximately clonal taxa were collapsed to one representative taxon. However, most gene trees did not meet our criterion. Therefore, we also employed quartet decomposition with the Quartet Decomposition Server (Mao et al., 2012) to identify individual quartets in gene trees that had high bootstrap support but were incongruent with the bifurcation of mazei-T and mazei-WC. For this analysis, we only assessed gene trees possessing quartets with two members from both mazei-T and mazei-WC and ≥1 SNP segregating the internal nodes of the quartet.
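
The quartet criterion can be illustrated with a small classifier: a quartet with two taxa from each clade is congruent with the mazei-T/mazei-WC bifurcation only if its internal branch separates the two clades. This sketch does not reproduce the Quartet Decomposition Server. The support threshold of 90 is a hypothetical placeholder (the text states only that support was high), the SNP criterion is not modeled, and all names are assumptions.

```python
from collections import Counter


def classify_quartet(split, clade_of, support, min_support=90):
    """Classify one quartet topology against the clade bifurcation.

    split:       ((a, b), (c, d)), the two taxon pairs separated by the
                 internal branch of the quartet
    clade_of:    dict mapping taxon name to clade label
    support:     bootstrap support for the quartet topology
    min_support: hypothetical threshold standing in for "high" support
    """
    left, right = split
    counts = Counter(clade_of[taxon] for taxon in left + right)
    if sorted(counts.values()) != [2, 2]:
        return "skipped"       # not two members from each of two clades
    if support < min_support:
        return "unsupported"
    # Congruent if each side of the internal branch holds a single clade
    left_clades = {clade_of[taxon] for taxon in left}
    return "congruent" if len(left_clades) == 1 else "incongruent"


# Hypothetical taxa, for illustration only
example_clades = {"t1": "mazei-T", "t2": "mazei-T", "w1": "mazei-WC", "w2": "mazei-WC"}
result_a = classify_quartet((("t1", "t2"), ("w1", "w2")), example_clades, support=95)
result_b = classify_quartet((("t1", "w1"), ("t2", "w2")), example_clades, support=95)
```

In this example `result_a` is "congruent" and `result_b` is "incongruent"; well-supported incongruent quartets are the ones that would suggest inter-clade recombination.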
Statistics and plotting
All statistical evaluations were performed in R (R Development Core Team, 2010). The circular genome plots were created with Circos (Krzywinski et al., 2009), and all other plots were produced with R using the ggplot2 package (Wickham, 2009). All phylogenies were visualized with either iTOL v2 or FigTree v1.4.0 (Letunic and Bork, 2011). All custom Perl scripts used in this study are available at https://github.com/nyoungb2/pop_genome, with many relying on the Bioperl Toolkit (Stajich et al., 2002).
Methane production assays
Methane production, as a proxy for culture growth, was monitored using a Hewlett Packard 5890 Series II gas chromatograph (Hewlett-Packard, Wilmington, DE) with a flame ionization detector and a stainless steel column filled with 80/120 Carbopack B/3% SP-1500 (Supelco, Bellefonte, PA) heated to 225°C. The maximum growth rates inferred from the resulting methane production curves were compared among isolates. Depending on the culture and substrate, stationary phase was reached between ~500 and ~1100 hours. The low-throughput nature of this method limited the number of cultures that could be compared in the same experiment. In addition, cultures were originally isolated on different media (freshwater and marine) and substrates (trimethylamine, methanol, and acetate), which could be confounding factors. To control for this, we performed direct pairwise comparisons between isolates from different clades isolated on the same media and substrate when possible. Two to four isolates were compared in any given round of methane production monitoring. Each isolate was grown in its ‘native’ medium (marine or freshwater) in triplicate or quadruplicate. In addition, we found no growth in cultures inoculated in media lacking substrate or in Balch tubes containing all substrates but lacking inoculum.
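
The text does not specify how maximum growth rates were inferred from the methane curves. One common approach, shown here purely as an assumption, is to take the steepest least-squares slope of ln(methane) against time over a sliding window of consecutive measurements. The window size, function names, and data below are hypothetical.

```python
import math


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def max_growth_rate(hours, methane, window=3):
    """Largest slope of ln(methane) versus time over any window of
    consecutive measurements (assumed method, not the study's)."""
    logs = [math.log(value) for value in methane]   # values must be > 0
    rates = [
        least_squares_slope(hours[i:i + window], logs[i:i + window])
        for i in range(len(hours) - window + 1)
    ]
    return max(rates)


# Hypothetical time course (hours, arbitrary methane units)
example_hours = [0, 100, 200, 300, 400]
example_methane = [1.0, 2.0, 8.0, 16.0, 18.0]
example_rate = max_growth_rate(example_hours, example_methane)  # per hour
```

A windowed fit of this kind reduces sensitivity to single noisy measurements, at the cost of smoothing over very short exponential phases.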
Supplemental References
Angiuoli SV, Salzberg SL. (2011). Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics 27:334–342.

Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. (2009). ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics 25:1968–1969.

Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. (2008). The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9:75.

Benedict MN, Henriksen JR, Metcalf WW, Whitaker RJ, Price ND. (2014). ITEP: An integrated toolkit for exploration of microbial pan-genomes. BMC Genomics 15:8.

Blotevogel KH, Fischer U, Lüpkes KH. (1986). Methanococcus frisius sp. nov., a new methylotrophic marine methanogen. Can J Microbiol 32:127–131.

Boetzer M, Pirovano W. (2012). Toward almost closed genomes with GapFiller. Genome Biol 13:R56.

Cai W-J, Pomeroy LR, Moran MA, Wang Y. (1999). Oxygen and carbon dioxide mass balance for the estuarine-intertidal marsh complex of five rivers in the southeastern U.S. Limnol Oceanogr 44:639–649.

Darling ACE, Mau B, Blattner FR, Perna NT. (2004). Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14:1394–1403.

Darling AE, Mau B, Perna NT. (2010). progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement. PLoS ONE 5:e11147.

Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27:2194–2200.

Enright AJ, Van Dongen S, Ouzounis CA. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30:1575–1584.

Excoffier L, Lischer HEL. (2010). Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567.

Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz H-R, et al. (2008). The Pfam protein families database. Nucleic Acids Res 36:D281–D288.

Fu L, Niu B, Zhu Z, Wu S, Li W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152.

Gilbert M, Needoba J, Koch C, Barnard A, Baptista A. (2013). Nutrient Loading and Transformations in the Columbia River Estuary Determined by High-Resolution In Situ Sensors. Estuaries Coasts 36:708–727.

Grissa I, Vergnaud G, Pourcel C. (2007). CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res 35:W52–W57.

Katoh K, Standley DM. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780.

Korber B. (2000). HIV signature and sequence variation analysis. Comput Anal HIV Mol Seq 4:55–72.

Krätzer C, Carini P, Hovey R, Deppenmeier U. (2009). Transcriptional Profiling of Methyltransferase Genes during Growth of Methanosarcina mazei on Trimethylamine. J Bacteriol 191:5108–5115.

Krzywinski M, Schein J, Birol İ, Connors J, Gascoyne R, Horsman D, et al. (2009). Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645.

Langille MGI, Brinkman FSL. (2009). IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics 25:664–665.

Letunic I, Bork P. (2011). Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res 39:W475–W478.

Liu Y, Boone DR, Sleat R, Mah RA. (1985). Methanosarcina mazei LYC, a New Methanogenic Isolate Which Produces a Disaggregating Enzyme. Appl Environ Microbiol 49:608–613.

Luton PE, Wayne JM, Sharp RJ, Riley PW. (2002). The mcrA gene as an alternative to 16S rRNA in the phylogenetic analysis of methanogen populations in landfill. Microbiology 148:3521–3530.

Maeder DL, Anderson I, Brettin TS, Bruce DC, Gilna P, Han CS, et al. (2006). The Methanosarcina barkeri Genome: Comparative Analysis with Methanosarcina acetivorans and Methanosarcina mazei Reveals Extensive Rearrangement within Methanosarcinal Genomes. J Bacteriol 188:7922–7931.

Mao F, Williams D, Zhaxybayeva O, Poptsova M, Lapierre P, Gogarten JP, et al. (2012). Quartet decomposition server: a platform for analyzing phylogenetic trees. BMC Bioinformatics 13:123.

Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, et al. (2005). CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res 33:D192–D196.

Metcalf WW, Zhang JK, Shi X, Wolfe RS. (1996). Molecular, genetic, and biochemical characterization of the serC gene of Methanosarcina barkeri Fusaro. J Bacteriol 178:5797–5802.

Nguyen TH, Ranwez V, Pointet S, Chifolleau A-MA, Doyon J-P, Berry V. (2013). Reconciliation and local gene tree rearrangement can be of mutual profit. Algorithms Mol Biol 8:12.

Peng Y, Leung HC, Yiu S-M, Chin FY. (2012). IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–1428.

R Development Core Team. (2010). R: A Language and Environment for Statistical Computing. Vienna, Austria http://www.R-project.org.

Robinson DG, Lee M-C, Marx CJ. (2012). OASIS: an automated program for global investigation of bacterial and archaeal insertion sequences. Nucleic Acids Res 40:e174–e174.

Ronen R, Boucher C, Chitsaz H, Pevzner P. (2012). SEQuel: improving the accuracy of genome assemblies. Bioinformatics 28:i188–i196.

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. (2009). Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol 75:7537–7541.

Sherwood CR, Jay DA, Bradford Harvey R, Hamilton P, Simenstad CA. (1990). Historical changes in the Columbia River Estuary. Prog Oceanogr 25:299–352.

Simenstad CA, Jay DA, McIntire D, Nehlsen W, Sherwood C. (1984). The dynamics of the Columbia River estuarine ecosystem. Columbia River Estuary Data Development Program. Portland, Oregon.

Simenstad CA, Small LF, David McIntire C, Jay DA, Sherwood C. (1990). Columbia river estuary studies: An introduction to the estuary, a brief history, and prior studies. Prog Oceanogr 25:1–13.

Smith M, Davis R, Youngblut N, Whitaker R, Metcalf W, Herfort L, et al. Metagenomic evidence for reciprocal particle exchange between the Columbia River estuarine water column and lateral bay sediments. In prep.

Smith MW, Herfort L, Tyrol K, Suciu D, Campbell V, Crump BC, et al. (2010). Seasonal Changes in Bacterial and Archaeal Gene Expression Patterns across Salinity Gradients in the Columbia River Coastal Margin. PLoS ONE 5:e13312.

Söding J. (2005). Protein homology detection by HMM–HMM comparison. Bioinformatics 21:951–960.

Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, et al. (2002). The Bioperl Toolkit: Perl Modules for the Life Sciences. Genome Res 12:1611–1618.

Stamatakis A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Suyama M, Torrents D, Bork P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 34:W609–W612.

Tatusov RL, Galperin MY, Natale DA, Koonin EV. (2000). The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36.

Tritt A, Eisen JA, Facciotti MT, Darling AE. (2012). An Integrated Pipeline for de Novo Assembly of Microbial Genomes. PLoS ONE 7:e42304.

Turner A, Millward GE. (2002). Suspended Particles: Their Role in Estuarine Biogeochemical Cycles. Estuar Coast Shelf Sci 55:857–883.

Vestergaard G, Garrett RA, Shah SA. (2014). CRISPR adaptive immune systems of Archaea. RNA Biol 11:156–167.

Wickham H. (2009). ggplot2: elegant graphics for data analysis. Springer New York.

Supplemental Tables and Figures
Supplemental Figure 1. The phylogenetic tree is a maximum likelihood inference
(GTR-Γ model; 100 bootstrap replicates; rooted on Methanocaldococcus jannaschii
DSM 2661) of full-length mcrA alleles from all isolates and select type strains. mcrA
amplicon OTUs (95% sequence identity cutoff) were inserted into the tree. Red
circles highlight nodes that have bootstrap values >70. Bootstrapping only applies
to full-length mcrA alleles. The bar plot describes the number of sequences in each
OTU or the number of isolates with the same mcrA allele. Only OTUs with a total
abundance of ≥50 are shown. ‘*’ refers to the M. mazei isolates used for the
population genomics comparisons. ‘**’ refers to the isolates used as part of the
dN/dS analyses. Supplemental Table 9 lists the accession numbers of all reference
strains used for the inference.
Supplemental Figure 2. Distribution of isolates obtained on all sample-media-substrate pairwise combinations. ‘Initial cultures’ refers to the number of cultures that grew in liquid media following initial colony picking. ‘Cultures sequenced’ refers to the cultures that were selected for genomic sequencing.
[Supplemental Figure 3 image not reproduced. Genome labels: SarPi, LYC, Go1, WWM610, S6, C16, TMA. Node support values shown: 100. Scale bars: 0.001 substitutions/site; 1.0 (DCJ distance).]
Supplemental Figure 3. A whole genome alignment (WGA) of all seven closed
Methanosarcina mazei genomes. Colored regions are local co-linear blocks (LCBs),
which are regions lacking rearrangement of homologous sequence. Identical colors
and connecting lines identify LCBs found in multiple genomes. The height of the
colored bars within blocks describes sequence identity of the genome region, with
higher bars indicating higher sequence identity. The left dendrogram is an ML tree
(GTR-Γ model; 100 bootstrap replicates) inferred from the WGA. The right
dendrogram was produced by hierarchical clustering (average neighbor algorithm)
of double-cut-and-join distance values, which is a measure of genome synteny.
[Supplemental Figure 4 image not reproduced. y-axis: mean spacer content identity (ticks from 0.4 to 0.7); x-axis: relative alignment position in 10 bins, from (0.00447,0.105] to (0.901,1].]
Supplemental Figure 4. M. mazei CRISPRs display more CRISPR spacer content
variation at the leader end versus the trailer end. Alignment position is normalized
by CRISPR length (number of spacers) and is relative to the leader end, with 0 being
most proximal. Mean spacer content identity is the number of matching spacers
(same unique nucleotide sequence) normalized by the number of spacers in each
pairwise alignment of CRISPRs (truncated to the shortest CRISPR in the alignment).
Mean spacer content identity was calculated separately for each of the 10 relative
alignment position bins. The line ranges represent the standard error.
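
The binning described in this caption can be sketched as follows. The sketch assumes that each pairwise alignment is available as a list of per-position match scores (1 or 0) ordered from the leader end, that identity is first computed per alignment within each bin, and that the standard error is the sample standard deviation divided by the square root of the number of alignments. Function and variable names are hypothetical.

```python
import math
from collections import defaultdict


def binned_identity(alignments, n_bins=10):
    """Mean spacer content identity and standard error per relative-position bin.

    alignments: list of per-position match scores (1 = matching spacer,
                0 = mismatch), ordered from the leader end
    Returns a dict mapping bin index (0 = most leader-proximal) to (mean, SE).
    """
    per_bin = defaultdict(list)  # bin index -> one identity value per alignment
    for scores in alignments:
        length = len(scores)
        in_bin = defaultdict(list)
        for index, score in enumerate(scores):
            # integer arithmetic avoids floating-point edge effects at bin borders
            bin_index = min(index * n_bins // length, n_bins - 1)
            in_bin[bin_index].append(score)
        for bin_index, values in in_bin.items():
            per_bin[bin_index].append(sum(values) / len(values))

    summary = {}
    for bin_index in sorted(per_bin):
        values = per_bin[bin_index]
        mean = sum(values) / len(values)
        if len(values) > 1:
            variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
            standard_error = math.sqrt(variance / len(values))
        else:
            standard_error = float("nan")  # undefined for a single alignment
        summary[bin_index] = (mean, standard_error)
    return summary


# Two toy alignments of ten spacers each (leader end first)
example_alignments = [
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
]
example_summary = binned_identity(example_alignments)
```

In this toy input the leader-proximal bin has a mean identity of 0 and the trailer-proximal bin a mean of 1, mirroring the qualitative pattern described in the caption.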
Supplemental Figure 5. Unrooted maximum likelihood phylogenies of all HdrA,
HdrB, and HdrC homologs for all Methanosarcinales and Methanocella strains. Red
and blue branches denote clades of genes solely found within Methanosarcina or
Methanocella, respectively.
Supplemental Figure 6. Unrooted maximum likelihood phylogenies (GTR-Γ model;
100 bootstrap replicates) of all FdhA and FdhB homologs.
              Number of  Number of            Maximum scaffold  Total length  Number of
Isolate ID    scaffolds  contigs    N50 (bp)  length (bp)       (bp)          CDS
BB.F.A.2.3    189        200        41060     125206            4077722       3942
BB.F.A.2.4    146        169        53551     300086            4198417       4068
BB.F.T.0.2    136        178        50618     276626            4091453       3943
BB.F.T.2.6    236        245        31635     94217             4061783       3919
YBB.F.A.1A.1  267        284        30147     120922            4102127       4053
YBB.F.A.1A.3  147        210        46474     162679            4072655       3912
YBB.F.A.1B.1  167        196        43698     132890            4077893       3945
YBB.F.A.2.12  209        216        35291     123524            4075980       3925
YBB.F.A.2.3   185        196        41284     130396            4078713       3932
YBB.F.A.2.5   146        172        46460     159024            4067732       3929
YBB.F.A.2.6   160        203        41652     212372            4042355       3899
YBB.F.A.2.7   142        171        44169     173088            4028272       3870
YBB.F.T.1A.1  156        184        50446     216583            4164309       4022
YBB.F.T.1A.2  163        188        43793     114908            4157382       4020
YBB.F.T.1A.4  139        174        51028     391037            4159899       4028
YBB.F.T.2.1   162        198        42908     169651            4109510       3998
YBB.H.A.1A.1  349        381        20277     67388             4010396       3954
YBB.H.A.1A.2  182        191        40758     96087             4095496       3972
YBB.H.A.2.1   159        185        44614     255431            3981512       3859
YBB.H.A.2.4   157        182        49010     305217            4123502       3990
YBB.H.A.2.5   186        236        42679     108713            4093088       3968
YBB.H.A.2.6   167        198        42927     117048            4013698       3890
YBB.H.A.2.8   222        228        30384     108909            4005819       3891
YBB.H.M.1A.1  167        217        47520     305568            4120526       3970
YBB.H.M.1B.1  211        223        32240     177850            4129439       3991
YBB.H.M.1B.2  136        177        53229     229399            4125300       3973
YBB.H.M.1B.5  136        181        54280     183100            4121851       3978
YBB.H.M.2.7   199        211        35791     110868            4003367       3864
YBB.H.T.1A.1  128        162        56639     211658            4074613       3965
YBB.H.T.1A.2  185        194        37461     108767            4091099       3965
YBM.F.A.1A.3  124        151        77131     274406            4048935       3892
YBM.F.A.1B.3  144        168        49810     179941            4033711       3868
YBM.F.A.1B.4  171        189        47035     166253            4053184       3902
YBM.F.A.2.8   238        263        30152     107661            3971272       3858
YBM.F.M.0.5   164        180        44189     151275            4072340       3934
YBM.H.A.0.1   157        185        44832     172191            3967638       3817
YBM.H.A.1A.1  172        187        39894     260934            4044602       3898
YBM.H.A.1A.3  205        214        36695     111392            4084784       3935
YBM.H.A.1A.4  123        174        67853     433088            3988142       3852
YBM.H.A.1A.6  231        237        35968     76848             4079528       3964
YBM.H.A.2.1   294        320        25450     116492            3997775       3858
YBM.H.A.2.3   171        206        42899     125154            4091047       3953
YBM.H.A.2.6   204        232        36997     135580            4006298       3887
YBM.H.A.2.7   374        446        42470     130098            4232359       4006
YBM.H.A.2.8   337        381        22132     68948             4076602       3934
YBM.H.M.0.1   176        191        42223     251341            4063805       3972
YBM.H.M.1A.1  175        184        44782     130261            4085193       3957
YBM.H.M.1A.2  155        181        51373     176586            4080746       3935
YBM.H.M.1A.3  257        270        27646     92171             4081406       3944
YBM.H.M.2.1   263        317        31243     99806             4187444       4054
YBM.H.M.2.2   154        201        44369     216779            4081675       3934
YBM.H.M.2.3   142        198        47314     166334            3977612       3842
YBM.H.M.2.4   260        266        26897     92541             4079098       3957
YBM.H.T.2.1   159        194        46703     130619            4076911       3939
YBM.H.T.2.3   148        173        44866     176434            3971657       3839
YBM.H.T.2.5   142        186        49449     162387            4085119       3952
Coverage
44.1
42.9
44.0
44.3
43.8
44.2
44.1
44.2
44.1
44.2
44.5
44.7
43.2
43.3
43.3
43.8
44.9
43.9
45.2
43.6
44.0
44.8
44.9
43.7
43.6
43.6
43.7
45.0
44.2
44.0
44.5
44.6
44.4
45.3
44.2
45.4
44.5
44.1
45.1
44.1
45.0
44.0
44.9
42.3
44.1
44.3
44.1
44.1
44.1
43.0
44.1
45.2
44.1
44.1
45.3
44.1
GenBank
Accession
JJOR00000000
JJOS00000000
JJOT00000000
JJOU00000000
JJPA00000000
JJPB00000000
JJPC00000000
JJPD00000000
JJPE00000000
JJPF00000000
JJPG00000000
JJPH00000000
JJPI00000000
JJPJ00000000
JJPK00000000
JJPL00000000
JJPM00000000
JJPN00000000
JJPO00000000
JJPP00000000
JJPQ00000000
JJPR00000000
JJPS00000000
JJPT00000000
JJPU00000000
JJPV00000000
JJPW00000000
JJPX00000000
JJPY00000000
JJPZ00000000
JJQA00000000
JJQB00000000
JJQC00000000
JJQD00000000
JJQE00000000
JJQF00000000
JJQG00000000
JJQH00000000
JJQI00000000
JJQJ00000000
JJQK00000000
JJQM00000000
JJQN00000000
JJQO00000000
JJQP00000000
JJQQ00000000
JJQR00000000
JJQS00000000
JJQT00000000
JJQU00000000
JJQV00000000
JJQW00000000
JJQX00000000
JJQZ00000000
JJRA00000000
JJRB00000000
Supplemental Table 1. Genome assembly contiguity statistics for all 56 M. mazei
cultures. The naming scheme for isolates is described in the Figure 1 legend.
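For readers unfamiliar with the N50 statistic reported in this table, the following Python sketch shows the conventional definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. Whether the table's N50 was computed over scaffolds or contigs is not stated here, so the input is simply a list of sequence lengths; the function name `n50` is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequence lengths given")
```

For example, lengths of 2 to 10 bp sum to 54 bp, and the three longest sequences (10, 9, and 8 bp) reach the halfway mark of 27 bp, so the N50 is 8.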
Supplemental Table 2. Genome assembly contiguity statistics for all 7 M.
lacustris-like cultures. The naming scheme for isolates is described in the Figure 1
legend.
Supplemental Table 3. Genome assembly breakpoint statistics for draft genome
assemblies of M. barkeri Fusaro, M. mazei C16, M. mazei LYC, and M. mazei
WWM610. The draft genome assemblies of Illumina reads (using the same
assembly pipeline as the isolate genome assemblies) were mapped onto the
complete versions of each genome in order to identify gaps (i.e., assembly
breakpoints).
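One way to picture the gap identification described in this legend is the Python sketch below. It assumes the mapped draft scaffolds have already been reduced to 0-based, half-open `(start, end)` intervals on the complete genome; the mapping tool and any filtering used for the table are not specified here, and the function name `uncovered_gaps` is hypothetical.

```python
def uncovered_gaps(ref_length, hits):
    """Return reference regions not covered by any mapped interval.

    `hits` is an iterable of 0-based, half-open (start, end) intervals.
    Overlapping or adjacent intervals are merged implicitly by tracking
    the furthest covered position while scanning in sorted order.
    """
    gaps = []
    covered_to = 0
    for start, end in sorted(hits):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < ref_length:
        gaps.append((covered_to, ref_length))
    return gaps
```

Each returned interval corresponds to one stretch of the complete genome that the draft assembly did not span.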
3
2
1
1
3
2
0
2
3
4
0
4
3
0
3
3
4
0
4
2
0
2
0
1
2
1
1
1
0
0
4
1
0
1
1
0
0
0
2
2
1
7
0
0
2
6
2
0
0
1
2
1
2
3
2
1
Number of inferred transfers
mazei-T → mazei-WC
mazei-WC → mazei-T
Median bootstrap
60
69
67
97
54.5
91
89.5
59
55.5
82
51
63.5
78
54.5
74
51
53
67.5
61
78.5
74.5
58.5
77
51.5
63.5
51.5
99
89.5
Annotation
Arylsulfatase regulatory protein
Hypothetical protein
Hypothetical protein
Hypothetical protein
Hypothetical protein
Cell division protein FtsZ (EC 3.4.24.-)
Dihydroxy-acid dehydratase (EC 4.2.1.9)
Hypothetical protein
Glutaredoxin family protein
Archaea-specific Superfamily II helicase
Oligopeptide transporter, ATP-binding protein
Hypothetical protein
Hypothetical protein
Hypothetical protein
Hypothetical protein
Hypothetical protein
ATP-dependent helicase
Hypothetical protein
Hypothetical protein
Sulfite reductase-related protein
NAD-specific glutamate dehydrogenase (EC 1.4.1.2)
Sensory transduction protein kinase (EC:2.7.3.- )
Cell surface protein
Hypothetical protein
N-acetyltransferase
Hypothetical protein
Ubiquitin-like small archaeal modifier protein SAMP2
Hypothetical protein
Start (bp)
348320
540757
570241
594949
1270167
1293312
1427813
1433189
1786464
1846873
2206533
2357453
2438184
2514144
2713715
2737493
2888132
2904681
3191345
3368826
3377762
3407278
3840103
3858558
4028868
4081260
4148750
NA
M. mazei C16
End (bp)
Locus tag
349183
MSMAC_0268
539993
MSMAC_0435
569009
MSMAC_0448
593591
MSMAC_0481
1269424
MSMAC_1048
1293127
MSMAC_1065
1426911
MSMAC_1173
1432308
MSMAC_1179
1786654
MSMAC_1444
1848234
MSMAC_1506
2205724
MSMAC_1802
2357938
MSMAC_1928
2438621
MSMAC_1980
2512081
MSMAC_2043
2713353
MSMAC_2220
2737783
MSMAC_2238
2887563
MSMAC_2367
2901730
MSMAC_2380
3190836
MSMAC_2594
3368197
MSMAC_2736
3375858
MSMAC_2742
3408417
MSMAC_2763
3837464
MSMAC_3099
3857809
MSMAC_3117
4030307
MSMAC_3267
4080874
MSMAC_3314
4149820
MSMAC_3360
NA
NA
Supplemental Table 4. The number of inter-clade gene transfers inferred by
Mowgli for core genes with median bootstrap values of >50. ‘NA’ indicates that the
gene was not found in M. mazei C16.
| Locus tag (M. mazei C16) | Start (bp) | End (bp) | Annotation | dN/dS | Fst | Sequence identity |
|---|---|---|---|---|---|---|
| MSMAC_0670 | 818143 | 818316 | hypothetical protein | 2.31 | -0.01 | 98.75 |
| MSMAC_1177 | 1431860 | 1430548 | phosphoglycerate mutase (EC:5.4.2.1) | 1.12 | 0.02 | 99.91 |
| MSMAC_1292 | 1573694 | 1574449 | Endonuclease III (EC 4.2.99.18) | 1.72 | 0.07 | 99.82 |
| MSMAC_1399 | 1709081 | 1706967 | hypothetical protein | 1.15 | 0.52 | 99.88 |
| MSMAC_1666 | 2034145 | 2033957 | hypothetical protein | 2.00 | 0.50 | 97.12 |
| MSMAC_1737 | 2125926 | 2127350 | PQQ enzyme repeat domain protein | 1.83 | 0.11 | 99.61 |
| MSMAC_3086 | 3822717 | 3822571 | hypothetical protein | 1.19 | 0.52 | 98.26 |
| MSMAC_3170 | 3919879 | 3919652 | hypothetical protein | 1.54 | 0.96 | 98.71 |

Supplemental Table 5. The annotations of all core genes with a dN/dS of >1.
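| Category | Group | Number of specific*, unique spacers | Number of specific*, unique spacers (% of total) |
|---|---|---|---|
| Clade | mazei-T | 75 | 3.5 |
| Clade | mazei-WC | 93 | 4.2 |
| Site | YBB | 0 | 0.0 |
| Site | YBM | 92 | 4.1 |
| Media | Freshwater | 0 | 0.0 |
| Media | Marine | 0 | 0.0 |
| Substrate | Acetate | 0 | 0.0 |
| Substrate | Methanol | 0 | 0.0 |
| Substrate | TMA | 0 | 0.0 |
| Subtype | I-B | 3 | 0.1 |
| Subtype | I-C | 75 | 3.4 |
| Subtype | I-D | 25 | 1.1 |
| Subtype | I-E | 34 | 1.5 |
| Subtype | I-G | 63 | 2.8 |
| Subtype | III-A | 38 | 1.7 |
| Subtype | III-B | 75 | 3.4 |
| Subtype | VIII-3 | 69 | 3.1 |
| Total | Total | 2240 | 100.0 |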
Supplemental Table 6. The number of unique spacers specific to each group in
each category. * ‘specific’ defined as spacers found in the majority of strains in one
clade (mazei-WC or mazei-T) but absent from the other.
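The definition of "specific" given in this legend can be expressed as a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: "majority" is taken to mean strictly more than half of the strains in the group, the same rule is applied to any grouping (clade, site, and so on), and the names `specific_spacers`, `spacers_by_strain`, and `group_of` are hypothetical.

```python
from collections import Counter


def specific_spacers(spacers_by_strain, group_of, target):
    """Spacers found in most `target` strains and in no strain outside it.

    spacers_by_strain: dict mapping strain name -> iterable of spacer sequences
    group_of:          dict mapping strain name -> group label (e.g. clade)
    target:            the group label of interest
    """
    inside = [s for s, g in group_of.items() if g == target]
    outside = [s for s, g in group_of.items() if g != target]
    # Count each spacer once per strain that carries it.
    counts = Counter(sp for s in inside for sp in set(spacers_by_strain[s]))
    seen_outside = set().union(*(spacers_by_strain[s] for s in outside))
    return {
        sp
        for sp, n in counts.items()
        if 2 * n > len(inside) and sp not in seen_outside
    }
```

Counting the size of the returned set for each group, and dividing by the total number of unique spacers, gives the two columns of the table.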
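acetate

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) | Signif. |
|---|---|---|---|---|---|---|
| clade | 1 | 0.000 | 0.000 | 0.564 | 0.458 | |
| round | 3 | 0.003 | 0.001 | 14.338 | 0.000 | *** |
| clade:round | 3 | 0.000 | 0.000 | 1.320 | 0.283 | |
| Residuals | 36 | 0.002 | 0.000 | | | |

methanol

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) | Signif. |
|---|---|---|---|---|---|---|
| clade | 1 | 0.000 | 0.000 | 1.423 | 0.243 | |
| round | 3 | 0.003 | 0.001 | 11.608 | 0.000 | *** |
| clade:round | 3 | 0.002 | 0.001 | 6.232 | 0.002 | ** |
| Residuals | 27 | 0.002 | 0.000 | | | |

trimethylamine

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) | Signif. |
|---|---|---|---|---|---|---|
| clade | 1 | 0.001 | 0.001 | 11.895 | 0.002 | ** |
| round | 3 | 0.001 | 0.000 | 4.732 | 0.008 | ** |
| clade:round | 3 | 0.000 | 0.000 | 1.617 | 0.205 | |
| Residuals | 32 | 0.003 | 0.000 | | | |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1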
Supplemental Table 7. Nested ANOVA tables assessing significant treatment
effects of clade (mazei-WC and mazei-T) and round (Rounds 1-4) on maximum
methane production rates.
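As a reading aid for these ANOVA tables, the Python sketch below shows how an F value follows from the sums of squares and degrees of freedom, and how a p-value maps onto the significance codes in the legend. Because the tabulated sums of squares are rounded to three decimals, the printed F values cannot be reproduced from them. The function names are illustrative, and the treatment of p-values falling exactly on a cutoff (assigned to the more significant class) is an assumption.

```python
def f_value(sum_sq, df, resid_sum_sq, resid_df):
    """F statistic: the effect mean square divided by the residual mean square."""
    return (sum_sq / df) / (resid_sum_sq / resid_df)


def signif_code(p):
    """Map a p-value onto the codes '***', '**', '*', '.', or ' '."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p <= cutoff:
            return code
    return " "
```

Applied to the Pr(>F) column, `signif_code` returns `***` for the two `round` terms listed as 0.000 and `**` for the values 0.002 and 0.008, matching the codes shown.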
| Locus tag (M. mazei Gö1) | Annotation | dN/dS* | FST | Sequence identity | Mean copy number (T, WC) | Mean gene length (T, WC) |
|---|---|---|---|---|---|---|
| MM0011 | Hypothetical protein | 0.09 | 0.00 | 77.24 | 7.9, 8.0 | 389, 388 |
| MM0093 | Cobyric acid synthase CbiP | NA | 0.02 | 99.95 | 1.0, 1.0 | 1472, 1477 |
| MM0174 | Methanol corrinoid protein MtaC3 | NA | 0.00 | 100.00 | 1.0, 1.0 | 765, 765 |
| MM0175 | Methanol:corrinoid methyltransferase MtaB3 | NA | 0.00 | 100.00 | 1.0, 1.0 | 1386, 1386 |
| MM0176 | Methylcobalamin:coenzyme M methyltransferase MtaA2 | NA | 0.00 | 100.00 | 1.0, 1.0 | 1017, 1017 |
| MM0312 | Hypothetical protein | 0.08 | 0.05 | 77.31 | 2.0, 2.0 | 709, 710 |
| MM0408 | Hypothetical protein | 0.06 | 0.00 | 91.69 | 2.0, 1.5 | 451, 447 |
| MM0496 | Phosphate acetyltransferase | NA | 0.00 | 100.00 | 1.0, 1.0 | 1002, 1002 |
| MM0583 | Hypothetical protein | 0.24 | 0.46 | 99.15 | 1.0, 1.0 | 795, 795 |
| MM0671 | 2-Isopropylmalate synthase | 0.10 | 0.18 | 99.79 | 1.0, 1.0 | 1537, 1548 |
| MM0772 | Hypothetical protein | 0.09 | 0.00 | 77.24 | 7.9, 8.0 | 389, 388 |
| MM0869 | Hypothetical protein | NA | 1.00 | 99.75 | 1.0, 1.0 | 399, 399 |
| MM0870 | Beta-ketoacyl synthase/thiolase | 0.16 | 0.99 | 99.48 | 1.0, 1.0 | 1173, 1173 |
| MM0871 | Hydroxymethylglutaryl-CoA synthase | 1.15 | 0.01 | 99.89 | 1.0, 1.0 | 1039, 1050 |
| MM0872 | Putative transcriptional regulator | 0.06 | 0.12 | 98.03 | 1.0, 1.0 | 741, 734 |
| MM0924 | Hypothetical protein | 0.11 | 0.02 | 86.64 | 2.0, 2.0 | 178, 187 |
| MM1025 | Thiamine biosynthesis protein ThiC | 0.11 | -0.02 | 77.83 | 2.0, 2.0 | 1289, 1299 |
| MM1070 | Methylcobalamin:coenzyme M methyltransferase MtaA1 | 0.17 | 0.93 | 97.07 | 1.0, 1.0 | 1017, 1017 |
| MM1071 | Hypothetical protein | 0.14 | 0.00 | 72.29 | 3.0, 3.0 | 1624, 1624 |
| MM1073 | Methanol corrinoid protein MtaC2 | NA | 0.26 | 99.74 | 1.0, 1.0 | 773, 774 |
| MM1074 | Methanol:corrinoid methyltransferase MtaB2 | 0.05 | 0.30 | 98.41 | 1.0, 1.0 | 1383, 1383 |
| MM1075 | Putative regulatory gene MtaR | 0.16 | -0.01 | 73.82 | 3.0, 3.0 | 593, 593 |
| MM1112 | Hypothetical protein | NA | 0.00 | 100.00 | 1.0, 1.0 | 273, 273 |
| MM1271 | 2-Dehydro-3-deoxyphosphoheptonate aldolase | NA | -0.02 | 80.04 | 2.0, 2.0 | 792, 799 |
| MM1272 | 3-Dehydroquinate synthase | NA | 0.00 | 100.00 | 1.0, 1.0 | 1143, 1143 |
| MM1273 | 3-Dehydroquinate dehydratase | NA | 0.00 | 100.00 | 1.0, 1.0 | 729, 729 |
| MM1274 | Shikimate 5-dehydrogenase | NA | 0.00 | 100.00 | 1.0, 1.0 | 843, 843 |
| MM1275 | Prephenate dehydrogenase | 0.35 | 0.09 | 99.91 | 1.0, 1.0 | 1419, 1419 |
| MM1284 | 2-Isopropylmalate synthase | 0.10 | 0.18 | 99.79 | 1.0, 1.0 | 1537, 1548 |
| MM1304 | Hypothetical protein | NA | 0.05 | 99.99 | 1.0, 1.0 | 836, 846 |
| MM1321 | Formylmethanofuran H4MPT formyltransferase | 0.07 | -0.03 | 99.82 | 1.0, 1.0 | 894, 894 |
| MM1434 | Methylamine permease MtmP | NA | -0.01 | 99.99 | 1.0, 1.0 | 270, 270 |
| MM1435 | Methylamine permease MtmP | NA | 0.37 | 99.92 | 1.0, 1.0 | 1407, 1407 |
| MM1436 | Monomethylamine:corrinoid methyltransferase MtmB1 | 0.07 | -0.04 | 82.74 | 1.0, 1.0 | 1362, 1374 |
| MM1438 | Monomethylamine corrinoid protein MtmC1 | 0.21 | 0.28 | 99.57 | 1.0, 1.0 | 467, 611 |
| MM1439 | Methylcobalamin:coenzyme M methyltransferase MtbA2 | 0.10 | 0.13 | 99.66 | 1.0, 1.0 | 1020, 1020 |
| MM1488 | Hypothetical protein | 0.58 | 0.04 | 99.19 | 1.0, 1.0 | 1030, 976 |
| MM1601 | Cobalamin biosynthesis protein CobN | 0.11 | 0.25 | 99.67 | 1.0, 1.0 | 4328, 4320 |
| MM1602 | Cobalamin biosynthesis protein CobN | 0.10 | 0.29 | 99.66 | 1.0, 1.0 | 4540, 4554 |
| MM1612 | Hypothetical protein | 0.05 | 0.97 | 97.73 | 1.0, 1.0 | 1791, 1780 |
| MM1647 | Methanol:corrinoid methyltransferase MtaB1 | 0.12 | 0.80 | 98.98 | 1.0, 1.0 | 1383, 1383 |
| MM1648 | Methanol corrinoid protein MtaC1 | 0.15 | 0.73 | 99.68 | 1.0, 1.0 | 774, 774 |
| MM1687 | Dimethylamine corrinoid protein MtbC1 | 0.04 | 0.05 | 99.41 | 1.0, 1.0 | 454, 442 |
| MM1688 | Trimethylamine:corrinoid methyltransferase MttB1 | 0.08 | -0.02 | 99.80 | 2.0, 2.0 | 1242, 1242 |
| MM1690 | Trimethylamine corrinoid protein MttC1 | 0.04 | -0.03 | 99.61 | 1.0, 1.0 | 648, 648 |
| MM1691 | Trimethylamine permease MttP1 | NA | 0.06 | 99.94 | 1.0, 1.0 | 1047, 1050 |
| MM1693 | Dimethylamine:corrinoid methyltransferase MtbB1 | 0.03 | 0.00 | 99.27 | 2.0, 2.0 | 291, 291 |
| MM1761 | Hypothetical protein | 0.20 | 0.12 | 99.91 | 1.0, 1.0 | 798, 798 |
| MM1762 | Mevalonate kinase | NA | 0.04 | 99.94 | 1.0, 1.0 | 917, 925 |
| MM1950 | Catalase | 0.21 | 0.33 | 99.39 | 1.7, 1.0 | 1229, 2133 |
| MM1951 | Hypothetical protein | NA | 0.02 | 99.73 | 1.0, 1.0 | 154, 144 |
| MM1977 | Hypothetical protein | NA | 0.00 | 100.00 | 1.0, 1.0 | 579, 579 |
| MM1982 | Alkyl sulfatase | NA | 0.00 | 99.93 | 1.0, 1.0 | 1727, 1720 |
| MM2045 | Trimethylamine permease MttP2 | 0.10 | 0.17 | 99.49 | 1.0, 1.0 | 1449, 1449 |
| MM2046 | Trimethylamine permease MttP2 | 0.07 | 0.18 | 96.72 | 1.0, 1.0 | 1027, 1044 |
| MM2047 | Trimethylamine corrinoid protein MttC2 | 0.05 | 0.00 | 95.57 | 1.0, 1.0 | 654, 654 |
| MM2049 | Trimethylamine:corrinoid methyltransferase MttB2 | 0.05 | -0.01 | 98.34 | 2.0, 2.0 | 1242, 1242 |
| MM2051 | Dimethylamine:corrinoid methyltransferase MtbB2 | 0.03 | 0.00 | 99.33 | 2.0, 2.0 | 1192, 1217 |
| MM2052 | Dimethylamine corrinoid protein MtbC2 | 0.05 | 0.72 | 95.93 | 1.0, 1.0 | 648, 645 |
| MM2338 | Hypothetical protein | 0.16 | 0.00 | 75.93 | 8.7, 9.2 | 204, 203 |
| MM2387 | Cobalt transport ATP-binding protein CbiO | 0.10 | 0.07 | 99.98 | 1.0, 1.0 | 1497, 1497 |
| MM2818 | Anthranilate synthase component I | 0.35 | 0.22 | 97.86 | 1.0, 1.0 | 1874, 1870 |
| MM2821 | Tryptophan synthase, alpha chain | NA | 0.00 | 100.00 | 1.0, 1.0 | 816, 816 |
| MM2822 | Tryptophan synthase subunit beta | NA | 0.00 | 100.00 | 1.0, 1.0 | 1212, 1212 |
| MM2843 | Hypothetical protein | 0.46 | 0.04 | 99.84 | 1.0, 1.0 | 1266, 1287 |
| MM2882 | Hypothetical protein | 0.13 | 0.02 | 99.22 | 1.0, 1.0 | 1098, 1091 |
| MM2933 | Hypothetical protein | 0.47 | 0.46 | 99.91 | 1.0, 1.0 | 1590, 1590 |
| MM2961 | Dimethylamine corrinoid protein MtbC3 | 0.03 | 0.01 | 99.62 | 1.0, 1.0 | 642, 642 |
| MM2962 | Dimethylamine:corrinoid methyltransferase MtbB3 | 0.03 | 0.01 | 99.33 | 1.0, 1.0 | 1192, 1217 |
| MM2964 | Dimethylamine permease MtbP | 0.06 | -0.01 | 99.88 | 1.0, 1.0 | 1446, 1446 |
| MM3011 | Hypothetical protein | 0.41 | 0.47 | 98.68 | 1.0, 1.0 | 816, 816 |
| MM3108 | Hypothetical protein | NA | 0.03 | 99.98 | 1.0, 1.0 | 1005, 1003 |
| MM3197 | Hypothetical protein | 0.08 | -0.01 | 99.93 | 1.0, 1.0 | 411, 411 |
| MM3334 | Monomethylamine corrinoid protein MtmC2 | 0.16 | -0.12 | 99.07 | 1.0, 1.0 | 534, 598 |
| MM3335 | Monomethylamine:corrinoid methyltransferase MtmB2 | 0.11 | 0.17 | 64.03 | 1.0, 1.0 | 1374, 1374 |

* 'NA' if no variation at conserved alignment positions
** based on Krätzer et al., 2009

Differentially expressed on MeOH vs TMA?**: 'Yes' for 68 of the 75 genes listed; the row assignment of these entries is not preserved in this text version.
Supplemental Table 8. Genetic differentiation between mazei-T and mazei-WC of
genes potentially associated with growth of Methanosarcina spp. on TMA. The table
includes all methyltransferase 1 (MT1), methyltransferase 2 (MT2), corrinoid
proteins, and putative regulatory proteins involved in methylotrophic growth of
Methanosarcina. In addition, the table includes all genes shown by Krätzer and
colleagues to be differentially expressed in Methanosarcina mazei Gö1 when grown
on methanol versus TMA (Krätzer et al., 2009). Only genes present in >1 M. mazei
isolate from both YBM and YBB are shown. Bold values highlight genes with a dN/dS
> 1.1, FST > 0.7, or substantially differing mean copy numbers or gene lengths
between mazei-T and mazei-WC. Gene length is in base pairs (bp).
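The two numeric highlighting criteria named in this legend (dN/dS > 1.1 and FST > 0.7) can be applied programmatically, as in the Python sketch below. The function name `flag_gene` is hypothetical, `None` stands in for an 'NA' entry, and the copy-number and gene-length criteria are omitted because the legend gives no numeric threshold for them.

```python
def flag_gene(dn_ds, fst, dn_ds_cutoff=1.1, fst_cutoff=0.7):
    """Return the names of the numeric criteria that a gene exceeds.

    dn_ds or fst may be None to represent an 'NA' table entry,
    in which case that criterion is skipped.
    """
    flags = []
    if dn_ds is not None and dn_ds > dn_ds_cutoff:
        flags.append("dN/dS")
    if fst is not None and fst > fst_cutoff:
        flags.append("FST")
    return flags
```

With illustrative inputs, `flag_gene(1.15, 0.01)` flags only dN/dS, `flag_gene(None, 1.00)` flags only FST, and `flag_gene(0.09, 0.00)` flags nothing.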
| Strain | Accession Number |
|---|---|
| Methanobacterium sp AL21 1 | CP002551 |
| Methanobrevibacter smithii ATCC | CP000678 |
| Methanocaldococcus jannaschii DSM 2661 | L77117 |
| Methanocella arvoryzae MRE50 | AM114193 |
| Methanocella conradii HZ254 | CP003243 |
| Methanococcoides burtonii DSM 6242 | CP000300 |
| Methanococcoides methylutens MM1 | * |
| Methanococcus maripaludis C5 | CP000609 |
| Methanoculleus marisnigri JR1 | CP000562 |
| Methanohalobium evestigatum Z7303 | CP002069 |
| Methanomethylovorans hollandica WWM590 | * |
| Methanopyrus kandleri AV19 | AE009439 |
| Methanoregula boonei 6A8 | CP000780 |
| Methanosaeta concilii GP6 | AB679170 |
| Methanosarcina acetivorans C2A | AE010299 |
| Methanosarcina baltica type strain | * |
| Methanosarcina barkeri str fusaro | CP000099 |
| Methanosarcina calensis Cali | * |
| Methanosarcina horonobensis HB1 | * |
| Methanosarcina lacustris Z7289 | * |
| Methanosarcina mazei C16 | * |
| Methanosarcina mazei Go1 | AE008384 |
| Methanosarcina mazei TMA | * |
| Methanosarcina mazei WWM610 | * |
| Methanosarcina siciliae C2J | * |
| Methanosarcina sp Kolksee | * |
| Methanosarcina sp MTP4 | * |
| Methanosarcina thermophila TM1 DSM1825 | * |
| Methanosphaera stadtmanae DSM 3091 | CP000102 |
| Methanosphaerula palustris E19c | CP001338 |
| Methanospirillum hungatei JF1 | CP000254 |
| Methanothermobacter marburgensis str Marburg | CP001710 |
| Methanothermus fervidus DSM | CP002278 |
* Awaiting acceptance from GenBank
Supplemental Table 9. GenBank accession numbers for all reference strains used
in Supplemental Figure 1.