Genomic and phenotypic differentiation among Methanosarcina mazei populations from Columbia River sediment

Nicholas D. Youngblut1,2, Joseph S. Wirth1,3, James R. Henriksen1,4, Maria Smith5, Holly Simon5, William W. Metcalf1, Rachel J. Whitaker1,*

1. Department of Microbiology, University of Illinois at Urbana-Champaign, 601 South Goodwin Avenue, Urbana, IL 61801, USA
2. Currently at: Department of Crop and Soil Sciences, Cornell University, 306 Tower Road, Ithaca, NY 14853, USA
3. Currently at: Department of Microbiology, University of Georgia, 120 Cedar Street, Athens, GA 30602, USA
4. Currently at: AgBiome, PO Box 14069, Research Triangle Park, NC 27709, USA
5. Institute of Environmental Health; Division of Environmental and Biomolecular Systems, Oregon Health and Science University, 3251 S.W. Jackson Park Rd, Portland, OR 97239, USA

Supplemental Materials and Methods

Description of sites and sediments

Adjacent to the main channel of the Columbia River Estuary, Baker Bay and Youngs Bay are believed to be important sites for nutrient transformations by sediment microbial communities. The sediment microbiota is thought to contribute to net ecosystem metabolism, in part, by producing metabolites that feed into the ‘microbial loop’ in the mainstem estuary. Alternatively, these metabolites may also be transported to coastal waters in the river plume (Gilbert et al., 2013). Sediment microbial communities develop and evolve in the context of continuous material exchanges with the water column, including particles, organic matter, reduced substrates, electron acceptors, and respiratory gases (Cai et al., 1999; Turner and Millward, 2002). Water and sediments in the Columbia River Estuary are routinely exposed to dynamic shifts in end-member forcing with changes in season and the tidal cycle.
A number of resources are therefore used to analyze the properties of the estuarine water column and/or sediments, including: i) historical data from previous Columbia River monitoring projects (Simenstad et al., 1990; Sherwood et al., 1990; Simenstad et al., 1984; Smith et al., 2010); ii) SATURN endurance stations (Gilbert et al., 2013); and iii) measurements obtained during several shore sampling campaigns in 2012-2014 (Smith et al., in prep; Lydie Herfort, personal communication).

The lower Columbia River Estuary adjacent to the lateral bays displays the full range of water salinities between 0 and 32 PSU (Smith et al., 2010). Youngs Bay is located at the confluence of three rivers, with salinities at the mouth highly dependent on river flow (which varies seasonally). Salinities observed in Youngs Bay at the time of sampling (late summer) range from 0 to >20 PSU (reviewed in Simenstad et al., 1984). Our near-shore salinity measurements were between 2 and 6 PSU at the YBM site and between 0 and 5 PSU at the YBB site (Smith et al., in prep; Lydie Herfort, personal communication). Although the YBM site was about 4 km closer than YBB to the mouth of Youngs Bay, sediments from the two sites were similar with respect to class (silt loam), total nutrients (NH4+, ~160 ppm; NO3-, ~3 ppm), and major metal content (Fe and Mn were each in the 200 ppm range). Differences were observed, however, in pH (6.6 vs. 6.3 for YBM and YBB, respectively) and total phosphorus (6 ppm for YBM vs. 15 ppm for YBB). Organic matter content was also somewhat higher at the YBB site, at 8.8%, while YBM sediments contained 5.4%.

Based on historical data, Baker Bay is described as highly influenced by tidal forcing, with salinities at mid-depth during low river flow in the summer/fall seasons ranging from 6.4 to more than 32 PSU (Simenstad et al., 1984).
Our near-shore measurements of salinity at the sampling site produced values ranging from 4 to 16 PSU (Smith et al., in prep; Lydie Herfort, personal communication). The pH was higher (7.5) in the sandy loam sediments from the Baker Bay site compared to the Youngs Bay sites. Ammonium was about 10 times lower (15.5 ppm), and total Fe was about half (130 ppm) that measured at the Youngs Bay sites. Organic matter was also lowest (1.8%) at the Baker Bay site, while total phosphorus (17 ppm) was similar to that in YBM sediments.

Sample collection

On August 22, 2011, during low tide, sediment samples were collected near the shore at three locations in Youngs Bay and Baker Bay in the Columbia River Estuary as described in Smith et al. (in prep). Samples were collected in sterile 50 ml Corning tubes and stored on ice until processed.

Nucleic acid extraction and mcrA amplicon 454-pyrosequencing

DNA was extracted from approximately 1 g of each sediment sample with the PowerSoil DNA Isolation Kit using the standard protocol (MoBio, Carlsbad, CA). Adapters and barcodes were added to the methyl coenzyme M reductase subunit A (mcrA) specific primers mcrF and mcrR (Luton et al., 2002) for multiplexed 454 pyrosequencing. The gene fragment was amplified by PCR in a final volume of 30 μl containing final concentrations of 0.2 mM dNTPs (each), 0.5 μM primers (each), and 0.03 U of Phusion DNA Polymerase F-530 (Finnzymes, MA, USA). Thermocycler conditions consisted of an initial denaturation for 3 minutes at 98°C, followed by 30 cycles of 30 seconds at 98°C, 15 seconds at 59°C, and 15 seconds at 72°C, with a final extension of 10 minutes at 72°C. Triplicate PCR reactions were pooled, and gel bands of the expected amplicon size were excised and purified with the Wizard DNA Purification Kit (Promega, Madison, WI).
Purified amplicons were submitted to the WM Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign for 454 pyrosequencing on a 454 GS FLX+ Sequencer (Roche, Branford, CT).

mcrA sequence analysis

Mothur v1.24.0 (Schloss et al., 2009) was used to remove barcodes and primers from the 454 pyrosequencing reads and to quality-filter the sequences. Sequences that were <200 bp in length, contained homopolymers >6 bp long, had >1 error in the barcode, or had >1 error in the primer were discarded. Chimeric sequences were identified and removed with the Mothur implementation of Uchime (Edgar et al., 2011). Combined, quality filtering removed 12.5% (8898 of 71097) of the sequences. Sequences were clustered into 295 operational taxonomic units (OTUs) at a 95% sequence identity cutoff with CD-HIT-454 v4.6 (Fu et al., 2012). A reference mcrA dataset was constructed from select mcrA sequence fragments of cultured methanogens in the Functional Gene pipeline and repository (FunGene; http://fungene.cme.msu.edu), mcrA gene sequences from each sequenced isolate in this study, and all sequenced Methanosarcina genomes. Amino acid sequences of the reference mcrA dataset were aligned with mafft v7.037b (Katoh and Standley, 2013) and then reverse-translated with PAL2NAL v14 (Suyama et al., 2006). A maximum likelihood phylogeny was inferred from the nucleotide alignment with RAxML v7.2.6 (GTR-Γ model; 100 bootstrap replicates) (Stamatakis, 2006). Representative sequences for each environmental mcrA OTU were inserted into the reference phylogeny using RAxML. The number of sequences within each OTU, by sample of origin, was mapped onto the tree with iTOL v2 (Letunic and Bork, 2011).

Culture isolation

Direct plating with agar overlays under strictly anaerobic conditions was used for initial Methanosarcina strain cultivation.
Three sediment dilutions (10^0, 10^-1, and 10^-2) were plated on bicarbonate-buffered marine media (Metcalf et al., 1996) or freshwater PIPES-buffered media consisting of 1 μM KPO4, 10 μM NH4Cl, 4 μM resazurin, 40 mM PIPES buffer, 1:100 trace elements solution, 1:100 vitamin solution, and 1X base salts. The trace element solution consisted of 5.8 mM N(CH2CO2H)3, 2 mM Fe(NH4)2(SO4)2, 1.1 mM Na2SeO3, 0.4 mM CoCl2·6H2O, 0.6 mM MnSO4·H2O, 0.4 mM Na2MoO4·2H2O, 0.3 mM Na2WO4·2H2O, 0.3 mM ZnSO4·7H2O, 0.4 mM NiCl2·6H2O, 0.16 mM H3BO3, and 40 μM CuSO4·5H2O. The vitamin solution consisted of 73 μM p-aminobenzoic acid, 81 μM nicotinic acid, 42 μM calcium pantothenate, 49 μM pyridoxine HCl, 27 μM riboflavin, 30 μM thiamine HCl, 20 μM biotin, 11 μM folic acid, 24 μM α-lipoic acid, and 3.7 μM vitamin B12. The base salts consisted of 342 mM NaCl, 14.8 mM MgCl2·6H2O, 1 mM CaCl2·2H2O, and 6.71 mM KCl. Isolated colonies picked from the direct plating were subjected to 1-3 rounds of restreaking.

Genomic sequencing and assembly

Genomic DNA was extracted from each culture using the UltraClean Microbial DNA Isolation Kit (MoBio, Carlsbad, CA). Multiplexed libraries were prepared using the Nextera XT DNA Sample Prep Kit (Illumina, San Diego, CA) without performing the bead normalization step. Instead, the libraries were quantified with a Qubit fluorometer (Life Technologies, Carlsbad, CA) and normalized by dilution with molecular grade water. Normalized libraries were pooled and submitted to the WM Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign for paired-end sequencing with a HiSeq2000 sequencer (Illumina, San Diego, CA).

Our genome assembly pipeline was optimized through extensive testing by comparing draft assemblies produced from just Illumina HiSeq 2000 paired-end reads from the reference strains M. barkeri Fusaro (Maeder et al., 2006), M.
mazei WWM610, M. mazei C16 (Blotevogel et al., 1986), and M. mazei LYC (Liu et al., 1985) versus the closed versions of these genomes, which had been assembled with multiple sequencing methods including paired-end 454 pyrosequencing data, cosmid paired-end reads, and Sanger sequencing to fill gaps. By this benchmarking, we selected parameters that primarily increased assembly accuracy and secondarily increased assembly contiguity.

We also used this benchmark dataset to assess likely causes of the draft assembly breakpoints. To this end, we mapped each draft genome to the closed version with ABACAS v1.3.1 (Assefa et al., 2009) to identify alignment gaps (i.e., assembly breakpoints). We identified genomic elements that were located at the gap edges (≤100 bp from a gap start or end) and may have caused the assembly breakpoint.

The paired-end reads were quality-filtered with the FASTX Toolkit v0.0.13 using a q-value cutoff of 30 over 95% of the read length. Filtered reads were randomly subsampled to one million read pairs per sample (~40-50X coverage), which we found to provide optimal assembly accuracy and contiguity based on our benchmark dataset. Genomic assembly and scaffolding were performed with a modified version of the A5 assembly pipeline (Tritt et al., 2012), in which IDBA-UD v1.1.0 (Peng et al., 2012) was used instead of IDBA for the actual assembly. This assembly method consistently increased assembly contiguity. BLASTn was used to identify and remove scaffolds potentially containing contamination (e.g., regions with a high number of hits to E. coli; E-value <1e-20). The percentage of total scaffold length of any assembly that was identified as contamination and removed varied from 0-3%. Gaps in scaffolds were filled in silico with GapFiller (Boetzer and Pirovano, 2012), with an average of 68% of gaps closed per assembly.
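As a rough illustration of the read filtering and subsampling steps described above, the following Python sketch keeps a read only if at least 95% of its bases have a Phred quality of 30 or more, and then draws a fixed number of read pairs at random. It is not the FASTX Toolkit code used in the study. The function names, the Phred+33 quality encoding, the requirement that both mates of a pair pass the filter, and the random seed are all assumptions made for the example.

```python
import random

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def passes_quality(qual, min_q=30, min_fraction=0.95, offset=PHRED_OFFSET):
    """Return True if at least min_fraction of bases have quality >= min_q."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return good / len(qual) >= min_fraction


def filter_pairs(pairs):
    """Keep pairs of (sequence, quality) reads in which both mates pass.

    Requiring both mates to pass is an assumption of this sketch.
    """
    return [(r1, r2) for r1, r2 in pairs
            if passes_quality(r1[1]) and passes_quality(r2[1])]


def subsample_pairs(pairs, n, seed=0):
    """Randomly draw up to n read pairs without replacement."""
    rng = random.Random(seed)
    if len(pairs) <= n:
        return list(pairs)
    return rng.sample(pairs, n)


def expected_coverage(n_pairs, read_length, genome_size):
    """Approximate sequencing depth for paired-end reads of equal length."""
    return 2 * n_pairs * read_length / genome_size
```

With a hypothetical read length of 100 bp (the read length is not stated above) and a genome of roughly 4.1 Mb, `expected_coverage(1_000_000, 100, 4_100_000)` gives a depth in the high 40s, which falls within the ~40-50X range quoted for one million read pairs.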
SEQuel v1.0.1 (Ronen et al., 2012) was used to correct, on average, 39 base miscalls and/or erroneous indels in each assembly. OASIS (Robinson et al., 2012) was used to identify putative insertion sequences, and IslandViewer (Langille and Brinkman, 2009) was used to identify putative genomic islands based on multiple sequence composition criteria. CRISPRs were identified with CRISPRFinder (Grissa et al., 2007) and classified in accordance with Vestergaard and colleagues by assessing homology (BLASTp, E-value < 1e-20) to all annotated cas genes in the study’s dataset (Vestergaard et al., 2014). cas genes were manually annotated by searching the NCBI non-redundant protein database via BLASTp and searching the protein family databases CDD (Marchler-Bauer et al., 2005), Pfam (Finn et al., 2008), and COG (Tatusov et al., 2000) with HHsearch (Söding, 2005). CRISPR spacer content conservation was assessed by pairwise alignments of all CRISPRs classified as the same subtype. For the alignments, each CRISPR was represented as a string, with each unique spacer nucleotide sequence represented by a unique character in the string. The CRISPR strings were oriented by the putative leader regions of each CRISPR and aligned pairwise with a Levenshtein distance algorithm implemented in Perl. Matched spacers in the alignment (i.e., the same character in the CRISPR string representation) received a score of 1, while mismatches were scored as 0. Leader regions were identified by sequence conservation of the leader region and direct repeat sequence conservation.

Whole genome alignment

A whole genome alignment (WGA) of isolate genomes identified as M. mazei and all reference M. mazei genomes was created with mugsy v1.2.3 (Angiuoli and Salzberg, 2011).
RAxML was used to infer a ‘species’ tree (GTR-Γ model; 100 bootstrap replicates) from all ‘core’ (found in all taxa) local collinear blocks (LCBs) in the WGA. The M. mazei reference strain genomes were aligned with progressiveMauve v2.3.1 (Darling et al., 2010). Mauve v2.3.1 (Darling et al., 2004) was used to visualize the alignment and calculate double-cut-and-join (DCJ) distances.

Core and variable gene analysis

Genes were called and annotated using the Rapid Annotation using Subsystem Technology (RAST) server (Aziz et al., 2008). The ITEP toolkit (Benedict et al., 2014) was used to group genes from all isolates (or isolates and type strains) into putative homologs through Markov Chain Clustering (via the MCL program) of BLASTp maximum bitscore ratios (0.4 cutoff, 2.0 inflation parameter) (Enright et al., 2002).

Amino acid sequences of gene clusters were aligned with mafft v7.037b (Katoh and Standley, 2013) and then reverse-translated with PAL2NAL v14 (Suyama et al., 2006). Poor alignments caused by artificial gene truncations due to incomplete genome assembly were identified and removed using a custom Perl script that identified sequences in alignments where half of the aligned sequence was an outlier (±1 standard deviation) in terms of mean sequence identity and number of gaps and was also within 300 bp of a contig end. Maximum likelihood phylogenies were inferred from the nucleotide alignments with RAxML v7.2.6 (GTR-Γ model; 100 bootstrap replicates) (Stamatakis, 2006).

To estimate the number of gene clusters missing from any particular draft genome, we assessed the number of gene clusters missing from the draft assemblies compared to the complete assemblies of our genomes (M. barkeri Fusaro, M. mazei WWM610, M. mazei C16, M. mazei LYC; see above).
We found that <2% of coding sequences were missing when comparing the draft assemblies to their corresponding closed genomes, indicating a limited impact of artificial gene absence on our analysis. Still, to account for this possibility and to mitigate errors caused by artificial gene loss, we defined genes specific to the mazei-WC or mazei-T clade as those found in the majority of strains in one clade but absent from the other.

Quantification of dN/dS, mean sequence identity, and FST values for the core genes was performed with SNAP, Mothur v1.24.0, and Arlecore v3.5.1.3, respectively (Korber, 2000; Schloss et al., 2009; Excoffier and Lischer, 2010).

We assessed inter-clade recombination of core genes using two general methods: tree-reconciliation and quartet decomposition. The former was performed with Mowgli (Nguyen et al., 2013), which infers recombination through identifying incongruences between the ‘species’ tree (the WGA phylogeny) and a gene tree (inferred for each core gene), while accounting for unsupported nodes by performing nearest neighbor interchange (NNI) operations to minimize false inferences of gene transfer and duplication. We found that poorly supported nodes in either the species or gene trees, as often occur among highly similar sequences, greatly inflated the number of inferred gene transfers for both methods (data not shown). To mitigate this artifact, we only used gene trees with a bootstrap support of >50 and a clone-corrected species tree, where all approximately clonal taxa were collapsed to one representative taxon. However, most gene trees did not meet our criterion. Therefore, we also employed quartet decomposition with the Quartet Decomposition Server (Mao et al., 2012) to identify individual quartets in gene trees that had high bootstrap support but were incongruent with the bifurcation of mazei-T and mazei-WC.
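To make the quartet criterion concrete, the sketch below classifies an unrooted quartet containing two mazei-T and two mazei-WC taxa as congruent or incongruent with the clade bifurcation. It only illustrates the idea and is not the Quartet Decomposition Server's implementation. The function name, the representation of a quartet as a pair of two-taxon sets, and the taxon labels are hypothetical.

```python
def quartet_is_congruent(split, clade_of):
    """Check a quartet topology against a two-clade bifurcation.

    split    -- pair of 2-taxon sets, e.g. ({"a", "b"}, {"c", "d"}),
                meaning the unrooted quartet ab|cd
    clade_of -- dict mapping each taxon to its clade label
    Returns True when each side of the split holds taxa from one clade.
    """
    left, right = split
    if len(left) != 2 or len(right) != 2 or set(left) & set(right):
        raise ValueError("a quartet split needs two disjoint pairs of taxa")
    clades = [clade_of[t] for t in list(left) + list(right)]
    if sorted(clades.count(c) for c in set(clades)) != [2, 2]:
        raise ValueError("expected two taxa from each of two clades")
    return (len({clade_of[t] for t in left}) == 1
            and len({clade_of[t] for t in right}) == 1)


# Hypothetical taxon labels, for illustration only
clade_of = {"T1": "mazei-T", "T2": "mazei-T",
            "W1": "mazei-WC", "W2": "mazei-WC"}

# T1T2|W1W2 agrees with the clade bifurcation
congruent = quartet_is_congruent(({"T1", "T2"}, {"W1", "W2"}), clade_of)
# T1W1|T2W2 groups taxa across clades and so conflicts with it
incongruent = not quartet_is_congruent(({"T1", "W1"}, {"T2", "W2"}), clade_of)
```

A well-supported quartet of the second kind, in which a mazei-T taxon groups with a mazei-WC taxon, is the pattern that would be flagged as incongruent with the clade bifurcation.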
For this analysis, we only assessed gene trees possessing quartets with two members from both mazei-T and mazei-WC and ≥1 SNP segregating the internal nodes of the quartet.

Statistics and plotting

All statistical evaluations were performed in R (R Development Core Team, 2010). The circular genome plots were created with Circos (Krzywinski et al., 2009), and all other plots were produced with R using the ggplot2 package (Wickham, 2009). All phylogenies were visualized with either iTOL v2 or FigTree v1.4.0 (Letunic and Bork, 2011). All custom Perl scripts used in this study are available at https://github.com/nyoungb2/pop_genome, with many relying on the Bioperl Toolkit (Stajich et al., 2002).

Methane production assays

Methane production, as a proxy for culture growth, was monitored using a Hewlett Packard 5890 Series II gas chromatograph (Hewlett-Packard, Wilmington, DE) with a flame ionization detector and a stainless steel column filled with 80/120 Carbopack B/3% SP-1500 (Supelco, Bellefonte, PA) heated to 225°C. The maximum growth rates inferred from the resulting methane production curves were compared among isolates. Depending on the culture and substrate, stationary phase was reached between ~500 and ~1100 hours. The low-throughput nature of this method limited the number of cultures that could be compared in the same experiment. In addition, cultures were originally isolated on different media (freshwater and marine) and substrates (trimethylamine, methanol, and acetate), which could be confounding factors. To control for this, we performed direct pairwise comparisons between isolates from different clades isolated on the same media and substrate when possible. Two to four isolates were compared in any given round of methane production monitoring. Each isolate was grown in its ‘native’ medium (marine or freshwater) in triplicate or quadruplicate.
In addition, we found no growth in cultures inoculated in media lacking substrate or in Balch tubes containing all substrates but lacking inoculum.

Supplemental References

Angiuoli SV, Salzberg SL. (2011). Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics 27:334–342.
Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. (2009). ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics 25:1968–1969.
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. (2008). The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9:75.
Benedict MN, Henriksen JR, Metcalf WW, Whitaker RJ, Price ND. (2014). ITEP: An integrated toolkit for exploration of microbial pan-genomes. BMC Genomics 15:8.
Blotevogel KH, Fischer U, Lüpkes KH. (1986). Methanococcus frisius sp. nov., a new methylotrophic marine methanogen. Can J Microbiol 32:127–131.
Boetzer M, Pirovano W. (2012). Toward almost closed genomes with GapFiller. Genome Biol 13:R56.
Cai W-J, Pomeroy LR, Moran MA, Wang Y. (1999). Oxygen and carbon dioxide mass balance for the estuarine-intertidal marsh complex of five rivers in the southeastern U.S. Limnol Oceanogr 44:639–649.
Darling ACE, Mau B, Blattner FR, Perna NT. (2004). Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14:1394–1403.
Darling AE, Mau B, Perna NT. (2010). progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement. PLoS ONE 5:e11147.
Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27:2194–2200.
Enright AJ, Van Dongen S, Ouzounis CA. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30:1575–1584.
Excoffier L, Lischer HEL. (2010).
Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567.
Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz H-R, et al. (2008). The Pfam protein families database. Nucleic Acids Res 36:D281–288.
Fu L, Niu B, Zhu Z, Wu S, Li W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152.
Gilbert M, Needoba J, Koch C, Barnard A, Baptista A. (2013). Nutrient Loading and Transformations in the Columbia River Estuary Determined by High-Resolution In Situ Sensors. Estuaries Coasts 36:708–727.
Grissa I, Vergnaud G, Pourcel C. (2007). CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res 35:W52–W57.
Katoh K, Standley DM. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780.
Korber B. (2000). HIV signature and sequence variation analysis. Comput Anal HIV Mol Seq 4:55–72.
Krätzer C, Carini P, Hovey R, Deppenmeier U. (2009). Transcriptional Profiling of Methyltransferase Genes during Growth of Methanosarcina mazei on Trimethylamine. J Bacteriol 191:5108–5115.
Krzywinski M, Schein J, Birol İ, Connors J, Gascoyne R, Horsman D, et al. (2009). Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645.
Langille MGI, Brinkman FSL. (2009). IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics 25:664–665.
Letunic I, Bork P. (2011). Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res 39:W475–W478.
Liu Y, Boone DR, Sleat R, Mah RA. (1985). Methanosarcina mazei LYC, a New Methanogenic Isolate Which Produces a Disaggregating Enzyme.
Appl Environ Microbiol 49:608–613.
Luton PE, Wayne JM, Sharp RJ, Riley PW. (2002). The mcrA gene as an alternative to 16S rRNA in the phylogenetic analysis of methanogen populations in landfill. Microbiology 148:3521–3530.
Maeder DL, Anderson I, Brettin TS, Bruce DC, Gilna P, Han CS, et al. (2006). The Methanosarcina barkeri Genome: Comparative Analysis with Methanosarcina acetivorans and Methanosarcina mazei Reveals Extensive Rearrangement within Methanosarcinal Genomes. J Bacteriol 188:7922–7931.
Mao F, Williams D, Zhaxybayeva O, Poptsova M, Lapierre P, Gogarten JP, et al. (2012). Quartet decomposition server: a platform for analyzing phylogenetic trees. BMC Bioinformatics 13:123.
Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, et al. (2005). CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res 33:D192–D196.
Metcalf WW, Zhang JK, Shi X, Wolfe RS. (1996). Molecular, genetic, and biochemical characterization of the serC gene of Methanosarcina barkeri Fusaro. J Bacteriol 178:5797–5802.
Nguyen TH, Ranwez V, Pointet S, Chifolleau A-MA, Doyon J-P, Berry V. (2013). Reconciliation and local gene tree rearrangement can be of mutual profit. Algorithms Mol Biol 8:12.
Peng Y, Leung HC, Yiu S-M, Chin FY. (2012). IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–1428.
R Development Core Team. (2010). R: A Language and Environment for Statistical Computing. Vienna, Austria http://www.R-project.org.
Robinson DG, Lee M-C, Marx CJ. (2012). OASIS: an automated program for global investigation of bacterial and archaeal insertion sequences. Nucleic Acids Res 40:e174–e174.
Ronen R, Boucher C, Chitsaz H, Pevzner P. (2012). SEQuel: improving the accuracy of genome assemblies. Bioinformatics 28:i188–i196.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. (2009). Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol 75:7537–7541.
Sherwood CR, Jay DA, Bradford Harvey R, Hamilton P, Simenstad CA. (1990). Historical changes in the Columbia River Estuary. Prog Oceanogr 25:299–352.
Simenstad CA, Jay DA, McIntire D, Nehlsen W, Sherwood C. (1984). The dynamics of the Columbia River estuarine ecosystem. Columbia River Estuary Data Development Program. Portland, Oregon.
Simenstad CA, Small LF, David McIntire C, Jay DA, Sherwood C. (1990). Columbia river estuary studies: An introduction to the estuary, a brief history, and prior studies. Prog Oceanogr 25:1–13.
Smith M, Davis R, Youngblut N, Whitaker R, Metcalf W, Herfort L, et al. Metagenomic evidence for reciprocal particle exchange between the Columbia River estuarine water column and lateral bay sediments. In prep.
Smith MW, Herfort L, Tyrol K, Suciu D, Campbell V, Crump BC, et al. (2010). Seasonal Changes in Bacterial and Archaeal Gene Expression Patterns across Salinity Gradients in the Columbia River Coastal Margin. PLoS ONE 5:e13312.
Söding J. (2005). Protein homology detection by HMM–HMM comparison. Bioinformatics 21:951–960.
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, et al. (2002). The Bioperl Toolkit: Perl Modules for the Life Sciences. Genome Res 12:1611–1618.
Stamatakis A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Suyama M, Torrents D, Bork P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 34:W609–W612.
Tatusov RL, Galperin MY, Natale DA, Koonin EV. (2000).
The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36.
Tritt A, Eisen JA, Facciotti MT, Darling AE. (2012). An Integrated Pipeline for de Novo Assembly of Microbial Genomes. PLoS ONE 7:e42304.
Turner A, Millward GE. (2002). Suspended Particles: Their Role in Estuarine Biogeochemical Cycles. Estuar Coast Shelf Sci 55:857–883.
Vestergaard G, Garrett RA, Shah SA. (2014). CRISPR adaptive immune systems of Archaea. RNA Biol 11:156–167.
Wickham H. (2009). ggplot2: elegant graphics for data analysis. Springer New York.

Supplemental Tables and Figures

Supplemental Figure 1. The phylogenetic tree is a maximum likelihood inference (GTR-Γ model; 100 bootstrap replicates; rooted on Methanocaldococcus jannaschii DSM 2661) of full-length mcrA alleles from all isolates and select type strains. mcrA amplicon OTUs (95% sequence identity cutoff) were inserted into the tree. Red circles highlight nodes that have bootstrap values >70. Bootstrapping only applies to full-length mcrA alleles. The bar plot describes the number of sequences in each OTU or the number of isolates with the same mcrA allele. Only OTUs with a total abundance of ≥50 are shown. ‘*’ refers to the M. mazei isolates used for the population genomics comparisons. ‘**’ refers to the isolates used as part of the dN/dS analyses. Supplemental Table 9 lists the accession numbers of all reference strains used for the inference.

Supplemental Figure 2. Distribution of isolates obtained on all sample-media-substrate pairwise combinations. ‘Initial cultures’ refers to the number of cultures that grew in liquid media following initial colony picking. ‘Cultures sequenced’ refers to the cultures that were selected for genomic sequencing.

[Figure: whole genome alignment with two dendrograms; strain labels SarPi, LYC, Go1, WWM610, S6, C16, TMA; scale bars 1.0 (DCJ distance) and 0.001 substitutions/site]

Supplemental Figure 3.
A whole genome alignment (WGA) of all seven closed Methanosarcina mazei genomes. Colored regions are local collinear blocks (LCBs), which are regions lacking rearrangement of homologous sequence. Identical colors and connecting lines identify LCBs found in multiple genomes. The height of the colored bars within blocks describes sequence identity of the genome region, with higher bars indicating higher sequence identity. The left dendrogram is an ML tree (GTR-Γ model; 100 bootstrap replicates) inferred from the WGA. The right dendrogram was produced by hierarchical clustering (average neighbor algorithm) of double-cut-and-join distance values, which is a measure of genome synteny.

[Figure: mean spacer content identity (y-axis) versus relative alignment position (x-axis; 10 bins from (0.00447,0.105] to (0.901,1])]

Supplemental Figure 4. M. mazei CRISPRs display more CRISPR spacer content variation at the leader end versus the trailer end. Alignment position is normalized by CRISPR length (number of spacers) and is relative to the leader end, with 0 being most proximal. Mean spacer content identity is the number of matching spacers (same unique nucleotide sequence) normalized by the number of spacers in each pairwise alignment of CRISPRs (truncated to the shortest CRISPR in the alignment). Mean spacer content identity was calculated separately for each of the 10 relative alignment position bins. The line ranges represent the standard error.

Supplemental Figure 5. Unrooted maximum likelihood phylogenies of all HdrA, HdrB, and HdrC homologs for all Methanosarcinales and Methanocella strains. Red and blue branches denote clades of genes solely found within Methanosarcina or Methanocella, respectively.

Supplemental Figure 6. Unrooted maximum likelihood phylogenies (GTR-Γ model; 100 bootstrap replicates) of all FdhA and FdhB homologs.
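The spacer-content comparison summarized in Supplemental Figure 4 rests on treating each CRISPR as a string of spacer symbols. A minimal Python sketch of that idea follows; the study used a Perl implementation, which is not reproduced here. The function names and toy spacer sequences are hypothetical, and the identity measure shown is a simplified, ungapped version restricted to the length of the shorter array.

```python
def encode_crisprs(crisprs):
    """Map each unique spacer sequence to an integer symbol.

    crisprs -- list of CRISPR arrays, each a list of spacer sequences
               ordered from the leader end
    """
    symbols = {}
    encoded = []
    for spacers in crisprs:
        encoded.append([symbols.setdefault(s, len(symbols)) for s in spacers])
    return encoded


def levenshtein(a, b):
    """Edit distance between two symbol sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # match or mismatch
        prev = curr
    return prev[-1]


def ungapped_identity(a, b):
    """Fraction of matching spacers over the shorter array (simplification)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


# Toy spacers, leader end first (illustrative only)
arrays = [["ACGT", "GGCA", "TTAG", "CCAT"],
          ["TGCA", "GGCA", "TTAG", "CCAT"]]
first, second = encode_crisprs(arrays)
distance = levenshtein(first, second)
identity = ungapped_identity(first, second)
```

In this toy case the two arrays differ only in the spacer nearest the leader, the kind of leader-end difference that the figure reports as more common than trailer-end differences.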
Isolate ID    Number of scaffolds  Number of contigs  N50 (bp)  Maximum scaffold length (bp)  Total length (bp)  Number of CDS  Coverage  GenBank Accession
BB.F.A.2.3    189  200  41060  125206  4077722  3942  44.1  JJOR00000000
BB.F.A.2.4    146  169  53551  300086  4198417  4068  42.9  JJOS00000000
BB.F.T.0.2    136  178  50618  276626  4091453  3943  44.0  JJOT00000000
BB.F.T.2.6    236  245  31635  94217   4061783  3919  44.3  JJOU00000000
YBB.F.A.1A.1  267  284  30147  120922  4102127  4053  43.8  JJPA00000000
YBB.F.A.1A.3  147  210  46474  162679  4072655  3912  44.2  JJPB00000000
YBB.F.A.1B.1  167  196  43698  132890  4077893  3945  44.1  JJPC00000000
YBB.F.A.2.12  209  216  35291  123524  4075980  3925  44.2  JJPD00000000
YBB.F.A.2.3   185  196  41284  130396  4078713  3932  44.1  JJPE00000000
YBB.F.A.2.5   146  172  46460  159024  4067732  3929  44.2  JJPF00000000
YBB.F.A.2.6   160  203  41652  212372  4042355  3899  44.5  JJPG00000000
YBB.F.A.2.7   142  171  44169  173088  4028272  3870  44.7  JJPH00000000
YBB.F.T.1A.1  156  184  50446  216583  4164309  4022  43.2  JJPI00000000
YBB.F.T.1A.2  163  188  43793  114908  4157382  4020  43.3  JJPJ00000000
YBB.F.T.1A.4  139  174  51028  391037  4159899  4028  43.3  JJPK00000000
YBB.F.T.2.1   162  198  42908  169651  4109510  3998  43.8  JJPL00000000
YBB.H.A.1A.1  349  381  20277  67388   4010396  3954  44.9  JJPM00000000
YBB.H.A.1A.2  182  191  40758  96087   4095496  3972  43.9  JJPN00000000
YBB.H.A.2.1   159  185  44614  255431  3981512  3859  45.2  JJPO00000000
YBB.H.A.2.4   157  182  49010  305217  4123502  3990  43.6  JJPP00000000
YBB.H.A.2.5   186  236  42679  108713  4093088  3968  44.0  JJPQ00000000
YBB.H.A.2.6   167  198  42927  117048  4013698  3890  44.8  JJPR00000000
YBB.H.A.2.8   222  228  30384  108909  4005819  3891  44.9  JJPS00000000
YBB.H.M.1A.1  167  217  47520  305568  4120526  3970  43.7  JJPT00000000
YBB.H.M.1B.1  211  223  32240  177850  4129439  3991  43.6  JJPU00000000
YBB.H.M.1B.2  136  177  53229  229399  4125300  3973  43.6  JJPV00000000
YBB.H.M.1B.5  136  181  54280  183100  4121851  3978  43.7  JJPW00000000
YBB.H.M.2.7   199  211  35791  110868  4003367  3864  45.0  JJPX00000000
YBB.H.T.1A.1  128  162  56639  211658  4074613  3965  44.2  JJPY00000000
YBB.H.T.1A.2  185  194  37461  108767  4091099  3965  44.0  JJPZ00000000
YBM.F.A.1A.3  124  151  77131  274406  4048935  3892  44.5  JJQA00000000
YBM.F.A.1B.3  144  168  49810  179941  4033711  3868  44.6  JJQB00000000
YBM.F.A.1B.4  171  189  47035  166253  4053184  3902  44.4  JJQC00000000
YBM.F.A.2.8   238  263  30152  107661  3971272  3858  45.3  JJQD00000000
YBM.F.M.0.5   164  180  44189  151275  4072340  3934  44.2  JJQE00000000
YBM.H.A.0.1   157  185  44832  172191  3967638  3817  45.4  JJQF00000000
YBM.H.A.1A.1  172  187  39894  260934  4044602  3898  44.5  JJQG00000000
YBM.H.A.1A.3  205  214  36695  111392  4084784  3935  44.1  JJQH00000000
YBM.H.A.1A.4  123  174  67853  433088  3988142  3852  45.1  JJQI00000000
YBM.H.A.1A.6  231  237  35968  76848   4079528  3964  44.1  JJQJ00000000
YBM.H.A.2.1   294  320  25450  116492  3997775  3858  45.0  JJQK00000000
YBM.H.A.2.3   171  206  42899  125154  4091047  3953  44.0  JJQM00000000
YBM.H.A.2.6   204  232  36997  135580  4006298  3887  44.9  JJQN00000000
YBM.H.A.2.7   374  446  42470  130098  4232359  4006  42.3  JJQO00000000
YBM.H.A.2.8   337  381  22132  68948   4076602  3934  44.1  JJQP00000000
YBM.H.M.0.1   176  191  42223  251341  4063805  3972  44.3  JJQQ00000000
YBM.H.M.1A.1  175  184  44782  130261  4085193  3957  44.1  JJQR00000000
YBM.H.M.1A.2  155  181  51373  176586  4080746  3935  44.1  JJQS00000000
YBM.H.M.1A.3  257  270  27646  92171   4081406  3944  44.1  JJQT00000000
YBM.H.M.2.1   263  317  31243  99806   4187444  4054  43.0  JJQU00000000
YBM.H.M.2.2   154  201  44369  216779  4081675  3934  44.1  JJQV00000000
YBM.H.M.2.3   142  198  47314  166334  3977612  3842  45.2  JJQW00000000
YBM.H.M.2.4   260  266  26897  92541   4079098  3957  44.1  JJQX00000000
YBM.H.T.2.1   159  194  46703  130619  4076911  3939  44.1  JJQZ00000000
YBM.H.T.2.3   148  173  44866  176434  3971657  3839  45.3  JJRA00000000
YBM.H.T.2.5   142  186  49449  162387  4085119  3952  44.1  JJRB00000000

Supplemental Table 1. Genome assembly contiguity statistics for all 56 M. mazei cultures. The naming scheme for isolates is described in the Figure 1 legend.

Supplemental Table 2. Genome assembly contiguity statistics for all 7 M.
lacustris-like cultures. The naming scheme for isolates is described in the Figure 1 legend.

Supplemental Table 3. Genome assembly breakpoint statistics for draft genome assemblies of M. barkeri Fusaro, M. mazei C16, M. mazei LYC, and M. mazei WWM610. The draft genome assemblies of Illumina reads (using the same assembly pipeline as the isolate genome assemblies) were mapped onto the complete versions of each genome in order to identify gaps (i.e., assembly breakpoints).

M. mazei C16
Locus tag   Start (bp)  End (bp)  Median bootstrap  Annotation
MSMAC_0268  348320      349183    60                Arylsulfatase regulatory protein
MSMAC_0435  540757      539993    69                Hypothetical protein
MSMAC_0448  570241      569009    67                Hypothetical protein
MSMAC_0481  594949      593591    97                Hypothetical protein
MSMAC_1048  1270167     1269424   54.5              Hypothetical protein
MSMAC_1065  1293312     1293127   91                Cell division protein FtsZ (EC 3.4.24.-)
MSMAC_1173  1427813     1426911   89.5              Dihydroxy-acid dehydratase (EC 4.2.1.9)
MSMAC_1179  1433189     1432308   59                Hypothetical protein
MSMAC_1444  1786464     1786654   55.5              Glutaredoxin family protein
MSMAC_1506  1846873     1848234   82                Archaea-specific Superfamily II helicase
MSMAC_1802  2206533     2205724   51                Oligopeptide transporter, ATP-binding protein
MSMAC_1928  2357453     2357938   63.5              Hypothetical protein
MSMAC_1980  2438184     2438621   78                Hypothetical protein
MSMAC_2043  2514144     2512081   54.5              Hypothetical protein
MSMAC_2220  2713715     2713353   74                Hypothetical protein
MSMAC_2238  2737493     2737783   51                Hypothetical protein
MSMAC_2367  2888132     2887563   53                ATP-dependent helicase
MSMAC_2380  2904681     2901730   67.5              Hypothetical protein
MSMAC_2594  3191345     3190836   61                Hypothetical protein
MSMAC_2736  3368826     3368197   78.5              Sulfite reductase-related protein
MSMAC_2742  3377762     3375858   74.5              NAD-specific glutamate dehydrogenase (EC 1.4.1.2)
MSMAC_2763  3407278     3408417   58.5              Sensory transduction protein kinase (EC 2.7.3.-)
MSMAC_3099  3840103     3837464   77                Cell surface protein
MSMAC_3117  3858558     3857809   51.5              Hypothetical protein
MSMAC_3267  4028868     4030307   63.5              N-acetyltransferase
MSMAC_3314  4081260     4080874   51.5              Hypothetical protein
MSMAC_3360  4148750     4149820   99                Ubiquitin-like small archaeal modifier protein SAMP2
NA          NA          NA        89.5              Hypothetical protein

Number of inferred transfers (columns mazei-T → mazei-WC and mazei-WC → mazei-T; 56 values in extraction order, assignment to rows not recoverable): 3 2 1 1 3 2 0 2 3 4 0 4 3 0 3 3 4 0 4 2 0 2 0 1 2 1 1 1 0 0 4 1 0 1 1 0 0 0 2 2 1 7 0 0 2 6 2 0 0 1 2 1 2 3 2 1

Supplemental Table 4. The number of inter-clade gene transfers inferred by Mowgli for core genes with median bootstrap values of >50. ‘NA’ indicates that the gene was not found in M. mazei C16.

M. mazei C16
Locus tag   Start (bp)  End (bp)  dN/dS  FST    Sequence identity  Annotation
MSMAC_0670  818143      818316    2.31   -0.01  98.75              hypothetical protein
MSMAC_1177  1431860     1430548   1.12   0.02   99.91              phosphoglycerate mutase (EC:5.4.2.1)
MSMAC_1292  1573694     1574449   1.72   0.07   99.82              Endonuclease III (EC 4.2.99.18)
MSMAC_1399  1709081     1706967   1.15   0.52   99.88              hypothetical protein
MSMAC_1666  2034145     2033957   2.00   0.50   97.12              hypothetical protein
MSMAC_1737  2125926     2127350   1.83   0.11   99.61              PQQ enzyme repeat domain protein
MSMAC_3086  3822717     3822571   1.19   0.52   98.26              hypothetical protein
MSMAC_3170  3919879     3919652   1.54   0.96   98.71              hypothetical protein

Supplemental Table 5. The annotations of all core genes with a dN/dS of >1.

Category   Group       Number of specific*, unique spacers  Number of specific*, unique spacers (% of total)
Clade      mazei-T     75                                   3.5
Clade      mazei-WC    93                                   4.2
Site       YBB         0                                    0.0
Site       YBM         92                                   4.1
Media      Freshwater  0                                    0.0
Media      Marine      0                                    0.0
Substrate  Acetate     0                                    0.0
Substrate  Methanol    0                                    0.0
Substrate  TMA         0                                    0.0
Subtype    I-B         3                                    0.1
Subtype    I-C         75                                   3.4
Subtype    I-D         25                                   1.1
Subtype    I-E         34                                   1.5
Subtype    I-G         63                                   2.8
Subtype    III-A       38                                   1.7
Subtype    III-B       75                                   3.4
Subtype    VIII-3      69                                   3.1
Total      Total       2240                                 100.0

Supplemental Table 6. The number of unique spacers specific to each group in each category.
* ‘specific’ is defined as spacers found in the majority of strains in one clade (mazei-WC or mazei-T) but absent from the other.

acetate
             Df  Sum Sq  Mean Sq  F value  Pr(>F)
clade        1   0.000   0.000    0.564    0.458
round        3   0.003   0.001    14.338   0.000   ***
clade:round  3   0.000   0.000    1.320    0.283
Residuals    36  0.002   0.000

methanol
             Df  Sum Sq  Mean Sq  F value  Pr(>F)
clade        1   0.000   0.000    1.423    0.243
round        3   0.003   0.001    11.608   0.000   ***
clade:round  3   0.002   0.001    6.232    0.002   **
Residuals    27  0.002   0.000

trimethylamine
             Df  Sum Sq  Mean Sq  F value  Pr(>F)
clade        1   0.001   0.001    11.895   0.002   **
round        3   0.001   0.000    4.732    0.008   **
clade:round  3   0.000   0.000    1.617    0.205
Residuals    32  0.003   0.000

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Supplemental Table 7. Nested ANOVA tables assessing significant treatment effects of clade (mazei-WC and mazei-T) and round (Rounds 1-4) on maximum methane production rates.

Locus tag (M. mazei Gö1)  dN/dS*  FST    Sequence identity  Mean copy number (T, WC)  Mean gene length (T, WC)  Annotation
MM0011                    0.09    0.00   77.24              7.9, 8.0                  389, 388                  Hypothetical protein
MM0093                    NA      0.02   99.95              1.0, 1.0                  1472, 1477                Cobyric acid synthase CbiP
MM0174                    NA      0.00   100.00             1.0, 1.0                  765, 765                  Methanol corrinoid protein MtaC3
MM0175                    NA      0.00   100.00             1.0, 1.0                  1386, 1386                Methanol:corrinoid methyltransferase MtaB3
MM0176                    NA      0.00   100.00             1.0, 1.0                  1017, 1017                Methylcobalamin:coenzyme M methyltransferase MtaA2
MM0312                    0.08    0.05   77.31              2.0, 2.0                  709, 710                  Hypothetical protein
MM0408                    0.06    0.00   91.69              2.0, 1.5                  451, 447                  Hypothetical protein
MM0496                    NA      0.00   100.00             1.0, 1.0                  1002, 1002                Phosphate acetyltransferase
MM0583                    0.24    0.46   99.15              1.0, 1.0                  795, 795                  Hypothetical protein
MM0671                    0.10    0.18   99.79              1.0, 1.0                  1537, 1548                2-Isopropylmalate synthase
MM0772                    0.09    0.00   77.24              7.9, 8.0                  389, 388                  Hypothetical protein
MM0869                    NA      1.00   99.75              1.0, 1.0                  399, 399                  Hypothetical protein
MM0870                    0.16    0.99   99.48              1.0, 1.0                  1173, 1173                Beta-ketoacyl synthase/thiolase
MM0871                    1.15    0.01   99.89              1.0, 1.0                  1039, 1050                Hydroxymethylglutaryl-CoA synthase
MM0872                    0.06    0.12   98.03              1.0, 1.0                  741, 734                  Putative transcriptional regulator
MM0924                    0.11    0.02   86.64              2.0, 2.0                  178, 187                  Hypothetical protein
MM1025                    0.11    -0.02  77.83              2.0, 2.0                  1289, 1299                Thiamine biosynthesis protein ThiC
MM1070                    0.17    0.93   97.07              1.0, 1.0                  1017, 1017                Methylcobalamin:coenzyme M methyltransferase MtaA1
MM1071                    0.14    0.00   72.29              3.0, 3.0                  1624, 1624                Hypothetical protein
MM1073                    NA      0.26   99.74              1.0, 1.0                  773, 774                  Methanol corrinoid protein MtaC2
MM1074                    0.05    0.30   98.41              1.0, 1.0                  1383, 1383                Methanol:corrinoid methyltransferase MtaB2
MM1075                    0.16    -0.01  73.82              3.0, 3.0                  593, 593                  Putative regulatory gene MtaR
MM1112                    NA      0.00   100.00             1.0, 1.0                  273, 273                  Hypothetical protein
MM1271                    NA      -0.02  80.04              2.0, 2.0                  792, 799                  2-Dehydro-3-deoxyphosphoheptonate aldolase
MM1272                    NA      0.00   100.00             1.0, 1.0                  1143, 1143                3-Dehydroquinate synthase
MM1273                    NA      0.00   100.00             1.0, 1.0                  729, 729                  3-Dehydroquinate dehydratase
MM1274                    NA      0.00   100.00             1.0, 1.0                  843, 843                  Shikimate 5-dehydrogenase
MM1275                    0.35    0.09   99.91              1.0, 1.0                  1419, 1419                Prephenate dehydrogenase
MM1284                    0.10    0.18   99.79              1.0, 1.0                  1537, 1548                2-Isopropylmalate synthase
MM1304                    NA      0.05   99.99              1.0, 1.0                  836, 846                  Hypothetical protein
MM1321                    0.07    -0.03  99.82              1.0, 1.0                  894, 894                  Formylmethanofuran H4MPT formyltransferase
MM1434                    NA      -0.01  99.99              1.0, 1.0                  270, 270                  Methylamine permease MtmP
MM1435                    NA      0.37   99.92              1.0, 1.0                  1407, 1407                Methylamine permease MtmP
MM1436                    0.07    -0.04  82.74              1.0, 1.0                  1362, 1374                Monomethylamine:corrinoid methyltransferase MtmB1
MM1438                    0.21    0.28   99.57              1.0, 1.0                  467, 611                  Monomethylamine corrinoid protein MtmC1
MM1439                    0.10    0.13   99.66              1.0, 1.0                  1020, 1020                Methylcobalamin:coenzyme M methyltransferase MtbA2
MM1488                    0.58    0.04   99.19              1.0, 1.0                  1030, 976                 Hypothetical protein
MM1601                    0.11    0.25   99.67              1.0, 1.0                  4328, 4320                Cobalamin biosynthesis protein CobN
MM1602                    0.10    0.29   99.66              1.0, 1.0                  4540, 4554                Cobalamin biosynthesis protein CobN
MM1612                    0.05    0.97   97.73              1.0, 1.0                  1791, 1780                Hypothetical protein
MM1647                    0.12    0.80   98.98              1.0, 1.0                  1383, 1383                Methanol:corrinoid methyltransferase MtaB1
MM1648                    0.15    0.73   99.68              1.0, 1.0                  774, 774                  Methanol corrinoid protein MtaC1
MM1687                    0.04    0.05   99.41              1.0, 1.0                  454, 442                  Dimethylamine corrinoid protein MtbC1
MM1688                    0.08    -0.02  99.80              2.0, 2.0                  1242, 1242                Trimethylamine:corrinoid methyltransferase MttB1
MM1690                    0.04    -0.03  99.61              1.0, 1.0                  648, 648                  Trimethylamine corrinoid protein MttC1
MM1691                    NA      0.06   99.94              1.0, 1.0                  1047, 1050                Trimethylamine permease MttP1
MM1693                    0.03    0.00   99.27              2.0, 2.0                  291, 291                  Dimethylamine:corrinoid methyltransferase MtbB1
MM1761                    0.20    0.12   99.91              1.0, 1.0                  798, 798                  Hypothetical protein
MM1762                    NA      0.04   99.94              1.0, 1.0                  917, 925                  Mevalonate kinase
MM1950                    0.21    0.33   99.39              1.7, 1.0                  1229, 2133                Catalase
MM1951                    NA      0.02   99.73              1.0, 1.0                  154, 144                  Hypothetical protein
MM1977                    NA      0.00   100.00             1.0, 1.0                  579, 579                  Hypothetical protein
MM1982                    NA      0.00   99.93              1.0, 1.0                  1727, 1720                Alkyl sulfatase
MM2045                    0.10    0.17   99.49              1.0, 1.0                  1449, 1449                Trimethylamine permease MttP2
MM2046                    0.07    0.18   96.72              1.0, 1.0                  1027, 1044                Trimethylamine permease MttP2
MM2047                    0.05    0.00   95.57              1.0, 1.0                  654, 654                  Trimethylamine corrinoid protein MttC2
MM2049                    0.05    -0.01  98.34              2.0, 2.0                  1242, 1242                Trimethylamine:corrinoid methyltransferase MttB2
MM2051                    0.03    0.00   99.33              2.0, 2.0                  1192, 1217                Dimethylamine:corrinoid methyltransferase MtbB2
MM2052                    0.05    0.72   95.93              1.0, 1.0                  648, 645                  Dimethylamine corrinoid protein MtbC2
MM2338                    0.16    0.00   75.93              8.7, 9.2                  204, 203                  Hypothetical protein
MM2387                    0.10    0.07   99.98              1.0, 1.0                  1497, 1497                Cobalt transport ATP-binding protein CbiO
MM2818                    0.35    0.22   97.86              1.0, 1.0                  1874, 1870                Anthranilate synthase component I
MM2821                    NA      0.00   100.00             1.0, 1.0                  816, 816                  Tryptophan synthase, alpha chain
MM2822                    NA      0.00   100.00             1.0, 1.0                  1212, 1212                Tryptophan synthase subunit beta
MM2843                    0.46    0.04   99.84              1.0, 1.0                  1266, 1287                Hypothetical protein
MM2882                    0.13    0.02   99.22              1.0, 1.0                  1098, 1091                Hypothetical protein
MM2933                    0.47    0.46   99.91              1.0, 1.0                  1590, 1590                Hypothetical protein
MM2961                    0.03    0.01   99.62              1.0, 1.0                  642, 642                  Dimethylamine corrinoid protein MtbC3
MM2962                    0.03    0.01   99.33              1.0, 1.0                  1192, 1217                Dimethylamine:corrinoid methyltransferase MtbB3
MM2964                    0.06    -0.01  99.88              1.0, 1.0                  1446, 1446                Dimethylamine permease MtbP
MM3011                    0.41    0.47   98.68              1.0, 1.0                  816, 816                  Hypothetical protein
MM3108                    NA      0.03   99.98              1.0, 1.0                  1005, 1003                Hypothetical protein
MM3197                    0.08    -0.01  99.93              1.0, 1.0                  411, 411                  Hypothetical protein
MM3334                    0.16    -0.12  99.07              1.0, 1.0                  534, 598                  Monomethylamine corrinoid protein MtmC2
MM3335                    0.11    0.17   64.03              1.0, 1.0                  1374, 1374                Monomethylamine:corrinoid methyltransferase MtmB2

Differentially expressed on MeOH vs TMA?** (column entries, ‘Yes’ or blank, not recoverable per row)

* 'NA' if no variation at conserved alignment positions
** based on Krätzer et al., 2009

Supplemental Table 8. Genetic differentiation between mazei-T and mazei-WC of genes potentially associated with growth of Methanosarcina spp. on TMA. The table includes all methyltransferase 1 (MT1), methyltransferase 2 (MT2), corrinoid proteins, and putative regulatory proteins involved in methylotrophic growth of Methanosarcina. In addition, the table includes all genes shown by Krätzer and colleagues to be differentially expressed in Methanosarcina mazei Gö1 when grown on methanol versus TMA (Krätzer et al., 2009). Only genes present in >1 M. mazei isolate from both YBM and YBB are shown. Bold values highlight genes with a dN/dS > 1.1, FST > 0.7, or substantially differing mean copy numbers or gene lengths between mazei-T and mazei-WC. Gene length is in base pairs (bp).
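
The two numeric highlighting criteria named in the Supplemental Table 8 legend (dN/dS > 1.1 or FST > 0.7) can be applied programmatically. The following is a minimal sketch, not the authors' procedure: the `GeneRow` class and the `flag_differentiated` function are hypothetical names introduced here for illustration, and the five rows are a small excerpt of values taken from the table. The legend's third criterion (substantially differing copy numbers or gene lengths) is not quantified in the source and is therefore left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneRow:
    """One table row (hypothetical container, for illustration only)."""
    locus_tag: str
    dn_ds: Optional[float]  # None where the table reports 'NA'
    fst: float


def flag_differentiated(rows: List[GeneRow],
                        dn_ds_cutoff: float = 1.1,
                        fst_cutoff: float = 0.7) -> List[str]:
    """Return locus tags exceeding either cutoff from the table legend."""
    flagged = []
    for row in rows:
        # 'NA' dN/dS values cannot exceed the cutoff, so treat them as not high.
        high_dn_ds = row.dn_ds is not None and row.dn_ds > dn_ds_cutoff
        high_fst = row.fst > fst_cutoff
        if high_dn_ds or high_fst:
            flagged.append(row.locus_tag)
    return flagged


# Excerpt of Supplemental Table 8 (locus tag, dN/dS, FST).
rows = [
    GeneRow("MM0869", None, 1.00),
    GeneRow("MM0870", 0.16, 0.99),
    GeneRow("MM0871", 1.15, 0.01),
    GeneRow("MM0872", 0.06, 0.12),
    GeneRow("MM1070", 0.17, 0.93),
]

print(flag_differentiated(rows))
```

For this excerpt the sketch flags MM0869, MM0870, and MM1070 on FST and MM0871 on dN/dS, while MM0872 passes neither cutoff.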
Accession Number  Strain
CP002551          Methanobacterium sp AL21 1
CP000678          Methanobrevibacter smithii ATCC
L77117            Methanocaldococcus jannaschii DSM 2661
AM114193          Methanocella arvoryzae MRE50
CP003243          Methanocella conradii HZ254
CP000300          Methanococcoides burtonii DSM 6242
*                 Methanococcoides methylutens MM1
CP000609          Methanococcus maripaludis C5
CP000562          Methanoculleus marisnigri JR1
CP002069          Methanohalobium evestigatum Z7303
*                 Methanomethylovorans hollandica WWM590
AE009439          Methanopyrus kandleri AV19
CP000780          Methanoregula boonei 6A8
AB679170          Methanosaeta concilii GP6
AE010299          Methanosarcina acetivorans C2A
*                 Methanosarcina baltica type strain
CP000099          Methanosarcina barkeri str fusaro
*                 Methanosarcina calensis Cali
*                 Methanosarcina horonobensis HB1
*                 Methanosarcina lacustris Z7289
*                 Methanosarcina mazei C16
AE008384          Methanosarcina mazei Go1
*                 Methanosarcina mazei TMA
*                 Methanosarcina mazei WWM610
*                 Methanosarcina siciliae C2J
*                 Methanosarcina sp Kolksee
*                 Methanosarcina sp MTP4
*                 Methanosarcina thermophila TM1 DSM1825
CP000102          Methanosphaera stadtmanae DSM 3091
CP001338          Methanosphaerula palustris E19c
CP000254          Methanospirillum hungatei JF1
CP001710          Methanothermobacter marburgensis str Marburg
CP002278          Methanothermus fervidus DSM

* Awaiting acceptance from GenBank

Supplemental Table 9. GenBank accession numbers for all reference strains used in Supplemental Figure 1.