SUPPLEMENTARY METHODS

Genomic properties of Marine Group A bacteria indicate a role in the marine sulfur cycle

Jody J. Wright1, Keith Mewis2, Niels W. Hanson3, Kishori M. Konwar1, Kendra R. Maas1, Steven J. Hallam1,3*

1 Department of Microbiology & Immunology, University of British Columbia
2 Genome Science and Technology Program, University of British Columbia
3 Graduate Program in Bioinformatics, University of British Columbia
* To whom correspondence should be addressed:
University of British Columbia, Department of Microbiology and Immunology
2552-2350 Health Sciences Mall, Vancouver, BC Canada V6T 1Z3
Office: (604) 827-3420 email: shallam@mail.ubc.ca

Running title: Marine Group A diversity and function in the North Pacific

Phylogenetic analysis and tree construction using MGA 16S rRNA gene sequences
Full-length 16S rRNA gene clone sequences from the NESAP (3 164; Allers et al., 2012) and SI (6 645; Zaikova et al., 2010), as well as partial and full-length 16S rRNA sequences obtained from large-insert DNA fragments affiliated with MGA, were imported into the ARB software package (Release 106; Ludwig et al., 2004), added to the SILVA database (www.arb-silva.de) (Pruesse et al., 2007), aligned to the closest relative, and added to an existing tree of sequences from the ARB database using the ARB parsimony tool with default parameters. A maximum likelihood phylogenetic tree of MGA 16S rRNA gene sequences exported from ARB was inferred by PHYML (Guindon et al., 2005) using an HKY + 4Γ + I model of nucleotide evolution, where the parameter of the Γ distribution, the proportion of invariable sites, and the transition/transversion ratio were estimated for each dataset. The confidence of each node was determined by assembling a consensus tree of 100 bootstrap replicates.
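As an illustration of what the 100 bootstrap replicates represent, the following minimal Python sketch resamples alignment columns with replacement and computes the percentage of replicate trees that contain a given clade. It is not the PHYML implementation, which performs these steps internally; the function names (`bootstrap_alignment`, `bootstrap_replicates`, `clade_support`) and the toy sequences are hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate by resampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement. The same indices are applied
    # to every sequence, so each resampled column is an intact original column.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 100,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Generate n_replicates resampled alignments (100 mirrors the text above)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


def clade_support(replicate_clades: List[Set[FrozenSet[str]]],
                  clade: Iterable[str]) -> float:
    """Percentage of replicate trees in which the given clade (set of taxa) occurs."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


# Toy alignment (hypothetical sequences, for illustration only).
toy = {"seqA": "ACGTACGT", "seqB": "ACGTTCGT", "seqC": "ACGAACGA"}
replicates = bootstrap_replicates(toy, n_replicates=100, seed=42)
```

In practice each replicate alignment would be passed to the tree-inference program, and the clades recovered from the resulting trees would be tallied with something like `clade_support` to label the nodes of the consensus tree.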
Non-chimeric bacterial 16S rRNA gene sequences were also placed in a taxonomic hierarchy for downstream analysis using the NAST aligner (DeSantis et al., 2006) and BLAST with default parameters against the 2008 Greengenes database (DeSantis et al., 2006), and 705 sequences were identified as belonging to MGA (415 from SI in addition to 290 previously reported by Allers, Wright and colleagues (Allers et al., 2012)). Results of this analysis were not significantly different from those obtained using a newer version of the Greengenes database (2012); thus, the 2008 results were used for consistency with previous work described in Allers et al. (2012). These 705 sequences were clustered at 97% identity using mothur v.1.19.0 (Schloss et al., 2009). Representative sequences from each of these clusters were identified using the get.oturep command in mothur and were included in the phylogenetic tree. The abundance and distribution of 97% clusters were visualized in a histogram-heatmap in R (Figure S3).

Fosmid library construction and end sequencing
Prior to cloning, ~4 μg of environmental DNA was further purified on a CsCl density gradient as previously described (Hallam 2004). Fosmid libraries were prepared using the CopyControl Fosmid Library Production Kit (Epicentre, Madison, WI). Briefly, ~1 μg of CsCl-purified DNA was blunt-end repaired and separated on a 1% low-melt agarose pulsed-field gel overnight at 6 V/cm. The 40-50 kb fragment range was excised and gel purified using agarase, followed by concentration using an Amicon Ultracel 10K filter device (Millipore, Billerica, MA, USA). DNA was ligated into the pCC1fos vector, packaged using the MaxPlax lambda packaging extract, and used to transfect TransforMax EPI300 E. coli cells (Epicentre).
Transfected cells were plated on selective agar and fosmid clones picked using the QPix2 robotic colony picker (Molecular Devices, Sunnyvale, CA) and grown in selective media for DNA sequencing. The fosmid library production protocol can be viewed as a visualized experiment at http://www.jove.com/index/Details.stp?ID=1387 (Taupp et al., 2009). Bidirectional end sequencing of SI fosmids was performed with standard M13 forward (5’-GTTTTCCCAGTCACGAC) and reverse (5’-CAGGAAACAGCTATGAC) primers and the BigDye sequencing kit (Applied Biosystems, Carlsbad, CA) on a Sanger platform at the Department of Energy Joint Genome Institute (DOE-JGI; Walnut Creek, CA). The reactions were purified by a magnetic bead protocol and run on an ABI PRISM3730 (Applied Biosystems) capillary DNA sequencer (for research protocols, see http://jgi.doe.gov). Bidirectional end sequencing of NESAP fosmids was performed with standard pCC1 forward (5’-GGATGTGCTGCAAGGCGATTAAGTTGG) and reverse (5’-CTCGTATGTTGTGTGGAATTGTGAGC) primers on a Sanger platform at Canada’s Michael Smith Genome Sciences Centre (GSC; Vancouver, BC).

Fosmid library screening, preparation, and full-length sequencing
Sequencing of the 6 SI fosmids was carried out at the DOE-JGI on an ABI PRISM3730 (Applied Biosystems) capillary DNA sequencer (for research protocols, see http://jgi.doe.gov). Sequencing of the 8 NESAP fosmids was performed using the IonTorrent PGM (Life Technologies, San Francisco, CA, USA) at the University of British Columbia. Briefly, fosmid DNA was prepared using Montage Plasmid96 Miniprep kit (Millipore), and 100 ng of template was used in barcoded library construction for 200 bp read length libraries according to standard protocols provided with the IonTorrent PGM. These 8 libraries were sequenced with two Ion316 chips. Runs were combined and processed, yielding between 33 261 and 76 270 reads for each fosmid.
Raw data were assembled using the MIRA assembler (Chevreux et al., 2004), which gave outputs ranging from 2 to 77 contigs. Contigs were further processed using Sequencher 4.8 (GeneCodes Corp, Ann Arbor, MI, USA) to combine contigs using default settings (20 bp overlap, 85% similarity). Any mismatches in the overlapping regions were replaced with N. Contigs were then compared to the original end sequences to ensure proper identity, yielding one contig from each assembly that matched both original end sequences in 7 of 8 cases. In 5 of these 7 cases the vector was found in the middle of the contig, necessitating its removal. For these 5 contigs, the vector sequence was trimmed out and the resulting two contigs were joined at the opposite ends with a string of 100 Ns. One fosmid (413009-K18) produced 2 contigs (16.8 kb and 18.7 kb), each matching either the forward or reverse end sequence. In some cases limited coverage introduced sequencing errors that interrupted open reading frames. Eleven of these regions were identified, and primers targeting them were designed for verification by Sanger sequencing. Primers for these regions are provided in Table S2. GenBank files contain the Sanger-verified fosmid sequences.

Fragment recruitment of fosmid end sequences
Coverage plots relating fosmid end sequences from individual NESAP and SI fosmid end libraries to large-insert DNA fragments were generated using the Promer program implemented in MUMmer 3.23 (Kurtz et al., 2004) with the following parameters, as cited in Hallam et al. (2006): breaklength = 60, minimum cluster length = 20, and match length = 10. Resulting delta files were converted into coordinate files using the show-coords program and visualized in graphical format (coverage plot) using the MUMmerplot program.
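The three-step MUMmer workflow described above might be assembled along the following lines. This is a sketch under stated assumptions rather than the exact commands used in the study: the mapping of the reported parameters to the promer options `-b` (break length), `-c` (minimum cluster length), and `-l` (minimum match length) follows the MUMmer 3 manual, while the show-coords and mummerplot display options, the file names, and the helper name `recruitment_commands` are hypothetical choices made for illustration.

```python
import shlex
from typing import List


def recruitment_commands(reference_fasta: str, query_fasta: str,
                         prefix: str) -> List[List[str]]:
    """Build argument lists for promer, show-coords, and mummerplot."""
    delta = f"{prefix}.delta"  # promer writes <prefix>.delta
    promer = [
        "promer",
        "-b", "60",   # breaklength = 60 (from the text)
        "-c", "20",   # minimum cluster length = 20 (from the text)
        "-l", "10",   # match length = 10 (from the text)
        "-p", prefix,
        reference_fasta, query_fasta,
    ]
    # Assumed display options: sort by reference, add coverage and length
    # columns, tab-delimited output. show-coords prints to standard output.
    show_coords = ["show-coords", "-r", "-c", "-l", "-T", delta]
    # Assumed plotting options: reference coverage plot written as PostScript.
    mummerplot = ["mummerplot", "--coverage", "--postscript", "-p", prefix, delta]
    return [promer, show_coords, mummerplot]


# Hypothetical file names: one large-insert fragment as the reference and one
# fosmid end library as the query.
commands = recruitment_commands("fosmid_insert.fasta", "end_library.fasta",
                                "ends_vs_insert")
for argv in commands:
    print(shlex.join(argv))
```

Each argument list could be passed to `subprocess.run` on a system where MUMmer is installed, with the standard output of show-coords redirected to a coordinate file.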
Also using the coordinate files, the number of fosmid end sequences recruited to each large-insert DNA fragment was calculated at 60%-80% nucleotide similarity and at >80% nucleotide similarity. Ends recruiting to the 16S-23S rRNA region were subtracted, the remaining ends were normalized to the total number of ends per library, and the normalized proportion of sequences in each library recruited to each large-insert fragment was visualized using bubble.pl (available for download at: http://hallam.microbiology.ubc.ca/downloads/index.html). The number of fosmid end sequences recruited to the psr operon on fosmids FPPP_13C3 and 122006-I05 was also calculated and visualized as described above.

References
Allers, E., Wright, J.J., Konwar, K.M., Howes, C.G., Beneze, E., Hallam, S.J., and Sullivan, M.B. (2012). Diversity and population structure of Marine Group A bacteria in the Northeast subarctic Pacific Ocean. ISME J 7, 256-268.
Chevreux, B., Pfisterer, T., Drescher, B., Driesel, A.J., Müller, W.E., Wetter, T., and Suhai, S. (2004). Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Research 14, 1147-1159.
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., and Andersen, G.L. (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72, 5069-5072.
DeSantis, T.Z.J., Hugenholtz, P., Keller, K., Brodie, E.L., Larsen, N., Piceno, Y.M., Phan, R., and Andersen, G.L. (2006). NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res 34, W394-W399.
Guindon, S., Lethiec, F., Duroux, P., and Gascuel, O. (2005). PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res 33, W557-W559.
Hallam, S.J., Konstantinidis, K.T., Putnam, N., Schleper, C., Watanabe, Y., Sugahara, J., Preston, C., de la Torre, J., Richardson, P.M., and DeLong, E.F. (2006). Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum. Proc Natl Acad Sci U S A 103, 18296-18301.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S.L. (2004). Versatile and open software for comparing large genomes. Genome Biol 5, R12.
Ludwig, W., Strunk, O., Westram, R., Richter, L., Meier, H., Yadhukumar, Buchner, A., Lai, T., Steppi, S., et al. (2004). ARB: a software environment for sequence data. Nucleic Acids Research 32, 1363-1371.
Pruesse, E., Quast, C., Knittel, K., Fuchs, B.M., Ludwig, W.G., Peplies, J., and Glockner, F.O. (2007). SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Research 35, 7188-7196.
Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., and Robinson, C.J. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75, 7537-7541.
Taupp, M., Lee, S., Hawley, A., Yang, J., and Hallam, S.J. (2009). Large insert environmental genomic library production. J Vis Exp.
Zaikova, E., Walsh, D.A., Stilwell, C.P., Mohn, W.W., Tortell, P.D., and Hallam, S.J. (2010). Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environ Microbiol 12, 172-191.