SUPPLEMENTARY METHODS

Genomic properties of Marine Group A bacteria indicate a role in the marine sulfur cycle

Jody J. Wright1, Keith Mewis2, Niels W. Hanson3, Kishori M. Konwar1, Kendra R. Maas1, Steven J. Hallam1,3*

1 Department of Microbiology & Immunology, University of British Columbia
2 Genome Science and Technology Program, University of British Columbia
3 Graduate Program in Bioinformatics, University of British Columbia

* To whom correspondence should be addressed:
University of British Columbia, Department of Microbiology and Immunology
2552-2350 Health Sciences Mall, Vancouver, BC Canada V6T 1Z3
Office: (604) 827-3420; email: shallam@mail.ubc.ca

Running title: Marine Group A diversity and function in the North Pacific
Phylogenetic analysis and tree construction using MGA 16S rRNA gene sequences
Full-length 16S rRNA gene clone sequences from the NESAP (3 164 sequences; Allers et al., 2012) and SI (6 645 sequences; Zaikova et al., 2010), as well as partial and full-length 16S rRNA sequences obtained from large-insert DNA fragments affiliated with MGA, were imported into the ARB software package (Release 106; Ludwig et al., 2004), added to the SILVA database (www.arb-silva.de; Pruesse et al., 2007), aligned to their closest relatives, and added to an existing tree of sequences from the ARB database using the ARB parsimony tool with default parameters.

A maximum likelihood phylogenetic tree of MGA 16S rRNA gene sequences exported from ARB was inferred with PHYML (Guindon et al., 2005) using an HKY + 4Γ + I model of nucleotide evolution, in which the shape parameter of the Γ distribution, the proportion of invariable sites, and the transition/transversion ratio were estimated for each dataset. Node support was assessed by building a consensus tree from 100 bootstrap replicates. Non-chimeric bacterial 16S rRNA gene sequences were also assigned to a taxonomic hierarchy for downstream analysis using the NAST aligner (DeSantis et al., 2006) and BLAST with default parameters against the 2008 Greengenes database (DeSantis et al., 2006). This identified 705 sequences belonging to MGA (415 from SI, in addition to 290 previously reported by Allers, Wright and colleagues; Allers et al., 2012). Results obtained with a newer version of the Greengenes database (2012) did not differ significantly, so the 2008 results were retained for consistency with previous work (Allers et al., 2012). These 705 sequences were clustered at 97% identity using mothur v.1.19.0 (Schloss et al., 2009). A representative sequence from each cluster was identified with the get.oturep command in mothur and included in the phylogenetic tree. The abundance and distribution of the 97% clusters were visualized as a histogram-heatmap in R (Figure S3).
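The 97% clustering and representative-selection steps can be illustrated with a minimal Python sketch. It is not mothur's implementation: it assumes the sequences are already aligned (as NAST output is), computes distance as 1 minus identity over columns where both sequences carry a base (a simplification of mothur's gap handling), and uses furthest-neighbour (complete-linkage) clustering at a 0.03 distance cutoff, which is one of several linkage methods mothur offers; the linkage actually used is not stated above. The representative is taken as the member with the smallest summed distance to the other members, an assumed rule that may differ from get.oturep. All function names are hypothetical.

```python
from itertools import combinations

GAP_CHARS = set("-.")  # gap symbols used in ARB/NAST-style alignments


def pairwise_distance(a: str, b: str) -> float:
    """Return 1 - identity, counting only columns where both sequences have a base."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return 1.0 - matches / compared if compared else 1.0


def cluster_furthest(seqs: dict[str, str], cutoff: float = 0.03) -> list[list[str]]:
    """Complete-linkage clustering: merge clusters while their furthest pair is within cutoff."""
    ids = list(seqs)
    dist = {frozenset(p): pairwise_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(ids, 2)}
    clusters = [[i] for i in ids]

    def linkage(c1: list[str], c2: list[str]) -> float:
        # Furthest-neighbour distance between two clusters.
        return max(dist[frozenset((i, j))] for i in c1 for j in c2)

    while len(clusters) > 1:
        d, x, y = min(
            ((linkage(clusters[x], clusters[y]), x, y)
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda t: t[0],
        )
        if d > cutoff:
            break  # no remaining pair of clusters lies within the 3% cutoff
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def representative(cluster: list[str], seqs: dict[str, str]) -> str:
    """Assumed rule: the member with the smallest summed distance to the other members."""
    return min(cluster, key=lambda i: sum(
        pairwise_distance(seqs[i], seqs[j]) for j in cluster if j != i))
```

Complete linkage was chosen for the sketch because it guarantees that every pair within a cluster is at least 97% identical, which makes the meaning of the threshold easy to read off.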
Fosmid library construction and end sequencing
Prior to cloning, ~4 μg of environmental DNA was further purified on a CsCl density gradient as previously described (Hallam 2004). Fosmid libraries were prepared using the CopyControl Fosmid Library Production Kit (Epicentre, Madison, WI). Briefly, ~1 μg of CsCl-purified DNA was end-repaired to generate blunt ends and size-separated overnight on a 1% low-melting-point agarose pulsed-field gel at 6 V/cm. The 40-50 kb fragment range was excised, gel-purified using agarase, and concentrated using an Amicon Ultracel 10K filter device (Millipore, Billerica, MA, USA). DNA was ligated into the pCC1fos vector, packaged using MaxPlax lambda packaging extract, and used to transfect TransforMax EPI300 E. coli cells (Epicentre). Transfected cells were plated on selective agar, and fosmid clones were picked using a QPix2 robotic colony picker (Molecular Devices, Sunnyvale, CA) and grown in selective medium for DNA sequencing. The fosmid library production protocol can be viewed as a visualized experiment at http://www.jove.com/index/Details.stp?ID=1387 (Taupp et al., 2009). Bidirectional end sequencing of SI fosmids was performed with standard M13 forward (5’-GTTTTCCCAGTCACGAC) and reverse (5’-CAGGAAACAGCTATGAC) primers and the BigDye sequencing kit (Applied Biosystems, Carlsbad, CA) on a Sanger platform at the Department of Energy Joint Genome Institute (DOE-JGI; Walnut Creek, CA). The reactions were purified using a magnetic bead protocol and run on an ABI PRISM 3730 (Applied Biosystems) capillary DNA sequencer (for research protocols, see http://jgi.doe.gov). Bidirectional end sequencing of NESAP fosmids was performed with standard pCC1 forward (5’-GGATGTGCTGCAAGGCGATTAAGTTGG) and reverse (5’-CTCGTATGTTGTGTGGAATTGTGAGC) primers on a Sanger platform at Canada’s Michael Smith Genome Sciences Centre (GSC; Vancouver, BC).
Fosmid library screening, preparation, and full-length sequencing
Sequencing of the 6 SI fosmids was carried out at the DOE-JGI on an ABI PRISM 3730 (Applied Biosystems) capillary DNA sequencer (for research protocols, see http://jgi.doe.gov). Sequencing of the 8 NESAP fosmids was performed using the Ion Torrent PGM (Life Technologies, San Francisco, CA, USA) at the University of British Columbia. Briefly, fosmid DNA was prepared using the Montage Plasmid96 Miniprep kit (Millipore), and 100 ng of template was used for barcoded library construction (200 bp read length) according to the standard protocols provided with the Ion Torrent PGM. The 8 libraries were sequenced on two Ion 316 chips. Runs were combined and processed, yielding between 33 261 and 76 270 reads per fosmid. Raw data were assembled using the MIRA assembler (Chevreux et al., 2004), producing between 2 and 77 contigs per fosmid. Contigs were then combined in Sequencher 4.8 (GeneCodes Corp, Ann Arbor, MI, USA) using default settings (minimum overlap of 20 bp, minimum similarity of 85%), and any mismatches in the overlapping regions were replaced with N.
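The two joining criteria (at least 20 bp of overlap at no less than 85% similarity, with mismatches masked as N) can be sketched as follows. This illustrates only those thresholds and is not Sequencher's assembly algorithm; it assumes both contigs are already in the same orientation, with the overlap at the end of the first and the start of the second, and the function name merge_overlap is hypothetical.

```python
from typing import Optional


def merge_overlap(left: str, right: str, min_overlap: int = 20,
                  min_identity: float = 0.85) -> Optional[str]:
    """Join two same-strand contigs on the longest end overlap meeting both thresholds.

    Mismatched positions inside the overlap are written as 'N'.
    Returns None when no overlap of at least min_overlap bp reaches min_identity.
    """
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for length in range(min(len(left), len(right)), max(min_overlap, 1) - 1, -1):
        tail, head = left[-length:], right[:length]
        matches = sum(x == y for x, y in zip(tail, head))
        if matches / length >= min_identity:
            consensus = "".join(x if x == y else "N" for x, y in zip(tail, head))
            return left[:-length] + consensus + right[length:]
    return None
```

Masking disagreements as N, rather than picking one base, keeps uncertain positions visible for the later Sanger verification step.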
Contigs were then compared with the original end sequences to confirm their identity; in 7 of 8 cases, this yielded one contig per assembly that matched both original end sequences. In 5 of these 7 cases, the vector was located in the middle of the contig. For these 5 contigs, the vector sequence was trimmed out and the two resulting pieces were joined at their opposite ends with a string of 100 Ns. One fosmid (413009-K18) produced 2 contigs (16.8 kb and 18.7 kb), each matching either the forward or the reverse end sequence. In some cases, limited coverage introduced sequencing errors that interrupted open reading frames. Eleven such regions were identified, and primers targeting them were designed for verification by Sanger sequencing. These primers are provided in Table S2. The GenBank files contain the Sanger-verified fosmid sequences.
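The end-sequence check and vector removal can be sketched as below, under our reading that each fosmid is circular, so a contig laid out as insert piece A, then vector, then insert piece B represents the insert in the order B followed by A. Exact string matching stands in for the alignment-based comparison a real pipeline would use, and revcomp, contains_end and relinearize are hypothetical helpers.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (ambiguity codes other than N are not handled)."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_end(contig: str, end_read: str) -> bool:
    """True if the end read, or its reverse complement, occurs exactly in the contig."""
    return end_read in contig or revcomp(end_read) in contig


def relinearize(contig: str, vector: str, spacer_len: int = 100) -> str:
    """Excise an internal vector and rejoin the flanking insert pieces with Ns.

    For a contig laid out as A + vector + B, returns B + 'N' * spacer_len + A.
    A contig without an exact vector match (either strand) is returned unchanged.
    """
    for probe in (vector, revcomp(vector)):
        start = contig.find(probe)
        if start != -1:
            left = contig[:start]                  # piece A, upstream of the vector
            right = contig[start + len(probe):]    # piece B, downstream of the vector
            return right + "N" * spacer_len + left
    return contig
```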
Fragment recruitment of fosmid end sequences
Coverage plots relating fosmid end sequences from individual NESAP and SI fosmid end libraries to large-insert DNA fragments were generated using the PROmer program implemented in MUMmer 3.23 (Kurtz et al., 2004) with the following parameters, as described in Hallam et al. (2006): breaklength = 60, minimum cluster length = 20, and match length = 10. The resulting delta files were converted into coordinate files using the show-coords program and visualized graphically as coverage plots using the mummerplot program. The coordinate files were also used to count the fosmid end sequences recruited to each large-insert DNA fragment at 60%-80% and at >80% nucleotide similarity. Ends recruiting to the 16S-23S rRNA region were subtracted, the remaining counts were normalized to the total number of ends per library, and the normalized proportion of sequences in each library recruited to each large-insert fragment was visualized using bubble.pl (available for download at http://hallam.microbiology.ubc.ca/downloads/index.html). The number of fosmid end sequences recruited to the psr operon on fosmids FPPP_13C3 and 122006-I05 was also calculated and visualized as described above.
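The counting and normalization steps can be sketched in Python. The sketch assumes the show-coords output has already been parsed into per-hit records (end identifier, fragment, percent identity, hit coordinates on the fragment), treats 80% as the upper edge of the 60%-80% bin, counts each end at most once per fragment and bin, and drops an end from a fragment if any of its hits there overlaps the 16S-23S rRNA region. These conventions, the Hit record and the function names are assumptions rather than details of the original analysis.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    end_id: str       # fosmid end-sequence identifier
    fragment: str     # large-insert fragment the end aligned to
    identity: float   # percent identity of the alignment
    ref_start: int    # alignment coordinates on the fragment
    ref_stop: int


def similarity_bin(identity: float) -> Optional[str]:
    """Assign a hit to the >80% or 60%-80% bin (80% itself falls in the lower bin)."""
    if identity > 80:
        return ">80%"
    if identity >= 60:
        return "60-80%"
    return None  # below 60%: not counted


def recruitment(hits: list[Hit], rrna_regions: dict[str, tuple[int, int]],
                total_ends: int) -> dict[tuple[str, str], float]:
    """Proportion of a library's ends recruited to each (fragment, similarity bin)."""
    # Ends with any hit overlapping a fragment's 16S-23S rRNA region are excluded there.
    rrna_ends = set()
    for h in hits:
        region = rrna_regions.get(h.fragment)
        lo, hi = sorted((h.ref_start, h.ref_stop))  # handle reverse-strand coordinates
        if region and lo <= region[1] and hi >= region[0]:
            rrna_ends.add((h.fragment, h.end_id))

    recruited: dict[tuple[str, str], set[str]] = defaultdict(set)
    for h in hits:
        label = similarity_bin(h.identity)
        if label is None or (h.fragment, h.end_id) in rrna_ends:
            continue
        recruited[(h.fragment, label)].add(h.end_id)  # each end counted once per bin

    # Normalize by the total number of end sequences in the library.
    return {key: len(ids) / total_ends for key, ids in recruited.items()}
```

The same function could be pointed at the psr operon instead of a whole fragment by restricting the hits to that operon's coordinates beforehand.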
References
Allers, E., Wright, J.J., Konwar, K.M., Howes, C.G., Beneze, E., Hallam, S.J., and Sullivan, M.B. (2012). Diversity and population structure of Marine Group A bacteria in the Northeast subarctic Pacific Ocean. ISME J 7, 256-268.
Chevreux, B., Pfisterer, T., Drescher, B., Driesel, A.J., Müller, W.E., Wetter, T., and Suhai, S. (2004). Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res 14, 1147-1159.
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., and Andersen, G.L. (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72, 5069-5072.
DeSantis, T.Z., Jr., Hugenholtz, P., Keller, K., Brodie, E.L., Larsen, N., Piceno, Y.M., Phan, R., and Andersen, G.L. (2006). NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res 34, W394-W399.
Guindon, S., Lethiec, F., Duroux, P., and Gascuel, O. (2005). PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res 33, W557-W559.
Hallam, S.J., Konstantinidis, K.T., Putnam, N., Schleper, C., Watanabe, Y., Sugahara, J., Preston, C., de la Torre, J., Richardson, P.M., and DeLong, E.F. (2006). Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum. Proc Natl Acad Sci U S A 103, 18296-18301.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S.L. (2004). Versatile and open software for comparing large genomes. Genome Biol 5, R12.
Ludwig, W., Strunk, O., Westram, R., Richter, L., Meier, H., Yadhukumar, Buchner, A., Lai, T., Steppi, S., et al. (2004). ARB: a software environment for sequence data. Nucleic Acids Res 32, 1363-1371.
Pruesse, E., Quast, C., Knittel, K., Fuchs, B.M., Ludwig, W.G., Peplies, J., and Glöckner, F.O. (2007). SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35, 7188-7196.
Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., and Robinson, C.J. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75, 7537-7541.
Taupp, M., Lee, S., Hawley, A., Yang, J., and Hallam, S.J. (2009). Large insert environmental genomic library production. J Vis Exp.
Zaikova, E., Walsh, D.A., Stilwell, C.P., Mohn, W.W., Tortell, P.D., and Hallam, S.J. (2010).
Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia.
Environ Microbiol 12, 172-191.