Supplemental Methods
Sampling
Surface water was collected from the Kranji Reservoir, Singapore, as grab samples off
the Public Utilities Board dock (1°26.23’N, 103°44.17’E). The Kranji Reservoir is a
shallow impoundment freshwater reservoir (mean depth 3.5 m) covering approximately
2.8 km² and surrounded by a catchment with diverse land uses including residential and
agricultural (Gin et al 2011). Samples were obtained from 10-20 cm beneath a visible
algal surface scum from January 14 to 15, 2010 at 5pm, 8pm, 6am, 8am, 1pm and 3pm.
For each sample, 205-350 ml was filtered onto a GP 0.22 μm Sterivex filter (Catalog #
SVGP01050) and immediately preserved in 2.5 ml of RNAlater, followed by immediate
refrigeration. Samples were shipped from Singapore to Cambridge, MA on dry ice where
they were stored at -80°C until RNA extraction was performed.
Metadata
The environmental parameters temperature, Total Kjeldahl Nitrogen, Total Phosphate,
Dissolved Organic Carbon, microcystin concentration and light intensity were measured
for each sample. Light intensity was measured with a hand-held unit (Ruby electronics
DT2232). Samples for nutrient concentration measurements were taken prior to filtration
for RNA preservation and were separately processed by Setsco Services Pte Ltd
(Singapore) using the American Public Health Association Standard Methods for the
Examination of Water and Wastewater (Eaton et al 2005). The microcystin toxin levels
in the surface samples (without lysing cyanobacterial cells) were measured with the
QuantiTube™ Kit for microcystin (Catalog # ET039), which uses a competitive
Enzyme-Linked ImmunoSorbent Assay (ELISA). The calibration range for the kit is 0.4 - 2.5 ppb
(parts per billion, ng/mL) and the Limit of Detection (LOD) is 0.18 ppb.
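As an illustration of how the kit limits quoted above partition a reading, the following minimal Python sketch labels a concentration relative to the LOD and the calibration range. The function name, the category labels and the example readings are hypothetical and are not part of the kit protocol.

```python
# Kit limits quoted in the text (ppb = ng/mL).
LOD_PPB = 0.18
CAL_LOW_PPB = 0.4
CAL_HIGH_PPB = 2.5


def classify_microcystin(ppb: float) -> str:
    """Label a reading relative to the LOD and the calibration range."""
    if ppb < LOD_PPB:
        return "below LOD"
    if ppb < CAL_LOW_PPB:
        return "detected, below calibration range"
    if ppb <= CAL_HIGH_PPB:
        return "within calibration range"
    return "above calibration range"


# Hypothetical readings, for illustration only.
for value in (0.10, 0.25, 1.30, 3.00):
    print(f"{value:.2f} ppb -> {classify_microcystin(value)}")
```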
RNA extraction and cDNA synthesis
For each sample, the tip of the sterivex filter cartridge was cut open and the RNAlater
was removed by pushing it through the filter with a syringe. The filter cartridge was then
broken in a sterile Whirl-Pak plastic bag using a hammer to access the filter inside. Next,
the filter was cut into pieces and put in 15 mg/ml lysozyme, vortexed and incubated at
37°C for 10 min. Sterile autoclaved beads were added to the tube, which was bead beaten
for 10 min. The tubes were submerged in ice at 2 min intervals to keep cool. Tubes were
spun down at 1,000 rpm for 1 min and the supernatant was collected into a new tube, which
was centrifuged at 14,000 rpm for 3 min. To each pellet, 1 ml TRIzol was added and the
tube was vortexed for 1 min until the pellet disappeared. Then, the tube was incubated at room
temperature for 5 min. Next, 200 μl chloroform was added to partition RNA into the aqueous
supernatant for separation, and the tube was vortexed and incubated for 8 min. After centrifuging
at 12,000 x g for 15 min, the aqueous phase was removed. Then 1 μl glycogen (15 g/l) was added
(to increase precipitation efficiency from dilute RNA solutions), followed by 0.5
ml isopropanol to precipitate total RNA. After incubating at -80°C overnight, the tubes
were centrifuged again at 14,000 rpm for 30 min. The pellets were washed with 70%
ethanol, centrifuged at 12,000 x g for 5 min and air-dried. Extracted total RNA pellets
were resuspended in 50 μl DEPC-treated RNase-free water and stored at -80°C. To
deplete eukaryotic mRNAs and to enrich for microbial mRNA, 100 g of total RNA was
used. PolyA subtraction of the total RNA was carried out using the Poly(A)Purist™
MAG Kit (Catalog # AM1922).
To enrich for bacterial mRNAs and deplete eukaryotic RNA (18S, 28S rRNAs and
polyadenylated mRNAs) from the polyA-removed total RNA, the MICROBEnrich™
Kit (Ambion Part # AM1901) was used. Bacterial mRNA enrichment and bacterial
rRNA (16S and 23S rRNAs) removal were carried out using the MICROBExpress Kit
(Ambion Part # AM1905). The removal of tRNA and 5S rRNA was done using the
MEGAclear kit (Ambion Part # AM1908). RNA remaining from the three-step
subtraction above was precipitated at -80°C overnight in 0.1 volume 3 M sodium acetate,
0.02 volume glycogen (15 g/l) and 3 volumes ice-cold 100% ethanol, with brief
vortexing to mix. RNA was recovered by centrifugation at 13,000 rpm for 30 min.
Supernatant was discarded and the pellet was washed twice with 70% ethanol (centrifuged
at 13,000 rpm for 5 min). After washing, the supernatant was discarded and RNA pellets
were air-dried for 5 min, suspended in 10 μl of DEPC-treated RNase-free water and
stored at -80°C. To further remove any remaining 16S and 23S rRNA, the mRNA-ONLY
Prokaryotic mRNA Isolation Kit (EPICENTRE Biotechnologies, Cat. No.
MOP51024) was used, further enriching for prokaryotic mRNA substantially free of
rRNA. The subtracted RNA remaining from the above four kits was
subjected to in vitro transcription (IVT)-mediated linear amplification using the
MessageAmpII-Bacteria Kit (Ambion Part # AM1790). Amplified RNA was ethanol
precipitated and re-suspended in DEPC-treated water. SuperScript Double-Stranded
cDNA Synthesis Kit (Invitrogen, Catalog # 11917-020) and hexadecamers (Integrated
DNA Technologies, Catalog # S1230S) were used to generate double stranded cDNA.
Illumina sequencing
cDNA (5 μg) was sheared using Adaptive Focused Acoustic technology on a
Bioruptor (Diagenode, Inc.) to generate fragments of 350 bp in length. End-repair was
done with Quick Blunting™ Kit (New England Biolabs, Catalog # E1201L) and the
blunting reaction was purified using MinElute Reaction Cleanup Kit (Qiagen, Catalog #
28204). The barcoded adaptors were ordered as synthesized ssDNA oligos (IDT) and
were ligated to each other prior to ligating to the sheared DNA fragments. The adaptor
duplex was then ligated to the end-repaired “A”-tailed overhang ds cDNAs. Six unique
adaptor mixes were prepared by mixing 50 μl
Adaptor PEx.1 (100 μM), 50 μl Adaptor PEx.2 (100 μM), 20 μl Oligo Hybridization 10X
Buffer [500 mM NaCl, 10 mM Tris-Cl pH 8.0, 1 mM EDTA pH 8.0] and 80 μl dH2O. As
each sample was uniquely barcoded with a distinct 6-nucleotide sequence, six pairs of
adaptors were added to each reaction individually. The mixture was then purified using
QIAquick PCR Purification Kit (Catalog # 28106) and eluted in 30 μl Elution Buffer.
The ligated products were purified on a gel such that unligated adaptors and
self-ligated adaptors were removed. A size range for paired-end sequencing was selected
by accounting for an 80 base pair (bp) adaptor and barcode (both ends
together) and 200-250 bp RNA sequence. Thus each sample was excised from a 2%
agarose gel at 350 bp and purified using NucleoSpin® Extract II Kit (Clontech, Catalog #
740609.50). The resulting DNA was PCR enriched using adaptor based primers to
selectively enrich cDNA fragments that have adapter molecules on both ends as only
these were able to attach to flow cells and generate clusters. Personnel from the MIT
BioMicro Center carried out Illumina sequencing of the six libraries using a single lane of
the GAII platform.
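The adaptor mix volumes given above imply final concentrations that can be checked with simple dilution arithmetic. The short Python sketch below works through that calculation; the helper name and dictionary keys are illustrative only.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution arithmetic: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes (in microlitres) of the adaptor mix components listed in the text.
volumes_ul = {
    "Adaptor PEx.1 (100 uM)": 50,
    "Adaptor PEx.2 (100 uM)": 50,
    "Oligo Hybridization 10X Buffer": 20,
    "dH2O": 80,
}
total_ul = sum(volumes_ul.values())

adaptor_um = final_concentration(100, 50, total_ul)  # each adaptor oligo, in uM
buffer_x = final_concentration(10, 20, total_ul)     # buffer strength, in X

print(f"total volume: {total_ul} ul")
print(f"each adaptor: {adaptor_um} uM, buffer: {buffer_x}X")
```

Under these volumes each adaptor oligo is diluted four-fold and the hybridization buffer is brought to working strength.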
Post-sequencing processing
Using the short read toolbox (brianknaus.com/software/srtoolbox), Illumina reads from a
single lane on a GAII machine were de-multiplexed based on the barcodes. The sorted
reads were trimmed by removing bases following any base with a phred quality score
below 33, then all duplicate reads and reads <20 bases were removed. Sequences were
then screened for possible rRNA contamination by a blastn search against the large
subunit and small subunit RNA databases from SILVA. Although adaptors were initially
removed from the reads by the sequencing software, it is possible that short reads might
still contain adaptor sequence at their ends. Thus unpaired short sequences with matches
to adaptor sequences at their ends were removed. However, reads that were part of a pair
were run through a Perl script, which uses assembly information and adaptor
sequence to aid in adaptor removal. Sequences containing long homopolymers were
removed. Although a transcript sequence may have initially contained two read pairs,
after screening some pairs became singletons. For BLAST searches against NCBI’s
non-redundant protein database, members of paired reads were treated as separate. For rRNA
analysis, read pairs (called read 1 and 2) were analyzed separately and, if one part of a pair
matched rRNA, the whole pair was considered rRNA.
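The trimming, length and duplicate filters, and the pair-level rRNA rule described above, can be sketched in Python as follows. This is a minimal illustration and not the short read toolbox or the Perl script used in the study. It assumes that the first low-quality base is removed along with everything after it, and that duplicates are exact sequence matches after trimming; all function names are hypothetical.

```python
MIN_QUALITY = 33  # phred threshold quoted in the text
MIN_LENGTH = 20   # reads shorter than this are discarded


def trim_read(seq, quals, min_quality=MIN_QUALITY):
    """Truncate a read at the first base whose quality is below min_quality.

    Assumption: the low-quality base itself is also removed.
    """
    for i, q in enumerate(quals):
        if q < min_quality:
            return seq[:i]
    return seq


def clean_reads(reads, min_length=MIN_LENGTH):
    """Trim each (sequence, qualities) pair, then drop short and duplicate reads."""
    seen = set()
    kept = []
    for seq, quals in reads:
        trimmed = trim_read(seq, quals)
        if len(trimmed) < min_length or trimmed in seen:
            continue
        seen.add(trimmed)
        kept.append(trimmed)
    return kept


def pair_is_rrna(read1_matches_rrna, read2_matches_rrna):
    """A pair is treated as rRNA if either member matched rRNA."""
    return read1_matches_rrna or read2_matches_rrna
```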
Bacterioplankton community taxonomy and functional assignments
After rRNA sequences were removed, each set of sequences was used in a blastx (blastall
2.2.20) against NR, which was downloaded on August 2, 2012. Each dataset was
composed of either the read 1 or the read 2 sequences. The command-line options for BLAST were: -m 8 -W 3 -e 20
-Q 11 -F m S. Results were then loaded into MEGAN version 4.7 (Huson et al 2007)
with the parameters for the Lowest Common Ancestor assignment set as: a min support
of 5, a min score of 40, top 10 percent, a win score of 0, using min complexity filter, a
min complexity of 0.44 and the paired reads were loaded together. MEGAN was used to
assign KEGG (Kanehisa and Goto 2000) and taxonomic classification to all sequences.
All reads classified into the top four phyla were saved in MEGAN with their original
BLAST results. KEGG classifications were compared among these selected reads for
five KEGG level-1 categories (“metabolism”, “genetic information processing”,
“environmental information processing”, “organismal systems” and “cellular processes”).
Subcategories for “metabolism” and “cellular processes” were
also compared. Under the KEGG level-1 category of “environmental
information processing”, specific types of ATP Binding Cassette (ABC) transporters,
two component systems and secretion systems were identified. For each specific type,
the totals were summed for each of the top four phyla, but for ABC transporters, only
operons with >20 reads from at least one phylum were examined.
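The Lowest Common Ancestor assignment used by MEGAN can be illustrated with a simplified Python sketch. It applies only the min score and top percent parameters quoted above to the hits of a single read; the min support step and the complexity filter are not implemented, and the toy lineages are hypothetical stand-ins for the NCBI taxonomy. It is not MEGAN's actual code.

```python
MIN_SCORE = 40.0    # "min score" from the text
TOP_PERCENT = 10.0  # "top 10 percent" from the text

# Hypothetical toy lineages, ordered from root to leaf.
LINEAGES = {
    "Microcystis aeruginosa": ["Bacteria", "Cyanobacteria", "Microcystis",
                               "Microcystis aeruginosa"],
    "Synechocystis sp.": ["Bacteria", "Cyanobacteria", "Synechocystis",
                          "Synechocystis sp."],
    "Escherichia coli": ["Bacteria", "Proteobacteria", "Escherichia",
                         "Escherichia coli"],
}


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    lca = "root"
    for ranks in zip(*(LINEAGES[t] for t in taxa)):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def assign_read(hits):
    """Assign one read from its (taxon, bit_score) hits, or None if unassigned."""
    passing = [(taxon, score) for taxon, score in hits if score >= MIN_SCORE]
    if not passing:
        return None
    best = max(score for _, score in passing)
    cutoff = best * (1.0 - TOP_PERCENT / 100.0)
    taxa = {taxon for taxon, score in passing if score >= cutoff}
    return lowest_common_ancestor(taxa)
```

In this sketch a read whose near-best hits span two cyanobacterial genera is placed at the phylum level rather than on either genus, which is the conservative behavior the LCA approach is designed to give.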
Ribosomal RNA analysis
Individual files with SSU matches were loaded into MG-RAST on February 1, 2013
(Meyer et al 2008). The results from “best-hit” matches against the SSU database were
used to provide a taxonomic composition of each sample. The parameters were set such
that threshold was an e-value cutoff of 1e-20, at >60% sequence identity and >50 bp
alignment.
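The three thresholds above act as a joint filter on each SSU match. A minimal Python sketch of that filter follows; the function and argument names are hypothetical and do not correspond to any MG-RAST interface.

```python
MAX_EVALUE = 1e-20     # e-value cutoff from the text
MIN_IDENTITY = 60.0    # percent identity must exceed this
MIN_ALIGN_LEN_BP = 50  # alignment length (bp) must exceed this


def passes_ssu_thresholds(evalue, percent_identity, align_len_bp):
    """Return True only if a match satisfies all three thresholds."""
    return (
        evalue <= MAX_EVALUE
        and percent_identity > MIN_IDENTITY
        and align_len_bp > MIN_ALIGN_LEN_BP
    )
```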
Calculation of RPKM values for a M. aeruginosa pan-genome
The abundance of M. aeruginosa transcripts in different Kranji Reservoir samples was
compared by constructing an RPKM (reads per kilobase of exon per million mapped
reads) matrix (Mortazavi et al 2008) in CLC Genomics Workbench (CLC, Denmark).
The publicly available M. aeruginosa genome sequences NIES-843 (accession
NC_010296) and PCC7806 (accession AM778843-AM778958) were used as reference
sequences to calculate RPKM values from ortholog and paralog groups (next section).
To optimize recruitment of reads to the reference genomes from M. aeruginosa, but not
from other Cyanobacteria, simulated target and non-target metatranscriptomes were
generated using MetaSim (Richter et al 2008) and mapped onto the M. aeruginosa
reference genomes using the RNA-seq tool in CLC Genomics Workbench. A percent
identity threshold that maximized recruitment of simulated transcripts from target
Microcystis genomes while minimizing recruitment from non-target Nostoc punctiforme
PCC73102 and Synechocystis PCC6803 genomes was selected for recruitment of reads
from Kranji Reservoir (i.e. >90% nucleotide identity over >90% of the read length).
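The threshold selection described above can be sketched as a simple scan over candidate identity cutoffs. The Python example below scores each candidate as the fraction of simulated target reads recruited minus the fraction of simulated non-target reads recruited; this scoring rule is an assumption made for illustration, since the text does not state the exact criterion, and the identity values are hypothetical.

```python
def recruited_fraction(identities, threshold):
    """Fraction of simulated reads whose mapping identity exceeds the threshold."""
    return sum(1 for value in identities if value > threshold) / len(identities)


def pick_threshold(target, non_target, candidates):
    """Return the candidate maximizing target minus non-target recruitment.

    Ties are resolved in favor of the earliest candidate in the list.
    """
    best_threshold, best_score = None, None
    for threshold in candidates:
        score = (recruited_fraction(target, threshold)
                 - recruited_fraction(non_target, threshold))
        if best_score is None or score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold


# Hypothetical percent identities of simulated reads mapped to the references.
target_identities = [99.0, 98.5, 97.0, 95.0, 93.0, 91.5]
non_target_identities = [88.0, 85.0, 82.0, 79.0, 75.0, 70.0]

print(pick_threshold(target_identities, non_target_identities, [70, 80, 90, 95]))
```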
A Microcystis pan-genome consisting of core and strain-specific (i.e. flexible) genes was
created for construction of the RPKM matrix for normalized transcript abundance. M.
aeruginosa genome sequences from strains NIES-843 (Kaneko et al 2007) and PCC7806
(Frangeul et al 2008) were compared to identify a set of core genes found in both
genomes and a set of flexible genes that are unique to each genome. The orthologs from
the two genomes were determined via the Reciprocal Smallest Distance (RSD) algorithm
using an e-value cutoff at <1e-5 and a minimum alignment of 80% of total length (Wall
et al 2003). For genes with multiple RSD matches (i.e. genes with identical smallest
distances), groups were formed; these groups are used in the calculation of RPKM values
described below. All orthologs were considered core genes of M. aeruginosa,
while genes without orthologs were considered part of the flexible genome. For
orthologs, the RPKM values for the two genes were summed; for ortholog pairs with
identical paralogs (i.e. genes with identical smallest distances) within a genome, the sum
of the RPKM values for all homologs was determined. For the final RPKM matrix, the
RPKM values for reads 1 and 2 were considered technical replicates and averaged for
each time point.
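The RPKM calculation and the grouping rules above can be illustrated with a short Python sketch. RPKM is computed in the standard way (Mortazavi et al 2008): read count multiplied by 10^9 and divided by the product of gene length in bp and total mapped reads. The gene identifiers, counts and lengths are hypothetical, and the sketch is not the CLC Genomics Workbench implementation.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def group_rpkm(genes, counts, lengths_bp, total_mapped_reads):
    """Sum RPKM over all genes in one ortholog or paralog group."""
    return sum(
        rpkm(counts.get(gene, 0), lengths_bp[gene], total_mapped_reads)
        for gene in genes
    )


def average_replicates(value_read1, value_read2):
    """Average the read 1 and read 2 values, treated as technical replicates."""
    return (value_read1 + value_read2) / 2.0


# Hypothetical ortholog pair, one gene from each reference genome.
group = ["nies843_geneA", "pcc7806_geneA"]
lengths_bp = {"nies843_geneA": 1000, "pcc7806_geneA": 1000}
read1_counts = {"nies843_geneA": 50, "pcc7806_geneA": 30}
read2_counts = {"nies843_geneA": 60, "pcc7806_geneA": 40}
total_mapped = 1_000_000

value_read1 = group_rpkm(group, read1_counts, lengths_bp, total_mapped)
value_read2 = group_rpkm(group, read2_counts, lengths_bp, total_mapped)
print(average_replicates(value_read1, value_read2))
```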
Statistical analysis for differential transcript and gene set abundance
The expression matrix of the Microcystis pan-genome was analyzed within the
MultiExperiment Viewer (MEV) (Saeed et al 2006). Expression values were log2
transformed before normalization. Samples were then clustered by Principal Component
Analysis (PCA) and by Hierarchical Clustering (HCL) of Pearson correlations. Both PCA
and HCL grouped the samples according to whether they were collected during the
daytime or during the evening/night; thus a t-test and Significance Analysis for
Microarrays (SAM) were conducted to test the null hypothesis of no difference in gene
expression with stage of the diel cycle. Genes corresponding to nitrogen or phosphorus
metabolism in the M. aeruginosa genome were selected as defined in Harke et al (2013)
and RPKM values for different samples were clustered based on Spearman Rank
correlation.
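The log2 transformation and the Pearson correlation underlying the sample clustering can be sketched in plain Python as below. The pseudocount of 1 added before taking the logarithm is an assumption made here so that zero RPKM values are defined; the text does not state how zeros were handled in MEV.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """Log2-transform expression values (the pseudocount is an assumption)."""
    return [math.log2(value + pseudocount) for value in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)
```

A distance such as 1 minus the Pearson correlation between two log-transformed sample profiles could then be passed to a hierarchical clustering routine.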
Gene Set Enrichment Analysis (GSEA)
Enrichment of transcripts in level 3 KEGG pathways under groupings emergent from
PCA and HCL (i.e. “day” or “night” collection times) was determined by Gene Set
Enrichment Analysis (GSEA) implemented through the online Broad Server
(Subramanian et al 2005). Input files were the M. aeruginosa NIES-843 and PCC7806
pan-genome RPKM expression dataset; the gene set database, in which gene sets were
defined as sets of KEGG Ortholog ids (KO ids) falling within the same KEGG pathways
(KEGG release 65.0), or as sets of secondary metabolite gene clusters identified previously
(Frangeul et al 2008) or, if not previously identified, by the presence of polyketide
synthase (PKS) or non-ribosomal peptide synthetase (NRPS) genes; and a sample ID file
defining the day and night partition of the six samples. The initial gene set contained 347
level 3 KEGG pathways and five secondary metabolite gene clusters, in which individual
genes may be present in more than one pathway. Genes within the RPKM matrix were
assigned KO ids by using the KEGG online tool KEGG Automatic Annotation Server
and by implementing single direction best blastx hit to the default gene-set
representatives from KEGG and the M. aeruginosa NIES-843 genome. The bit-score for
a hit was set at 60, which is the default for KEGG annotation. For genes with the same
KO id the RPKM values were summed. Expression values were normalized in GSEA.
The parameters in GSEA were set such that the minimum gene set that was included in
the analysis was five genes and the gene sets were permuted. Enrichment of transcripts
in each of the top four phyla relative to the other three top phyla was examined in the same
way as for M. aeruginosa, except that KO quantities were based on absolute read counts.
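The core statistic in GSEA is a running-sum enrichment score computed over a ranked gene list (Subramanian et al 2005). The Python sketch below shows a simplified version of that statistic for a single gene set; it omits the permutation step used to assess significance and the normalization across gene sets, and it is not the Broad implementation. The function name and input layout are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: gene ids sorted by decreasing ranking metric.
    scores: dict mapping gene id to its ranking metric.
    gene_set: set of gene ids (e.g. KO ids in one pathway).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(
        abs(scores[gene]) ** p for gene, hit in zip(ranked_genes, hits) if hit
    )
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for gene, hit in zip(ranked_genes, hits):
        if hit:
            running += abs(scores[gene]) ** p / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss                       # step down on a miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best
```

A positive score indicates that members of the set concentrate at the top of the ranking (for example, higher in "day" samples), and a negative score indicates concentration at the bottom.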
Secondary metabolite genes
Evidence for PKS and NRPS expression was first determined by looking for reads with
BLAST matches to KS (KetoSynthase) and C (Condensation) domains. Each set of
sequences, with reads 1 and 2 kept separate, was searched by blastx (bit-score cutoff
>40) against four databases: NR-KS, NR-C, Reference-KS and Reference-C.
NR-KS is composed of all KS domains identified via
an HMM search of the NR protein database, and NR-C is the equivalent set
of condensation domains (Ziemert et al 2012); Reference-KS and Reference-C are
composed of manually curated sets of the respective domains (Ziemert et al 2012). The
number of reads (transcripts) that matched to each unique NR-KS or NR-C domain was
calculated. All top BLAST matches from the NR-KS or NR-C
database were submitted to the online tool NaPDoS for classification using default
settings (Ziemert et al 2012). Based on the classification of the top BLAST match in
NaPDoS, all NR-KS sequences were classified as either FAS or not FAS. The NR-KS
sequences that were not considered FAS based on the top BLAST hit were used to
construct a phylogeny along with the NaPDoS reference sequences. The nearest NaPDoS
reference sequences in the phylogeny of NR-C domains were used to classify sequences
into C domain subgroup clades as defined in NaPDoS. Each phylogeny was constructed
in NaPDoS, which aligns all query sequences to a manually curated alignment and
produces a phylogeny using FastTree. For each set of NR-KS or NR-C domains, the NCBI
taxonomy for each sequence from which the NR-KS or NR-C domains were derived was
retrieved using the GI number.
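The counting of reads per NR-KS or NR-C domain described above can be sketched in Python as follows. Each read is assigned to its best-scoring domain among hits above the bit-score cutoff; taking the single best hit per read is an assumption of this sketch, and the tuple layout and identifiers are hypothetical.

```python
from collections import Counter

MIN_BITSCORE = 40.0  # hits must score above this cutoff, as in the text


def count_reads_per_domain(hits):
    """Count reads per domain from (read_id, domain_id, bit_score) tuples."""
    best = {}
    for read_id, domain_id, score in hits:
        if score <= MIN_BITSCORE:
            continue
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (domain_id, score)
    return Counter(domain_id for domain_id, _ in best.values())
```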
There is a possibility that the NR-KS or C sequences were derived from either
environmental or PCR sequence data thus little or no information regarding the
biosynthesis beyond the KS or C domain sequence can be obtained. However, a domain
derived from a larger sequence available in NCBI provides further insight into the
possible products produced. Therefore all NR-KS and NR-C sequences were examined
to determine if they are part of a larger gene cluster deposited in GenBank. This helped
to infer whether KS and C domains were expressed from an entire PKS operon and whether
a more accurate prediction about the type of biosynthetic product could be made.
RT-qPCR for transcript abundance
Microcystis transcripts for recA, grpE, psbA, mcyA, mcyB and prx were quantified by
RT-qPCR using primers and amplification conditions as described by Shao et al (2009).
RT-qPCR was performed using the LightCycler® 480 Real-Time PCR system and
software v. 1.5.0 (Roche Applied Sciences, Indianapolis, IN) for calculation of crossing
point (Cp) values and melting temperature (Tm) analysis. RT-qPCR reaction mixtures
consisted of 7.5 µl of Roche 2x Faststart universal SYBR green mix (Catalog
# 04913914001), 25 µM of each primer, and 100 ng of amplified RNA template in 15 µl.
Standard curves were constructed with 10-fold serial dilutions of PCR amplicons from
toxigenic Microcystis type strain LB2385 obtained from The Culture Collection of Algae
at the University of Texas at Austin. The amplification efficiency (E) for each RT-qPCR
run was calculated from the slope of the standard curve and was consistently in the range
of 99-100%. RT-qPCR values before and after normalization to housekeeping gene grpE
were compared to RPKM values by Pearson correlation in JMP pro v.10.
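The amplification efficiency is conventionally derived from the standard-curve slope as E = 10^(-1/slope) - 1, where the slope comes from a linear regression of Cp on log10 template quantity. The Python sketch below illustrates that calculation with a plain least-squares fit; it is not the LightCycler software, and the function names are hypothetical.

```python
import math


def linear_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = sum((a - mean_x) ** 2 for a in x)
    return numerator / denominator


def amplification_efficiency(quantities, cp_values):
    """Efficiency from a dilution series; 1.0 corresponds to 100%."""
    log_quantities = [math.log10(q) for q in quantities]
    slope = linear_slope(log_quantities, cp_values)
    return 10 ** (-1.0 / slope) - 1.0
```

For a 10-fold dilution series with perfect doubling per cycle, the slope is about -3.32 and the function returns an efficiency of 1.0 (100%).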
References
Eaton AD, Franson MAH, American Public Health Association, American Water Works
Association, Water Environment Federation (2005).
Standard Methods for the Examination of Water and Wastewater. American Public
Health Association.
Frangeul L, Quillardet P, Castets A-M, Humbert J-F, Matthijs H, Cortez D et al (2008).
Highly plastic genome of Microcystis aeruginosa PCC 7806, a ubiquitous toxic
freshwater cyanobacterium. BMC Genomics 9: 274.
Huson DH, Auch AF, Qi J, Schuster SC (2007). MEGAN analysis of metagenomic data.
Genome Research 17: 377-386.
Kanehisa M, Goto S (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes.
Nucleic Acids Research 28: 27-30.
Kaneko T, Nakajima N, Okamoto S, Suzuki I, Tanabe Y, Tamaoki M et al (2007).
Complete Genomic Structure of the Bloom-forming Toxic Cyanobacterium Microcystis
aeruginosa NIES-843. DNA Research 14: 247-256.
Meyer F, Paarmann D, D'Souza M, Olson R, Glass E, Kubal M et al (2008). The
metagenomics RAST server - a public resource for the automatic phylogenetic and
functional analysis of metagenomes. BMC Bioinformatics 9: 386.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008). Mapping and
quantifying mammalian transcriptomes by RNA-Seq. Nat Meth 5: 621-628.
Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008). MetaSim—A Sequencing
Simulator for Genomics and Metagenomics. PLoS ONE 3: e3373.
Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, Howe EA et al (2006). [9]
TM4 Microarray Software Suite. In: Alan K, Brian O (eds). Methods in Enzymology.
Academic Press. pp 134-193.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA et al
(2005). Gene set enrichment analysis: A knowledge-based approach for interpreting
genome-wide expression profiles. Proceedings of the National Academy of Sciences of
the United States of America 102: 15545-15550.
Wall DP, Fraser HB, Hirsh AE (2003). Detecting putative orthologs. Bioinformatics 19:
1710-1711.
Ziemert N, Podell S, Penn K, Badger JH, Allen E, Jensen PR (2012). The Natural
Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify
Secondary Metabolite Gene Diversity. PLoS ONE 7: e34064.