Supplementary Methods (doc 88K)

Supplemental Methods Sampling Surface water was collected from the Kranji Reservoir, Singapore, as grab samples off the Public Utilities Board dock (1°26.23’N, 103°44.17’E). The Kranji Reservoir is a shallow impoundment freshwater reservoir (mean depth 3.5 m) covering approximately 2.8 km2 and surrounded by a catchment with diverse land uses including residential and agricultural (Gin et al 2011). Samples were obtained from 10-20cm beneath a visible algal surface scum from January 14 to 15, 2010 at 5pm, 8pm, 6am, 8am, 1pm and 3pm. 205-350 ml for each sample was filtered onto a GP 0.22μm sterivex filter (Catalog # SVGP01050) and immediately preserved in 2.5ml of RNAlater, followed by immediate refrigeration. Samples were shipped from Singapore to Cambridge, MA on dry ice where they were stored at -80°C until RNA extraction was performed. Metadata The environmental parameters temperature, Total Kjehldahl Nitrogen, Total Phosphate, Dissolved Organic Carbon, microcystin concentration and light intensity were measured for each sample. Light intensity was measured with a hand-held unit (Ruby electronics DT2232). Samples for nutrient concentration measurements were taken prior to filtration for RNA preservation and were separately processed by Setsco Services Pte Ltd (Singapore) using the American Public Health Association Standard Method for the determination of water and wastewater (Eaton et al 2005). The microcystin toxin levels in the surface samples (without lysing cyanobacterial cells) were measured with QuantiTube™ Kit for microcystin (Catalog # ET039) which uses a competitive EnzymeLinked ImmunoSorbent Assay (ELISA). The calibration range for the kit is 0.4 - 2.5 ppb (parts per billion, ng/mL) and the Limit of Detection (LOD) is 0.18 ppb. RNA extraction and cDNA synthesis For each sample, the tip of the sterivex filter cartridge was cut open and the RNAlater was removed by pushing it through the filter with a syringe. The filter cartridge was then broken in a sterile Whirl-Pak plastic bag using a hammer to access the filter inside. Next, the filter was cut into pieces and put in 15 mg/ml lysozyme, vortexed and incubated at 37oC for 10 min. Sterile autoclaved beads were added to the tube which was bead beaten for 10min. The tubes were submerged in ice at 2 min intervals to keep cool. Tubes were spun down at 1,000 rpm for 1 min and the supernatant was collected to a new tube, which was centrifuged at 14,000 rpm for 3 min. To each pellet, 1 ml TRIzol was added, vortexed for 1 min until the pellet disappeared. Then, the tube was incubated at room temperature for 5 min. Next, 200μl chloroform was added to partition RNA into aqueous supernatant for separation, and then vortexed and incubated for 8 min. After centrifuging at 12,000x g for 15 min, aqueous phase was removed. 1μl glycogen (15g/l) was added (to increase precipitation efficiency from dilute RNA solutions) and then followed by 0.5 ml isopropanol to precipitate total RNA. After incubating at -80oC overnight, the tubes were centrifuged again at 14,000 rpm for 30 min. The pellets were washed with 70% ethanol, centrifuged at 12,000x g for 5 min and air-dried. Extracted total RNA pellets were resuspended in 50μl DEPC treated RNase-free water and stored at -80oC. To deplete eukaryotic mRNAs and to enrich for microbial mRNA, 100 g of total RNA was used. PolyA subtraction of the total RNAwas carried out using the Poly(A)PuristTM MAG Kit (Catalog # AM1922). To enrich for bacterial mRNAs and deplete eukaryotic RNA (18S, 28S rRNAs and polyadenylated mRNAs) from the polyA removed total RNA, the MICROBEnrichTM Kit (Ambion Part # AM1901) was used. Bacterial mRNA enrichment and bacterial rRNA (16S and 23S rRNAs) removal was carried out using the MICROBExpress Kit (Ambion Part # AM1905). The removal of tRNA and 5S rRNA was done using the MEGAclear kit (Ambion Part # AM1908). RNA remaining from the three-step subtraction above was precipitated at -80oC overnight in 0.1 volume 3 M sodium acetate, 0.02 volume glycogen (15g/l) and 3 volume ice cold 100% ethanol, with brief vortexing to mix. RNA was recovered by centrifugation at 13,000 rpm for 30 min. Supernatant was discarded and the pellet was twice washed with 70% ethanol (centrifuge at 13,000 rpm for 5 min). After washing, the supernatant was discarded and RNA pellets were air-dried for 5 min and suspended in 10μl of DEPC treated RNase-free water and stored at -80oC. To further remove any remaining 16S and 23S rRNA, the mRNAONLY Prokaryotic mRNA Isolation Kit (EPICENTRE Biotechnologies, Cat. Nos. MOP51024) was used to further enrich for prokaryotic mRNA that would become substantially free of rRNA. The subtracted RNA remaining from the above four kits was subjected to in vitro transcription (IVT)-mediated linear amplification using the MessageAmpII-Bacteria Kit (Ambion Part # AM1790). Amplified RNA was ethanol precipitated and re-suspended in DEPC-treated water. SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen, Catalog # 11917-020) and hexadecamers (Integrated DNA Technologies, Catalog # S1230S) were used to generate double stranded cDNA. Illumina sequencing cDNA (5 μg) was sheared using Adaptive Focused Acoustic technology on a Bioruptor (Diagenode, Inc.) to generate fragments of 350 bp in length. End-repair was done with Quick BluntingTM Kit (New England Biolabs, Catalog # E1201L) and the blunting reaction was purified using MinElute Reaction Cleanup Kit (Qiagen, Catalog # 28204). The Barcoded adaptors were ordered synthesized as ss DNA oligos (IDT) and were ligated to each other, prior to ligating to the sheared DNA fragment. The adaptor duplex was then ligated to the end-repaired “A”-tailed overhang ds cDNAs. Each barcoded adaptor duplex was Six unique Adaptor mixes were prepared by mixing 50 μl Adaptor PEx.1 (100 μM), 50 μl Adaptor PEx.2 (100 μM), 20 μl Oligo Hybridization 10X Buffer [500 mM NaCl, 10 mM Tris-Cl pH 8.0, 1 mM EDTA pH 8.0] and 80 μl dH2O. As each sample was uniquely barcoded with a distinct 6-nucleotide sequence, six pairs of adaptors were added to each reaction individually. The mixture was then purified using QIAquick PCR Purification Kit (Catalog # 28106) and eluted in 30 μl Elution Buffer. The ligated products were purified on a gel to remove such that unligated adaptors and self-ligated adaptors were removed. A size-range for paired end sequencing was selected by accounting consideringfor an 80 base pair (bp) adaptor and barcode (both ends together) and 200-250 bp RNA sequence. Thus each sample was excised from a 2% agarose gel at 350 bp and purified using NucleoSpin® Extract II Kit (Clontech, Catalog # 740609.50). The resulting DNA was PCR enriched using adaptor based primers to selectively enrich cDNA fragments that have adapter molecules on both ends as only these were able to attach to flow cells and generate clusters. Personnel from the MIT BioMicro Center carried out Illumina sequencing of the six libraries using a single lane of the GAII platform. Post-sequencing processing Using the short read toolbox (brianknaus.com/software/srtoolbox) Illumina reads from a single lane on a GAII machine were de-multiplexed based on the barcodes. The sorted reads were trimmed by removing bases following any base with a phred quality score below 33, then all duplicate reads and reads <20 bases were removed. Sequences were then screened for possible rRNA contamination by a blastn search against the large subunit and small subunit RNA databases from SILVA. Although adaptors were initially removed from the reads by the sequencing software, it is possible that short reads might still contain adaptor sequence at their ends. Thus unpaired short sequences with matches to adaptor sequences at their ends were removed. However, if a read was part of a pair they were run through a Perl script, which uses assembly information and adaptor sequence to aid in adaptor removal. Sequences containing long homopolymers were removed. Although a transcript sequence may have initially contained two read pairs after screening some pairs became singletons. For BLAST searches against NCBI’s nonredundant protein database members of paired reads were treated as separate. For rRNA analysis read pairs (called read 1 and 2) were analyzed separately and if one part of a pair matched rRNA the whole pair was considered rRNA. Bacterioplankton community taxonomy and functional assignments After rRNA sequences were removed, each set of sequences was used in a blastx (blastall 2.2.20) against NR, which was downloaded on August 2, 2012. Each dataset was composed of either the 1 or 2 reads. The command line for BLAST was: -m 8 -W 3 -e 20 -Q 11 -F m S. Results were then loaded into MEGAN version 4.7 (Huson et al 2007) with the parameters for the Lowest Common Ancestor assignment set as: a min support of 5, a min score of 40, top 10 percent, a win score of 0, using min complexity filter, a min complexity of 0.44 and the paired reads were loaded together. MEGAN was used to assign KEGG (Kanehisa and Goto 2000) and taxonomic classification to all sequences. All reads classified into the top four phyla were saved in MEGAN with their original BLAST results. Comparisons of KEGG classification among these selected reads were made as five KEGG level-1 categories (“metabolism”, “genetic information processing”, “environmental information processing”, “organismal systems” and “cellular processes”) were further examined. Subcategories for “metabolism” and “cellular processes” were also further compared. Under the KEGG level-one category of “environmental information and processing”, specific types of ATP Binding Cassette (ABC) transporters, two component systems and secretion systems were identified. For each specific type, the totals were summed for each of the top four phyla, but for ABC transporters, only operons with >20 reads from at least one phylum were examined. Ribosomal RNA analysis Individual files with SSU matches were loaded into MG-RAST on February 1, 2013 (Meyer et al 2008). The results from “best-hit” matches against the SSU database were used to provide a taxonomic composition of each sample. The parameters were set such that threshold was an e-value cutoff of 1e-20, at >60% sequence identity and >50 bp alignment. Calculation of RPKM values for a M. aeruginosa pan-genome The abundance of M. aeruginosa transcripts in different Kranji Reservoir samples were compared by constructing an RPKM (reads per kilobase of exon per million mapped reads) matrix (Mortazavi et al 2008) in CLC Genomics Workbench (CLC, Denmark). The publicly available M. aeruginosa genome sequences NIES-843 (accession NC_010296) and PCC7806 (accession AM 778843- 778958) were used as reference sequences to calculate RPKM values from ortholog and paralog groups (next section). To optimize recruitment of reads to the reference genomes from M. aeruginosa, but not from other Cyanobacteria, simulated target and non-target metatranscriptomes were generated using MetaSim (Richter et al 2008) and mapped onto the M. aeruginosa reference genomes using the RNA-seq tool in CLC Genomics Workbench. A percent identity threshold that maximized recruitment of simulated transcripts from target Microcystis genomes while minimizing recruitment from non-target Nostoc punctiformes PCC73102 and Synechocystis PCC6803 genomes was selected for recruitment of reads from Kranji Reservoir (i.e. >90% nucleotide identity over >90% of the read length). A Microcystis pan-genome consisting of core and strain-specific (i.e. flexible) genes was created for construction of the RPKM matrix for normalized transcript abundance. M. aeruginosa genome sequences from strains NIES-843 (Kaneko et al 2007) and PCC7806 (Frangeul et al 2008) were compared to identify a set of core genes found in both genomes and a set of flexible genes that are unique to each genome. The orthologs from the two genomes were determined via the Reciprocal Smallest Distance (RSD) algorithm using an e-value cutoff at <1e-5 and a minimum alignment of 80% of total length (Wall et al 2003). For genes with multiple RSD matches (i.e. genes with identical smallest distances), groups were formed, which are important for calculations of RPKM values to be explained later. All orthologs were considered as the core genes of M. aeruginosa while genes without orthologs were considered part of the flexible genome. For orthologs, the RPKM values for the two genes were summed, for ortholog pairs with identical paralogs (i.e. genes with identical smallest distances) within a genome, the sum of the RPKM values for all homologs were determined. For the final RPKM matrix, the RPKM values for the 1 and 2 reads were considered technical replicates and averaged for each time point. Statistical analysis for differential transcript and gene set abundance The expression matrix of the Microcystis pan-genome was analyzed within the MultiExperiment Viewer (MEV) (Saeed et al 2006). Expression values were log2 transformed before normalization. Principal Component Analysis (PCA) and Hierarchical Clustering (HCl) of Pearson correlations then clustered samples. Both PCA and HCl grouped the samples according to whether they were collected during the daytime or during the evening/night, thus a T-test and Significance Analysis for Microarrays (SAM) were conducted to test the null hypothesis of no difference in gene expression with stage of the diel cycle. Genes corresponding to nitrogen or phosphorous metabolism in the M. aeruginosa genome were selected as defined in (Harke, et al 2013) and RPKM values for different samples were clustered based on Spearman Rank correlation. Gene Set Enrichment Analysis (GSEA) Enrichment of transcripts in level 3 KEGG pathways under groupings emergent from PCA and HCL (i.e. “day” or “night” collection times) were determined by Gene Set Enrichment Analysis (GSEA) implemented through the online Broad Server (Subramanian et al 2005). Input files were the M. aeruginosa NIES-843 and PCC7806 pan-genome RPKM expression dataset; the gene set database where gene sets were defined as sets of KEGG Ortholog ids (KO ids) falling within the same KEGG pathways (KEGG release 65.0), or sets of secondary metabolite gene clusters identified previously (Frangeul et al 2008) or if not previously identified then by the presence of polyketide synthase (PKS) or non-ribosomal peptide synthase (NRPS) genes; and a sample ID file defining the day and night partition of the six samples. The initial gene set contained 347 level 3 KEGG pathways and five secondary metabolite gene clusters, in which individual genes may be present in more than one pathway. Genes within the RPKM matrix were assigned KO ids by using the KEGG online tool KEGG Automatic Annotation Server and by implementing single direction best blastx hit to the default gene-set representatives from KEGG and the M. aeruginosa NIES-843 genome. The bit-score for a hit was set at 60, which is the default for KEGG annotation. For genes with the same KO id the RPKM values were summed. Expression values were normalized in GSEA. The parameters in GSEA were set such that the minimum gene set that was included in the analysis was five genes and the gene sets were permuted. Enrichment of transcripts in each of the top four phyla relative to the other three top phyla was examined the same way as M. aeruginosa except KO quantities were based on absolute read counts. Secondary metabolite genes Evidence for PKS and NRPS expression was first determined by looking for reads with BLAST matches to KS (KetoSynthase) and C (Condensation) domains. A blastx, with a bit-score cutoff set at >40, of each set of sequences with the 1 and 2 reads separated was used against four databases either NR-KS, NR-Condensation, Reference-KS or Reference-Condensation domains. NR-KS is composed of all KS domains identified via an hmm search of the NR protein database, NR-C is the same thing except it is composed of condensation domains (Ziemert et al 2012), Reference-KS and Reference-C are composed of a manually curated set of the respective domains (Ziemert et al 2012). The number of reads (transcripts) that matched to each unique NR-KS or NR-C domain was calculated. All top NR-KS and NR-C BLAST matches from the NR-KS or NR-C database were submitted to the online tool NaPDoS for classification using default settings (Ziemert et al 2012). Based on the classification of the top BLAST match in NaPDoS all NR-KS sequences were classified as either FAS or not FAS. The NR-KS sequences that were not considered as FAS based on top BLAST hit were used to construct a phylogeny along with the NaPDoS reference sequences. The nearest NaPDoS reference sequences in the phylogeny of NR-C domain was used to classify sequences into C domain subgroup clades as defined in NaPDoS. Each phylogeny was constructed in NaPDoS, which aligns all query sequences to a manually curated alignment and produces a phylogeny using FastTree. For each set of NR- KS or C domains the NCBI taxonomy for each sequence from which the NR-KS or C domains were derived was retrieved using the GI number. There is a possibility that the NR-KS or C sequences were derived from either environmental or PCR sequence data thus little or no information regarding the biosynthesis beyond the KS or C domain sequence can be obtained. However, a domain derived from a larger sequence available in NCBI provides further insight into the possible products produced. Therefore all NR-KS and NR-C sequences were examined to determine if they are part of a larger gene cluster deposited in genbank, this helped infer if KS and C domains were expressed from an entire PKS operon and if there is a possibility to make a more accurate prediction about the type of biosynthetic product. RT-qPCR for transcript abundance Microcystis transcripts for recA, grpE, psbA, mcyA, mcyB and prx were quantified by RTqPCR using primers and amplification conditions as described by (Shao, et al. 2009). RT-qPCR was performed using the LightCycler® 480 Real-Time PCR system and software v. 1.5.0 (Roche Applied Sciences, Indianapolis, IN) for calculation of crossing point (Cp) values and melting temperature (Tm) analysis. RT-qPCR reaction mixtures consisted of 7.5µl of Roche 2x Faststart universal SYBR green mix (Catalog #04913914001),25 µM of each primer, and 100 ng of amplified RNA template in 15ul. Standard curves were constructed with 10-fold serial dilutions of PCR amplicons from toxigenic Microcystis type strain LB2385 obtained from The Culture Collection of Algae at the University of Texas at Austin. The amplification efficiency (E) for each RT-qPCR run was calculated from the slope of the standard curve and was consistently in the range of 99-100%. RT-qPCR values before and after normalization to housekeeping gene grpE were compared to RPKM values by Pearson correlation in JMP pro v.10. Eaton AD, Franson MAH, Association APH, Association AWW, Federation WE (2005). Standard Methods for the Examination of Water and Wastewater. American Public Health Association. Frangeul L, Quillardet P, Castets A-M, Humbert J-F, Matthijs H, Cortez D et al (2008). Highly plastic genome of Microcystis aeruginosa PCC 7806, a ubiquitous toxic freshwater cyanobacterium. BMC Genomics 9: 274. Huson DH, Auch AF, Qi J, Schuster SC (2007). MEGAN analysis of metagenomic data. Genome Research 17: 377-386. Kanehisa M, Goto S (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28: 27-30. Kaneko T, Nakajima N, Okamoto S, Suzuki I, Tanabe Y, Tamaoki M et al (2007). Complete Genomic Structure of the Bloom-forming Toxic Cyanobacterium Microcystis aeruginosa NIES-843. DNA Research 14: 247-256. Meyer F, Paarmann D, D'Souza M, Olson R, Glass E, Kubal M et al (2008). The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9: 386. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Meth 5: 621-628. Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008). MetaSim—A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE 3: e3373. Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, Howe EA et al (2006). [9] TM4 Microarray Software Suite. In: Alan K, Brian O (eds). Methods in Enzymology. Academic Press. pp 134-193. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA et al (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102: 15545-15550. Wall DP, Fraser HB, Hirsh AE (2003). Detecting putative orthologs. Bioinformatics 19: 1710-1711. Ziemert N, Podell S, Penn K, Badger JH, Allen E, Jensen PR (2012). The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity. PLoS ONE 7: e34064.

Supplementary Methods (doc 88K)

Related documents

Products

Support

Supplementary Methods (doc 88K)

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib