Advancing Science with DNA Sequence

Metatranscriptomics: Challenges and Progress
Shaomei He
DOE Joint Genome Institute

Metatranscriptomics

Metatranscriptome: the complete collection of transcribed sequences in a microbial community:
• Protein-coding RNA (mRNA)
• Non-coding RNA (rRNA, tRNA, regulatory RNA, etc.)

Metatranscriptomics studies:
• Community functions
• Response to different environments
• Regulation of gene expression

Evolution of Metatranscriptomics

• cDNA clone libraries + Sanger sequencing
• Microarrays
• RNA-seq, enabled by next-generation sequencing technologies
Sorek & Cossart, NRG (2010) 11, 9-16
RNA-seq is superior to microarrays in many ways for microbial community gene expression analysis.

Challenges in Metatranscriptomics

Wet lab
• Low RNA yield from environmental samples
• Instability of RNA (half-lives on the order of minutes)
• High rRNA content in total RNA (mRNA accounts for 1-5% of total RNA)
http://www.nwfsc.noaa.gov/index.cfm

Bioinformatics
• General challenges with short reads and large data size
• Small overlap between metagenome and metatranscriptome, or complete lack of a metagenome reference
http://cybernetnews.com/vista-recovery-disc/

rRNA Removal Methods

Stage                   Method                         rRNA feature used     Input RNA   Manipulate raw RNA
Before cDNA synthesis   Subtractive hybridization      Conserved sequence    High        Yes
                        RNase H digestion              Conserved sequence    High        Yes
                        Exonuclease digestion          5’ monophosphate      High        Yes
                        Gel extraction                 Size                  High        Yes
                        Biased poly(A) tailing         Secondary structure   Low         Yes
During cDNA synthesis   Not-so-random primers          Sequence feature      Low         No
After cDNA synthesis    Library normalization w/ DSN   High abundance        Low         No

Validation of Two Ribosomal RNA Removal Methods for Microbial Metatranscriptomics
Shaomei He, Omri Wurtzel, Kanwar Singh, Jeff L. Froula, Suzan Yilmaz, Susannah G. Tringe, Zhong Wang, Feng Chen, Erika A. Lindquist, Rotem Sorek and Philip Hugenholtz

Subtractive Hybridization & Exonuclease Digestion

Subtractive hybridization (Hyb)
• Kit: MICROBExpress Bacterial mRNA Enrichment (Ambion)
• rRNA is bound by capture oligos and pulled out on magnetic beads; mRNA stays in solution.

Exonuclease digestion (Exo)
• Kit: mRNA-ONLY Prokaryotic mRNA Isolation (Epicentre)
• A 5’-monophosphate-dependent exonuclease digests rRNA (5’ P); mRNA (5’ PPP) is not a substrate.

Objectives

Validate the performance of Hyb and Exo kits on synthetic five-member microbial communities, using Illumina sequencing to evaluate:
• Efficiency of rRNA removal
• Fidelity of mRNA relative transcript abundance

Treatments:
• Hyb
• 2 x Hyb
• Exo
• Hyb + Exo
• Exo + Hyb

Microbial Isolates in the Two Synthetic Communities

Organism                    Genome size (Mbp)   %GC   Phylum           Match Hyb target sites
Desulfovibrio vulgaris      3.7                 63    Proteobacteria   Yes
Streptomyces sp.            8-10                71    Actinobacteria   Yes
Lactococcus lactis          2.53                35    Firmicutes       Yes
Spirochaeta aurantia        4.3                 65    Spirochaetes     Yes
Lactobacillus brevis        2.3                 46    Firmicutes       Yes
Kangiella koreensis         2.9                 43    Proteobacteria   Yes
Catenulispora acidiphila    10.5                70    Actinobacteria   Yes
Halorhabdus utahensis       3.1                 63    Euryarchaeota    No

[Columns "Community 1" and "Community 2": membership marks not preserved.]

Technical Reproducibility

[Figure: replicate comparisons (rep1 vs. rep2) for the Hyb and Exo treatments.]
All treatments exhibited good technical reproducibility.

rRNA Removal Efficiency

Read Distribution

[Figure: read distribution for Community 1 and Community 2.]

Observed and Actual rRNA Removal

                   rRNA   mRNA   rRNA share of total
Before removal     97     3      97%
Removed            -80    -0
After removal      17     3      85%

Observed rRNA reduction = 97% - 85% = 12% (percentage points)
Actual percent removal = 80/97 = 82.5%
Actual removal is much higher than the observed reduction suggests, because the original rRNA content is so high.
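
The arithmetic on the "Observed and Actual rRNA Removal" slide can be turned around: given only the rRNA share of reads before and after treatment, the actual fraction of rRNA removed can be estimated. The sketch below is illustrative rather than part of the original analysis. The function names are hypothetical, and it assumes, as in the slide's example, that no mRNA is lost during treatment.

```python
def actual_rrna_removal(frac_before, frac_after):
    """Estimate the fraction of the original rRNA that was removed.

    frac_before, frac_after: rRNA share of total RNA (0-1) before and after
    treatment. Assumption (not from the kits' documentation): mRNA is untouched.
    """
    if not (0.0 < frac_before < 1.0 and 0.0 <= frac_after < 1.0):
        raise ValueError("rRNA fractions must lie strictly below 1 and frac_before above 0")
    mrna = 1.0 - frac_before                       # mRNA amount, unchanged by treatment
    rrna_after = frac_after * mrna / (1.0 - frac_after)  # rRNA left, in the same units
    return 1.0 - rrna_after / frac_before


def mrna_fold_enrichment(frac_before, frac_after):
    """Fold increase in the mRNA share of the library after rRNA removal."""
    return (1.0 - frac_after) / (1.0 - frac_before)


if __name__ == "__main__":
    # Slide example: 97% rRNA before removal, 85% rRNA after removal.
    print(f"{actual_rrna_removal(0.97, 0.85):.1%}")   # 82.5%
    print(f"{mrna_fold_enrichment(0.97, 0.85):.1f}")  # 5.0
```

With the slide's numbers, a 12-point drop in rRNA share corresponds to removing about 82.5% of the rRNA and a roughly fivefold increase in the mRNA share of reads. If a treatment also degrades mRNA, as the Exo results suggest can happen, this estimate will understate the actual rRNA removal.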

Community rRNA Removal

[Figure: rRNA removal (%) by treatment for each community.]
Community 1: Hyb + Exo > Hyb > Exo
Community 2: Hyb + Exo > Exo + Hyb > Exo > 2 x Hyb ≈ Hyb

rRNA Removal and RNA Integrity

[Figure: rRNA removal (%) plotted against RNA integrity number (RIN), one panel per treatment: Hyb, 2 x Hyb, Exo, Hyb + Exo, Exo + Hyb. Correlation coefficients shown in the panels: r = 0.946, 0.958, 0.874 and 0.945; the panel each belongs to is not recoverable.]
More intact RNA -> higher rRNA removal efficiency.

Enrichment of mRNA & Increase of Detection Sensitivity

[Figure not preserved.]

Fidelity of mRNA Relative Abundance

Community 1: Hyb > Exo > Hyb + Exo
Community 2: Hyb ≈ 2 x Hyb > Exo > Hyb + Exo ≈ Exo + Hyb

Conclusions

• rRNA removal efficiency depended on community composition and RNA integrity.
• Exo degraded some mRNA, introducing larger variation than Hyb.
• Combining Hyb and Exo removed more rRNA than either method alone, but fidelity was significantly compromised.

Customized subtractive hybridization

Stewart et al., ISME J (2010) 4, 896–907
• Customized probes specific to the communities of interest
• Probes cover near-full-length rRNA and should also capture partially degraded (fragmented) rRNA
• Has been applied to marine metatranscriptome samples to substantially reduce rRNA

Duplex-specific nuclease (DSN)

Yi et al., Nucleic Acids Res (2011) doi: 10.1093/nar/gkr617

Workflow:
1. Total RNA
2. RNA-seq library construction
3. Denature ds-DNA at high temperature
4. Re-anneal to ds-DNA at lower temperature
5. DSN degrades DNA duplexes, which presumably come from abundant transcripts

Library normalization using DSN
• Efficient on E. coli (final rRNA% = 26 ± 11%)
• Preserved mRNA relative abundance
• Little reduction of the very abundant mRNA

Still efficient and “faithful” for microbial communities?

[Figure: typical species rank abundance. x-axis: rank of OTU (1 to about 1000); y-axis: relative abundance of OTU (%), 0 to 3.]
Environmental microbial communities are very diverse, with a long tail of minor community members.

Termite Hindgut Metatranscriptomics - A case study (Preliminary results)

Termite samples in this study

Species:   Nasutitermes corniger   Amitermes wheeleri
Family:    Termitidae              Termitidae
Habitat:   Laboratory colony       Subtropical desert
Diet:      Dry wood                Cow dung

Aim: Determine system-specific differences between termite species with different diets.

Summary

• Metatranscriptomics is being advanced by next-generation sequencing technologies.
• High rRNA content is still a major bottleneck for metatranscriptomics projects.
• Removing rRNA reads bioinformatically should speed up de novo assembly and improve the assembly of low-abundance mRNAs.
• A sensitive and computationally efficient algorithm for doing this on large datasets still needs to be investigated.

Acknowledgement

• Phil Hugenholtz
• Susannah Tringe
• Edward Kirton
• Kanwar Singh
• Erika Lindquist
• Feng Chen
• Jeff Froula
• Falk Warnecke
• Natalia Ivanova
• Martin Allgaier
• Zhong Wang
• Tao Zhang
• R&D group
• Production group
• Many others!

• Omri Wurtzel
• Rotem Sorek
• Hans Peter Klenk
• Rudolph Scheffrahn
• Jose Escovar-Kousen