Transcriptomics
Jim Noonan
GENE 760

Introduction to RNA-seq

RNA-seq workflow
Martin and Wang Nat Rev Genet 12:671 (2011)
Wang et al. Nat Rev Genet 10:57 (2009)

Illumina RNA-seq library preparation
Capture poly-A RNA with poly-T oligo-attached beads (100 ng total) (2x)
• RNA quality must be high: degradation produces 3' bias
• Non-poly-A RNAs are not recovered
Fragment mRNA
Synthesize ds cDNA
Ligate adapters
Amplify
Generate clusters and sequence

Ribosomal RNA subtraction
RiboMinus

Quantifying relative expression levels in RNA-seq
Use existing gene annotation:
• Align to genome plus annotated splices
• Depends on high-quality gene annotation
• Which annotation to use: RefSeq, GENCODE, UCSC?
• Isoform quantification?
• Identifying novel transcripts?
• Differential expression
De novo transcript assembly:
• Assemble transcripts directly from reads
• Allows transcriptome analyses of species without reference genomes

Mapping RNA-seq reads

Quantifying relative expression levels in RNA-seq
Reads per kilobase of feature length per million mapped reads (RPKM)
Fragments per kilobase per million mapped reads (FPKM) (paired-end reads)
Transcripts per million (TPM)
Counts per million (CPM)
• What is a "feature"?
• What about genomes with poor annotation?
• What about species with no sequenced genome?
For a detailed comparison of normalization methods, see:
Bullard et al. BMC Bioinformatics 11:94 (2010)
Robinson and Oshlack Genome Biol 11:R25 (2010)

Composite gene models
Map reads to genome
Map remaining reads to known splice junctions
• Requires good gene models
• Isoforms are ignored
Which gene annotation to use?

Splice-aware short read aligners
Martin and Wang Nat Rev Genet 12:671 (2011)

The 'Tuxedo' suite
Trapnell et al. Nat Protoc 7:562 (2012)

Cufflinks: ab initio transcript assembly
Step 1: map reads to reference genome
Trapnell et al. Nat Biotechnol 28:511 (2010)

Cufflinks: ab initio transcript assembly
Isoform abundances estimated by maximum likelihood
Trapnell et al. Nat Biotechnol 28:511 (2010)

Differential expression
Popular methods:
• edgeR
• DESeq
• Cuffdiff
Require count data
Assume negative binomial or Poisson distribution
Garber et al. Nat Methods 8:469 (2011)

What depth of sequencing is required to characterize a transcriptome?
Wang et al. Nat Rev Genet 10:57 (2009)

Considerations
Gene length:
• Long genes are detected before short genes
Expression level:
• High expressors are detected before low expressors
Complexity of the transcriptome:
• Tissues with many cell types require more sequencing
Feature type:
• Composite gene models
• Common isoforms
• Rare isoforms
Detection vs. quantification:
• Obtaining confident expression level estimates (e.g., "stable" RPKMs) requires greater coverage

Applications of RNA-seq
Characterizing transcriptome complexity
• Alternative splicing
Differential expression analysis
• Gene- and isoform-level expression comparisons
Novel RNA species
• lincRNAs
• Pervasive transcription
Allele-specific expression
• Effect of genetic variation on gene expression
• Imprinting
RNA editing
• Novel events

Alternative isoform regulation in human tissue transcriptomes
Wang et al. Nature 456:470 (2008)

Diversity of alternative splicing events in human tissues
Wang et al. Nature 456:470 (2008)

Novel RNA species: annotating lincRNAs
Guttman et al. Nat Biotechnol 28:503 (2010)

Small RNA sequencing
microRNAs ~22 nt
piRNAs ~25-30 nt
Rother and Meister Biochimie 93:1905 (2011)

Small RNA sequencing: Illumina protocol

Distinguishing functional small RNAs from noise
• Structural similarity to known small RNAs: miRDeep, miRCat
• Binding to small RNA processing proteins
• Genetic requirements for processing
Friedländer et al. Nat Biotechnol 26:407 (2008)

Measuring translation by ribosome footprinting
Ingolia Nat Rev Genet 15:205 (2014)
Ingolia et al. Science 324:218 (2009)

Some lincRNAs are translated in mouse ES cells
Ingolia et al. Cell 147:789 (2011)

Detecting RNA-protein interactions: CLIP
Rother and Meister Biochimie 93:1905 (2011)

Enhancer-associated RNAs (eRNAs)
Ren Nature 465:173 (2010)
Kim et al. Nature 465:182 (2010)

How much of the genome is transcribed?
Estimates from ENCODE
Kellis et al. Proc Natl Acad Sci USA 111:6131 (2014)
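
Worked example: RPKM, TPM and CPM
The normalization measures listed under "Quantifying relative expression levels in RNA-seq" can be illustrated with a small sketch. This is not taken from the lecture: the function names, the toy counts and the feature lengths are made up for illustration, and the number of mapped reads is assumed to equal the sum of the counts over the features supplied. FPKM follows the same formula as RPKM with fragments (read pairs) counted instead of reads.

```python
# Minimal sketch of three normalization measures (illustrative only).
# Assumption: "mapped reads" = sum of counts over the features given here.

def cpm(counts):
    """Counts per million: scales for sequencing depth only."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase of feature length per million mapped reads."""
    total = sum(counts)
    # c / (length / 1e3) / (total / 1e6) == c * 1e9 / (total * length)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]

# Hypothetical toy data: three features whose counts scale with their length.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

print("CPM :", cpm_values)
print("RPKM:", rpkm_values)
print("TPM :", tpm_values)
```

In this toy case CPM differs between the three features because it ignores feature length, while RPKM and TPM assign them equal values. TPM values always sum to one million within a sample; RPKM values generally do not.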