mRNA-Seq: methods and applications
Jim Noonan
GENE 760
Introduction to mRNA-seq
• Technical methodology
• Read mapping and normalization
• Estimating isoform-level gene expression
• De novo transcript reconstruction
• Sensitivity and sequencing depth
• Differential expression analysis
mRNA-seq workflow
Martin and Wang Nat Rev Genet 12:671 (2011)
Wang et al. Nat Rev Genet 10:57 (2009)
Illumina RNA-seq library preparation
Capture poly-A RNA with poly-T oligo-attached beads (100 ng total) (2x)
• RNA quality must be high: degradation produces 3’ bias
• Non-poly-A RNAs are not recovered
Fragment mRNA
Synthesize ds cDNA
Ligate adapters
Amplify
Generate clusters and sequence
Ribosomal RNA subtraction
RiboMinus
Mapping RNA-seq reads and quantifying transcripts
RNA-seq reads mapped to a reference genome
Normalization:
Reads per kilobase of feature length per million mapped reads (RPKM)
• What is a “feature”?
• What about genomes with poor annotation?
• What about species with no sequenced genome?
For a detailed comparison of normalization methods, see:
Bullard et al. BMC Bioinformatics 11:94 (2010).
Robinson and Oshlack, Genome Biol 11:R25 (2010)
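The RPKM definition above can be sketched directly from its name. This is a minimal illustration, not taken from any particular tool; the function name and the example numbers are hypothetical, and a "feature" is assumed to be whatever interval set the reads were counted against.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    kilobases = feature_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions


# Hypothetical example: 500 reads on a 2,000 bp feature,
# in a library with 20 million mapped reads.
print(rpkm(500, 2_000, 20_000_000))  # 12.5
```

Dividing by feature length and by library size is what makes values comparable between features of different lengths and between libraries of different depths.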
Quantifying gene expression by RNA-seq
Use existing gene annotation:
• Align to genome plus annotated splices
• Depends on high-quality gene annotation
• Which annotation to use: RefSeq, GENCODE, UCSC?
• Isoform quantification?
• Identifying novel transcripts?
Reference-guided alignments:
• Align to genome sequence
• Infer splice events from reads
• Allows transcriptome analyses of genomes with poor gene annotation
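One way to picture "infer splice events from reads": in SAM-format alignments, the CIGAR operation N marks a skipped region of the reference, which for mRNA-to-genome alignments corresponds to an intron. The sketch below pulls intron coordinates out of a CIGAR string. The function name and the example read are hypothetical, and real pipelines would typically rely on an existing SAM/BAM library rather than hand parsing.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs implied by N operations.

    pos is the 1-based leftmost reference position of the alignment (SAM POS).
    Returned coordinates are 1-based and inclusive.
    """
    junctions = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that consume reference bases
            ref += length
    return junctions


# Hypothetical read: 30 aligned bases, a 500 bp gap, then 20 aligned bases.
print(splice_junctions(1000, "30M500N20M"))  # [(1030, 1529)]
```

Junctions supported by many independent reads can then be collected across the dataset without consulting any gene annotation.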
De novo transcript assembly:
• Assemble transcripts directly from reads
• Allows transcriptome analyses of species without reference genomes
Composite gene model approach
Map reads to genome
Map remaining reads to known splice junctions
• Requires good gene models
• Isoforms are ignored
Which gene annotation to use?
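A composite gene model can be sketched as the union of exons across all annotated isoforms of a gene, which is why isoform structure is ignored. This is an assumption about how the composite is built (a common choice, not necessarily the one used in the studies cited here). Function names and coordinates are hypothetical; intervals are half-open [start, end).

```python
def composite_exons(isoforms):
    """Merge exon intervals from all isoforms of one gene into a union model.

    isoforms: list of isoforms, each a list of (start, end) exon intervals
    on the same chromosome and strand, half-open coordinates.
    """
    exons = sorted(exon for isoform in isoforms for exon in isoform)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlapping or abutting exon: extend the current merged interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def model_length(merged):
    """Total length in bp of the composite model (the 'feature length')."""
    return sum(end - start for start, end in merged)


# Hypothetical gene with two isoforms that share the first exon.
isoform_a = [(100, 200), (300, 400), (500, 650)]
isoform_b = [(100, 200), (350, 450)]
union = composite_exons([isoform_a, isoform_b])
print(union)                # [(100, 200), (300, 450), (500, 650)]
print(model_length(union))  # 400
```

The resulting length depends on which isoforms the chosen annotation contains, so the same read counts can give different normalized values under RefSeq, GENCODE, or UCSC models.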
Strategies for transcript assembly
Garber et al. Nat Methods 8:469 (2011)
Splice-aware short read aligners
Martin and Wang Nat Rev Genet 12:671 (2011)
Reference based transcript assembly
Martin and Wang Nat Rev Genet 12:671 (2011)
Transcript assembly programs
Martin and Wang Nat Rev Genet 12:671 (2011)
Cufflinks: ab initio transcript assembly
Step 1: map reads to reference genome
Trapnell et al. Nat. Biotechnology 28:511 (2010)
Cufflinks: ab initio transcript assembly
Isoform abundances estimated by maximum likelihood
Trapnell et al. Nat. Biotechnology 28:511 (2010)
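To give a feel for maximum likelihood abundance estimation, here is a deliberately simplified expectation-maximization (EM) sketch. It is not the Cufflinks model: it assumes each read is equally likely to start anywhere along an isoform it is compatible with, and it ignores fragment-length modeling and sequencing error. The function name and the read counts are hypothetical.

```python
def em_isoform_abundance(compat, lengths, n_iter=200):
    """Estimate relative transcript abundance for each isoform by EM.

    compat: one set per read, holding the indices of isoforms the read
            is compatible with.
    lengths: isoform lengths in bp.
    Returns the estimated fraction of transcripts contributed by each isoform.
    """
    k = len(lengths)
    theta = [1.0 / k] * k  # fraction of reads originating from each isoform
    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: split each read among its compatible isoforms,
        # weighting isoform i by theta[i] / lengths[i].
        for isoforms in compat:
            weights = {i: theta[i] / lengths[i] for i in isoforms}
            total = sum(weights.values())
            for i, w in weights.items():
                expected[i] += w / total
        # M-step: update read fractions from the expected read counts.
        theta = [c / len(compat) for c in expected]
    # Convert read fractions to transcript fractions by correcting for length.
    per_base = [t / l for t, l in zip(theta, lengths)]
    norm = sum(per_base)
    return [p / norm for p in per_base]


# Hypothetical gene with two 1,000 bp isoforms: 60 reads fall in shared
# exons, 30 are unique to isoform 0, and 10 are unique to isoform 1.
reads = [{0, 1}] * 60 + [{0}] * 30 + [{1}] * 10
fractions = em_isoform_abundance(reads, [1000, 1000])
print([round(x, 3) for x in fractions])  # [0.75, 0.25]
```

The uniquely assignable reads set the 3:1 ratio, and the EM iterations divide the ambiguous reads in the same proportion.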
Graph-based transcript assembly
Martin and Wang Nat Rev Genet 12:671 (2011)
Trinity: de novo transcript assembly
Grabherr et al. Nat Biotechnol 29:644 (2011)
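Graph-based de novo assemblers such as Trinity work on de Bruijn graphs built from k-mers in the reads. The toy sketch below builds such a graph and walks an unbranched path to spell out a contig. It is only an illustration of the data structure: the function names and reads are hypothetical, the walk checks only outgoing branches, and real assemblers also deal with sequencing errors, coverage, and the branches that alternative isoforms create.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers in reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle rather than loop forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads from a single transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = de_bruijn(reads, k=4)
print(extend_unambiguous(graph, "ATG"))  # ATGGCGTGCAAT
```

A node with more than one successor is where the walk stops; in transcriptome data such branch points often correspond to alternative splicing.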
What depth of sequencing is required to characterize a transcriptome?
Wang et al. Nat Rev Genet 10:57 (2009)
Considerations
Gene length:
• Long genes are detected before short genes
Expression level:
• High expressors are detected before low expressors
Complexity of the transcriptome:
• Tissues with many cell types require more sequencing
Feature type:
• Composite gene models
• Common isoforms
• Rare isoforms
Detection vs. quantification
• Obtaining confident expression level estimates (e.g., “stable” RPKMs) requires greater coverage
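The effect of gene length, expression level, and depth on detection can be made concrete by inverting the RPKM definition to get an expected read count. The sketch below additionally assumes Poisson-distributed counts, which is a simplification introduced here for illustration. Function names, the 5-read threshold, and the example values are hypothetical.

```python
import math


def expected_reads(rpkm, length_bp, total_mapped_reads):
    """Expected read count for a feature, from the RPKM definition."""
    return rpkm * (length_bp / 1_000) * (total_mapped_reads / 1_000_000)


def detection_probability(rpkm, length_bp, total_mapped_reads, min_reads=1):
    """P(at least min_reads reads), assuming Poisson-distributed counts."""
    lam = expected_reads(rpkm, length_bp, total_mapped_reads)
    p_below = sum(
        math.exp(-lam) * lam ** n / math.factorial(n) for n in range(min_reads)
    )
    return 1.0 - p_below


# Hypothetical transcripts, both at 1 RPKM, sequenced to two depths.
for depth in (1_000_000, 10_000_000):
    for length in (500, 5_000):
        p = detection_probability(1.0, length, depth, min_reads=5)
        print(f"depth={depth:>10,} length={length:>5} bp  P(>=5 reads)={p:.3f}")
```

At equal RPKM the longer transcript collects ten times as many reads, so it crosses any fixed read threshold at a much lower depth than the shorter one.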
Transcript detection is biased in favor of long genes
Tarazona et al. Genome Res 21:2213 (2011)
Applications of mRNA-seq
Characterizing transcriptome complexity
• Alternative splicing
Differential expression analysis
• Gene- and isoform-level expression comparisons
Novel RNA species
• lincRNAs and eRNAs
• Pervasive transcription
Translation
• Ribosome profiling
Allele-specific expression
• Effect of genetic variation on gene expression
• Imprinting
RNA editing
• Novel events
Alternative isoform regulation in human tissue transcriptomes
Wang et al. Nature 456:470 (2008)
Diversity of alternative splicing events in human tissues
Wang et al. Nature 456:470 (2008)
Differential expression
Garber et al. Nat Methods 8:469 (2011)
Programs for identifying DE genes in RNA-seq datasets
Program    Assumed distribution for count data    URL
DESeq      Negative binomial                      wwwhuber.embl.de/users/anders/DESeq/
DEGseq     Poisson                                www.bioconductor.org/packages/2.6/bioc/html/DEGseq.html
edgeR      Negative binomial                      www.bioconductor.org/packages/release/bioc/html/edgeR.html
baySeq     Negative binomial                      www.bioconductor.org/packages/release/bioc/html/baySeq.html
Cuffdiff   Negative binomial                      cufflinks.cbcb.umd.edu/
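The practical difference between the two assumed distributions is how variance relates to the mean. Under a Poisson model the variance equals the mean; the negative binomial is commonly written with an extra dispersion term, variance = mean + dispersion × mean². The sketch below compares the two on replicate counts using a simple method-of-moments estimate. This is only an illustration and is not how the packages listed above estimate dispersion. All names and counts are hypothetical.

```python
def negative_binomial_variance(mean, dispersion):
    """Variance under the mean-dispersion form: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


def sample_mean_and_variance(counts):
    """Sample mean and unbiased sample variance of replicate counts."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion; 0 means no more spread than Poisson."""
    mean, variance = sample_mean_and_variance(counts)
    if mean == 0:
        return 0.0
    return max(0.0, (variance - mean) / mean ** 2)


# Hypothetical counts for one gene across four biological replicates.
counts = [80, 120, 95, 150]
mean, variance = sample_mean_and_variance(counts)
alpha = moment_dispersion(counts)
print(f"mean={mean:.2f} variance={variance:.2f}")
print(f"Poisson would predict variance={mean:.2f}")
print(f"estimated dispersion={alpha:.4f}")
```

When replicate variance exceeds the mean, as in this example, a Poisson model understates the noise and tends to call too many genes differentially expressed.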
Differential expression:
Characterizing transcriptome dynamics during brain development
Embryonic mouse cortex
RNA-seq
DEX
Neuronal functions
• synaptic transmission
• cell adhesion
• neuronal migration
“Stemness” functions
• cell cycle
• M phase
• Sox2, Oct4
Ayoub et al. PNAS 108:14950 (2011)
Differential expression:
Characterizing transcriptome dynamics during brain development
Differential isoforms
Embryonic mouse cortex
RNA-seq
DE isoforms
Ayoub et al. PNAS 108:14950 (2011)
Novel RNA species: annotating lincRNAs
Guttman et al. Nat Biotechnol 28:503 (2010)
Enhancer-associated RNAs (eRNAs)
Neurons treated with KCl
Kim et al. Nature 465:182 (2010)
Enhancer-associated RNAs (eRNAs)
Ren B. Nature 465:173 (2010)
How much of the genome is transcribed?
van Bakel et al. PLoS Biol. 8:e1000371 (2010)
Exploiting sequence information in RNA-seq reads
Majewski and Pastinen. Trends Genet 27:72 (2011)
Detecting variants that affect splicing
Pickrell et al. Nature 464:768 (2010)
Summary:
mRNA-seq applications
• Quantify transcriptome complexity and compare
across biological states
• Determine how transcriptomes are translated in
different biological contexts
• Effect of genetic variation on gene expression
• Imprinting and RNA editing