Next-generation sequencing and PBRC

Research areas: Obesity, Diabetes, Nutrition, Physiology, Neuroscience, Cell Biology, Development, Immunology
Initial Consultation: Experiment Design, Cost - Timeline
PI (Biology): Experimental Paradigm, Sample Collection, Sample Preparation
Genomics Core: Quality Control, Sample Analysis, AB SOLiD v4 Instrument Runs, Raw Data to Pipeline
Data Analysis Pipelines
Gene Expression: SAGE, RNA-Seq
Read Mapping and Annotation · Gene and Summary Statistics · Cluster Analyses · Visualization · Molecular Functions · Biological Pathways · Links to NCBI · Literature
Small RNA
Mapping · Known miRNA · Novel miRNA · Expression · Sequence & Location · Links to miRBase · Known gene interactions
Epigenome: Methylation, ChIP Seq
Mapping · Peak calling · Statistics · Visualization · Comparative Epigenomics
Resequencing
Mapping · SNP calling · Small indel · Large indel · CNV · Inversions
Next Generation Sequencer Applications
• De Novo Sequencing
• Resequencing, Comparative Genomics
• Global SNP Analysis
• Gene Expression Analysis
• Methylation Studies
• ChIP Sequencing: transcription factors, histones, polymerases
• Transcriptome Analysis: splicing, UTRs, cSNPs, nested transcripts
• MicroRNA Discovery and Quantitation
• Metagenomics, Microbial Diversity
• Copy Number Variation
• Chromosomal Aberrations
• Gene Regulation Studies
AB SOLiD Ligation sequencing
[Figure: three panels. "Observed color sequence" (read positions 0-35, with position 0 = A), "Color dinucleotide space", and "Color to DNA sequence".]
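The color-to-DNA step in the SOLiD figure can be sketched in a few lines. This is a minimal illustration of the standard two-base encoding, in which each color describes the transition between two adjacent bases: 0 for identical bases, 1 for A/C or G/T, 2 for A/G or C/T, 3 for A/T or C/G. Writing the code as an XOR over 2-bit base values is a common convenience, not something taken from the slide, and the function names are hypothetical.

```python
# Sketch of SOLiD two-base (color-space) encoding and decoding.
# Assumption: bases are mapped to 2-bit values so that color = base1 XOR base2,
# which reproduces the standard color code (0: same base, 1: A/C or G/T,
# 2: A/G or C/T, 3: A/T or C/G).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return the first base followed by one color digit per adjacent base pair."""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)


def decode_colors(read):
    """Decode a color-space read (known first base + color digits) into bases."""
    current = BASE_TO_BITS[read[0]]
    bases = [read[0]]
    for digit in read[1:]:
        # Each color tells us how to step from the previous base to the next one.
        current ^= int(digit)
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


if __name__ == "__main__":
    color_read = encode_colors("ATGGCA")
    print(color_read, "->", decode_colors(color_read))
```

Because every base is derived from the one before it, a single miscalled color changes all downstream bases on decoding. For that reason, reads are usually aligned in color space and converted to bases afterwards.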
How many sequence tags* do I need for my gene expression application?
• SAGE/CAGE: 2-5 million mappable
• miRNA: 10 million mappable
• ChIP Seq: 10-20 million mappable
• Whole Transcriptome from polyA RNA: 40-50 million mappable
• Whole Transcriptome from rRNA depleted: >50 million mappable
• Whole Transcriptome for Allele Specific Expression: >>50 million mappable
SOLiD™ 4 generates >1.4 billion mappable sequences/run (2 slides)
Libraries can be multiplexed to decrease the cost/sample according to the application and number of sequences needed.
*For human/mouse sized genomes; smaller organisms require fewer sequence tags.
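The multiplexing arithmetic behind these figures can be sketched as follows. This is a rough estimate only. It assumes the quoted 1.4 billion mappable sequences per run is shared evenly among libraries, and it budgets at the upper end of each quoted range. The names `MAPPABLE_PER_RUN`, `TAGS_PER_LIBRARY`, and `libraries_per_run` are illustrative.

```python
# Rough estimate of how many libraries fit in one SOLiD 4 run.
# Assumption: reads are split evenly among libraries, with no loss to barcoding.

# Mappable sequences per run (2 slides); the slide quotes this as a lower bound.
MAPPABLE_PER_RUN = 1_400_000_000

# Upper end of each quoted range (assumption: budget conservatively).
TAGS_PER_LIBRARY = {
    "SAGE/CAGE": 5_000_000,
    "miRNA": 10_000_000,
    "ChIP Seq": 20_000_000,
    "Whole transcriptome, polyA RNA": 50_000_000,
}


def libraries_per_run(tags_per_library, mappable_per_run=MAPPABLE_PER_RUN):
    """Number of libraries of the given depth that fit into one run."""
    if tags_per_library <= 0:
        raise ValueError("tags_per_library must be positive")
    return mappable_per_run // tags_per_library


if __name__ == "__main__":
    for application, tags in TAGS_PER_LIBRARY.items():
        print(f"{application}: up to {libraries_per_run(tags)} libraries per run")
```

In practice the number of available barcodes and uneven read distribution across libraries will lower these ceilings, so the output is best read as an upper bound when planning cost per sample.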
SAGE Sequencing vs. Microarray
| | SOLiD v4 | Microarray-Illumina Ref 8 | Microarray-Illumina Ref 6 |
|---|---|---|---|
| Data Points | 3.6 million; known and novel transcripts | 25,600; known transcripts | 45,200; known transcripts |
| Sensitivity | 6 logs | 3 logs | 3 logs |
| Technical Reproducibility | >0.99-0.999 | 0.9 | 0.9 |
| Correlation to TaqMan | 0.9 | 0.7-0.8 | 0.7-0.8 |
| Multiplexing/Barcoding | Yes, up to 48 RNA or 96 DNA samples | No | No |
| Background | No background; better for low abundance transcript detection | Hybridization process creates background signal | Hybridization process creates background signal |
| RNA quantity | 5-10 µg | 750 ng | 750 ng |
| 16 Sample Experiment Cost | $7200 full service; $6100 if PI creates library | $3600 | $5200 |
Bioinformatics: Geospiza
Run Quality
Primary Data Analysis - Images to bases
Secondary Data Analysis – Bases to alignments/contigs
Applications
• Tag Profiling
• Small RNA Analysis
• Transcriptome seq.
• ChIP-Seq
• Methylation Analysis
• Resequencing
• De novo assembly
One or more data sets
Discovery
Ref Seq + Alignment
Assembly, De Novo
Sample/Library Quality
Instrument-specific
Sequences + quality values
Algorithms
• Eland
• Maq
• SOAP
• Velvet
• Newbler
• Mapreads
• Others …
Tertiary Data Analysis – Experiment Specific
• Differential expression
• Methylation sites
• Binding sites
• Gene association
• Genomic structure
Next-gen sequencing: applications
– Genome analysis: basic and translational research
• Genetics of disease – new frontiers
• Exome resequencing: confirmation of GWAS
• Genome sequence as diagnostic tool
• Genetic counseling
– Epigenome analysis: basic research; biomarkers
• Analyses of DNA methylation, transcription factors, histone modifications, non-coding RNA
• Epigenomic biomarkers of disease
– Gene expression analysis: basic research; diagnostics & biomarkers
• Whole transcriptome: all transcribed sequences in a cell
• SAGE analysis: expression of known genes
• Small RNA: microRNA as regulators of biology
– Genotype to phenotype: a new frontier
• Pathology: systems biology
• Diagnosis: data filtering
• Personalized Genomic Medicine: Treatment recommendations
Next-gen sequencing: challenges
– Rapid growth in methodology
• Technology and equipment changes & upgrades
– High demands on informatics:
• Staff
• Software
• Computational resources
– New ways of handling data needed:
• Interpretation
• Publication
• Storage