Sequencing technologies

Current Sequencing Technologies and Data Generation
Corbin Jones & Piotr Mieczkowski
Department of Biology, College of Arts and Sciences, Carolina
Center for Genome Sciences Department of Genetics, School of
Medicine, University of North Carolina at Chapel Hill
NEXT-GENERATION SEQUENCING (DEEP SEQUENCING) PLATFORMS
o Short reads
  1. Genome Analyzer IIx (GAIIx), HiSeq 2000, HiSeq 2500, MiSeq – Illumina
  2. SOLiD 5500xl System – Applied Biosystems
  3. HeliScope™ Single Molecule Sequencer – Helicos
o Long reads
  1. Genome Sequencer FLX System (454) – Roche
  2. PacBio RS – Pacific Biosciences
  3. Personal Genome Machine, Ion Proton – Ion Torrent
  4. GridION – Oxford Nanopore
o Mapping sequences to large DNA fragments
  1. NABsys
  2. BioNanomatrix
UNC – HTSF
• 9 HiSeq 2000/2500
• 1 GA II
• PacBio
• Ion Torrent
• MiSeq (Jeff Dangl)
Liz Buda and Donghui Tan
Also on campus:
454 (Microbiome)
454 jr. (Viral genomics)
MiSeq – Kevin Weeks
What type of sequencing should I choose for an Illumina sequencing project?
HiSeq 2000/2500 – 100-160 million single-end sequencing reads per lane.
- ChIPseq – Single End 50 cycles (2-3 human samples per lane)
- RNAseq – Single End 50 cycles (2-3 human samples per lane)
  If you are interested in splicing variants and fusion genes, Single End 100 cycles or Paired End 2x50 cycles will be a better option.
- Whole Genome Sequencing – Paired End 2x100 cycles (2-3 lanes per genome)
- Exome Capture – Paired End 2x100 cycles (4 samples per lane)
MiSeq – 3-7 million single-end sequencing reads per lane. Custom projects, fast turnaround.
- Metagenomics (16S profile) – Paired End 2x150 cycles, up to 24 samples per lane
- Whole Microbial Genome Sequencing – Paired End 2x150 cycles
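The lane recommendations above can be sanity-checked with a simple depth calculation (sequenced bases divided by genome size). This is a minimal sketch, not the facility's own planning tool. It assumes a 3.1 Gb human genome and assumes that the quoted reads-per-lane figure counts clusters, each of which yields two reads in paired-end mode. The function name `mean_coverage` is made up for illustration.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed haploid human genome size (not from the slides)


def mean_coverage(clusters, cycles, genome_size, paired_end=False):
    """Mean sequencing depth = total sequenced bases / genome size."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * reads_per_cluster * cycles / genome_size


# Whole genome, Paired End 2x100 cycles, using the 100-160 million reads/lane range.
low = mean_coverage(100e6 * 2, 100, HUMAN_GENOME_BP, paired_end=True)   # 2 lanes, low yield
high = mean_coverage(160e6 * 3, 100, HUMAN_GENOME_BP, paired_end=True)  # 3 lanes, high yield

print(f"2 lanes at 100M clusters/lane: about {low:.0f}x")
print(f"3 lanes at 160M clusters/lane: about {high:.0f}x")
```

Under these assumptions, three high-yield lanes land close to the 30x coverage quoted for human resequencing.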
SHORT READ PLATFORMS at UNC
HiSeq 2000
Initially capable of up to 600 Gb per run in 13 days.
Cost of resequencing one human genome (30x coverage):
- Now, for UNC PIs: about $6,000
- Now, for users outside UNC: about $9,000
HiSeq 2500
Initially capable of up to 100 Gb per run in 27 hours.
Cost per genome - ???
MiSeq
- Small-capacity system. PE 2x150 cycles in 27 hours.
- PE 2x250 bp coming soon – error rate for read 1 less than 1%; read 2 about 1.2%.
- In preparation – PE 2x400 bp – error rate for read 1 about 2%; read 2 about 4%.
- In preparation – longer insert size possible (1.5 kb)
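To compare the two HiSeq instruments on equal terms, the run yields and run times quoted above can be converted to gigabases per day. This is a small illustrative sketch. The helper name `gb_per_day` is hypothetical, and the numbers are the "initially capable" figures from the slides, not measured facility throughput.

```python
def gb_per_day(gb_per_run, run_hours):
    """Convert a per-run yield and run time into gigabases per day."""
    return gb_per_run / (run_hours / 24)


hiseq2000 = gb_per_day(600, 13 * 24)  # 600 Gb in 13 days
hiseq2500 = gb_per_day(100, 27)       # 100 Gb in 27 hours

print(f"HiSeq 2000: {hiseq2000:.1f} Gb/day")
print(f"HiSeq 2500: {hiseq2500:.1f} Gb/day")
```

The HiSeq 2500 figure describes a smaller but much faster run, so its daily rate is higher even though the total yield per run is lower.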
PacBio RS
Single molecule resolution in real time
• Short waiting time for result and simple workflow
  – Generate basecalls in <1 day
  – Polymerase speed ≥1 base per second
• No amplification required
  – Bias not introduced
  – More uniform coverage
• Direct observation
  – Distinguish heterogeneous samples
  – Simultaneous kinetic measurements
• Long reads
  – Identify repeats and structural variants
  – Less coverage required
• Information content
  – One assay, multiple applications
    • Genetic variation (SVs to SNPs)
    • Methylation
    • Enzymology
C2 chemistry – installed March 2012
- Long reads 6-10 kb
- Median size of molecules 3 kb
- Still 15% error rate
- No strobe sequencing
Software focus on:
- De novo assembly
- High-quality CCS consensus reads
In preparation
- Load long molecules by magnetic beads
- Modified nucleotide detection
PacBio RS – two sequencing modes
LS – long sequencing reads
Sample preparation: Standard
• Large insert sizes (2kb-10kb)
• Generates one pass on each molecule sequenced
CCS – high quality sequencing reads
Sample preparation: Circular Consensus
• Small insert sizes 500bp
• Generates multiple passes on each molecule sequenced
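The idea behind CCS, that several noisy passes over the same insert can be combined into one more accurate read, can be illustrated with a column-wise majority vote. This is a deliberately simplified sketch and not PacBio's algorithm. It assumes the passes are already aligned to the same length and contain only substitution errors, whereas a real consensus caller must align the passes and handle insertions and deletions. The sequences are made-up examples.

```python
from collections import Counter


def consensus(passes):
    """Column-wise majority vote over pre-aligned, equal-length passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to the same length")
    # For each alignment column, keep the most frequent base.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Five hypothetical passes over one 12-base insert, each with one substitution error.
passes = [
    "ACGATGCAAGCT",
    "ACGTTGCAAGGT",
    "TCGTTGCAAGCT",
    "ACGTTGCATGCT",
    "ACGTTCCAAGCT",
]
ccs_read = consensus(passes)
print(ccs_read)
```

Because each error appears in only one of the five passes, the vote recovers the underlying sequence. This is why CCS needs short inserts: the polymerase must traverse the molecule several times within one read.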
Example Data: 1 SMRT Cell
Pre-Filter # of Bases: 180,320,136 bp
Pre-Filter # of Reads: 75,153
Pre-Filter Mean Readlength: 2399 bp
Pre-Filter Mean Read Quality: 0.624
% Adapter Dimer (0-10bp): 1.94 %
% Short Insert (11-100bp): 0.47 %
Post-Filter # of Bases: 165,424,592 bp
Post-Filter # of Reads: 52,801
Post-Filter Mean Readlength: 3133 bp
Post-Filter Mean Read Quality: 0.827
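The filter statistics above are internally consistent, which a few lines of arithmetic can confirm. The sketch below only re-derives values from the numbers on the slide. The dictionary layout and helper names are illustrative choices.

```python
pre = {"bases": 180_320_136, "reads": 75_153}
post = {"bases": 165_424_592, "reads": 52_801}


def mean_read_length(stats):
    """Mean read length in bp = total bases / number of reads."""
    return stats["bases"] / stats["reads"]


def retained(before, after, key):
    """Fraction of reads or bases that survive filtering."""
    return after[key] / before[key]


print(f"Pre-filter mean read length:  {mean_read_length(pre):.0f} bp")
print(f"Post-filter mean read length: {mean_read_length(post):.0f} bp")
print(f"Reads retained: {retained(pre, post, 'reads'):.1%}")
print(f"Bases retained: {retained(pre, post, 'bases'):.1%}")
```

Filtering removes about 30% of reads but only about 8% of bases, so the discarded reads are mostly short ones.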
Personal Genome Machine – Ion Torrent (Life Technologies)
Three types of semiconductor chips:
314 – 20 Mb
316 – 200 Mb
318 – 1 Gb
Read length depends on base composition: 200-250 bp (200 cycles)
System is enabled for Paired End 2x100 cycles
The fastest sequencing system on the market.
Recommendation:
Resequencing applications which require fast turnaround of samples
- Amplicons (PCR products)
- Small and medium size genomes
- Custom DNA capture applications
How it works:
An H+ ion is released during base incorporation. Individual polymerases attached to beads are positioned in tiny wells that rest on a tiny pH meter.
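The detection principle described above can be sketched as an idealized flowgram: nucleotides are flowed over the chip one type at a time, and the pH signal in a well scales with how many bases are incorporated in that flow. The code below is a noise-free toy model, not Ion Torrent's signal processing. The flow order `"TACG"` and the function names are assumptions made for illustration.

```python
def flowgram(read, flow_order="TACG", n_flows=12):
    """Idealized per-flow signal: number of bases incorporated in each flow.

    `read` is the strand being synthesized (the sequence that will be called).
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated in a single flow and gives a larger signal.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
    return signals


def call_bases(signals, flow_order="TACG"):
    """Turn integer flow signals back into a base sequence."""
    return "".join(flow_order[i % len(flow_order)] * s for i, s in enumerate(signals))


signals = flowgram("TTACGGA")
print(signals)
print(call_bases(signals))
```

In real data the signal is analog and noisy, so long homopolymer runs are the hardest to call. The number of bases read per flow also depends on the sequence, which is consistent with read length depending on base composition.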
PGM/Ion Torrent Data 316 chip
Thr.
Total Number of Bases [Mbp] 77.65
‣ Number of Q17 Bases [Mbp] 36.11
‣ Number of Q20 Bases [Mbp] 27.33
Total Number of Reads 368,860
Mean Length [bp] 211
Longest Read [bp] 380
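The Q17 and Q20 thresholds in this table are Phred quality scores, where Q = -10 log10(P) and P is the estimated probability that a base call is wrong. The sketch below converts the two thresholds to error probabilities and re-derives summary fractions from the numbers above. The variable and function names are illustrative.

```python
def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


total_mbp = 77.65
q17_mbp = 36.11
q20_mbp = 27.33
total_reads = 368_860

frac_q17 = q17_mbp / total_mbp
frac_q20 = q20_mbp / total_mbp
mean_length = total_mbp * 1e6 / total_reads

print(f"Q17 = {phred_to_error(17):.1%} error, Q20 = {phred_to_error(20):.1%} error")
print(f"Bases at Q17 or better: {frac_q17:.1%}")
print(f"Bases at Q20 or better: {frac_q20:.1%}")
print(f"Mean read length: {mean_length:.0f} bp")
```

Q17 corresponds to roughly a 2% error probability and Q20 to 1%. In this run a little over a third of the bases reach Q20.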
Library Preparation from Low Quantities of DNA or RNA
Microfluidics stationary and portable systems
Mondrian SP System – NuGEN Technologies
- Human libraries from 5 ng of total DNA. Only 10-15% duplicate reads.
- Ultralow DNA library systems
Soon:
- Ultralow RNA library systems
- Libraries from total RNA with rRNA depletion.
Advanced Liquid Logic from RTP
Emerging Sequencing Technologies
- Semiconductor sequencing chip
- Nanopore / Nanochannel sequencing
Ion Proton System
- Human genome in one day
- Cost of reagents $1000 per run
- Error rate around 1.2%
- Human Genome, RNAseq, ChIPseq
Ion Proton Chip I – 10 Gb (whole exome capture experiments)
Ion Proton Chip II – 100 Gb (whole human genome resequencing)
Oxford Nanopore – new view on sequencing
Hemolysin pore – inner diameter of 1 nm, about 100,000 times smaller than the diameter of a human hair.
Oxford Nanopore
DNA sequencing
Error rate 4%, prediction for end of the year 0.1 – 2%.
Nanopore array
Oxford Nanopore – new concepts
MinION
- 150 Mb per run
- Tested 48 kb read length
- $900 per instrument
- 500 pores per device
GridION
- XXXMb per run
- Tested 48 kb read length
- $XXX per instrument
- 2000 pores per device, soon 8000 pores
- Cost per human genome $1500.
Oxford Nanopore – applications
- DNA sequencing
- Protein detection
- Protein DNA interaction
- Small molecule detection
- 96 well plates for 96 samples
- Controlled time of sequencing
Intelligent BioSystems Mini20 System
(manufactured by Azco Biotech)
• Amplification by rolony method
• Sequencing by Synthesis with announced 100 base
reads, but expect to compete with Sanger down the road
• Designed for clinical labs
• 20 independent flow cells, no queue for loading, run
asynchronously
• 20M reads/flow cell, 4 GB/ flow cell
• Potential problems with repeats
• System cost $120K, $150 flow cell (disposable), full costs
per sample not clear yet.
• Entering early access now, expect commercial shipping
late 2012
Genia Technologies
• Very early stage announcement – Backed by Life Technologies
(at least 1 year away)
• Describe system as a cross between Ion Torrent and Oxford
Nanopore
• Electronic “Active Control” technology enables highly efficient
nanopore-membrane assembly and control of DNA movement
through the channel
• Initially used α-Hemolysin and claimed 98% raw accuracy with that
but now are using an undisclosed pore for further development.
• Claim sensitivity 1-2 orders of magnitude greater than Oxford
Nanopore.
• Ramping up pore density to 100K pores/chip by end of 2012.
• Plan to market a mobile reader for <$1K and per sample costs <$100
• Plan early access in late 2012, commercial shipment 2013
Basic RNAseq
• Type 1: Description of transcriptome
– Assembly of transcripts/isoforms
– Annotation of genes
• Type 2: “Paired”, e.g. treatment vs control
– Differential expression
– Differential transcription
• Type 3: Population
– Elements of 1 and 2, but “random effects”
– TCGA roughly falls into this category
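For the Type 2 (treatment vs control) design, the basic quantity of interest is a normalized fold change per gene. The sketch below scales raw counts to counts per million (CPM) and computes a log2 fold change with a pseudocount. It is a minimal illustration with made-up gene names and counts, not a statistical test. A real analysis needs replicates and a proper model of count variability.

```python
import math


def cpm(counts):
    """Scale raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(treatment, control, pseudocount=1.0):
    """Per-gene log2(treatment / control) on CPM values; pseudocount avoids log(0)."""
    t, c = cpm(treatment), cpm(control)
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in c}


# Hypothetical counts; the treatment library was sequenced twice as deeply.
control = {"geneA": 100, "geneB": 100, "geneC": 800}
treatment = {"geneA": 400, "geneB": 100, "geneC": 1500}

lfc = log2_fold_change(treatment, control)
for gene, value in lfc.items():
    print(f"{gene}: {value:+.2f}")
```

Note that geneB has the same raw count in both libraries yet comes out about two-fold down after normalization, because the treatment library is twice as large.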
Strand Specific RNAseq
• Perkins et al 2009, Levin et al 2010
• Goal: To mark the RNA molecules in order to know the direction of transcription.
– Differentiate anti-sense transcripts, lncRNAs, mRNAs, etc.
• Many methods; dUTP may be best. Illumina has a kit.
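With a strand-specific library, the direction of transcription is recovered from the strand each read aligns to. The sketch below encodes the convention commonly described for dUTP second-strand marking, where the sequenced molecule derives from first-strand cDNA, so read 1 is antisense to the original RNA and read 2 is sense. The function name is hypothetical, and other protocols use the opposite orientation, so the convention should be checked against the kit documentation.

```python
FLIP = {"+": "-", "-": "+"}


def transcript_strand_dutp(alignment_strand, is_read1=True):
    """Infer the strand of the originating transcript for a dUTP library.

    alignment_strand: '+' or '-' strand of the genome the read aligned to.
    is_read1: True for read 1 (or a single-end read), False for read 2.
    """
    if alignment_strand not in FLIP:
        raise ValueError("alignment_strand must be '+' or '-'")
    # Read 1 is the reverse complement of the RNA; read 2 matches the RNA.
    return FLIP[alignment_strand] if is_read1 else alignment_strand


print(transcript_strand_dutp("+", is_read1=True))
print(transcript_strand_dutp("+", is_read1=False))
```

Reads whose inferred strand is opposite to an annotated gene at the same locus are candidates for anti-sense transcription.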
End tagged RNAseq
• GOAL: Identify ends of transcripts by
attaching adaptors to ends of mRNAs
– can be used in strand specific protocols
– can be used in annotation and assembly protocols
[Figure: mRNA with 5' cap (mG) and 3' poly(A) tail (AAAAAA)]
Normalized RNAseq
• GOAL: To even out the distribution of transcripts sequenced
– Reduce the representation of high-abundance transcripts and increase sensitivity to low-abundance transcripts
Normalized RNAseq 2
• Methods
– Kinetic (Patanjali et al 1991, Bonaldo et al 1996)
– dsDNA nuclease (Zhulidov 2004)
– Cap-Trapper (Carninci et al 2000)
• Results
– Abundant transcripts are reduced in proportion to their frequency
– Coverage still proportional to expression
• Problems: bias, contamination with ncRNA
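Kinetic normalization relies on abundant cDNAs reannealing faster than rare ones after denaturation, so the fraction that remains single-stranded is enriched for rare species. A textbook idealization is second-order reassociation, where a species at starting concentration C0 has C0 / (1 + k·C0·t) left single-stranded after time t. The sketch below applies that formula to made-up abundances in arbitrary units. It is an assumption-laden model and not a description of any of the cited protocols.

```python
def single_stranded_remaining(c0, k, t):
    """Ideal second-order reassociation: concentration still single-stranded."""
    return c0 / (1.0 + k * c0 * t)


# Hypothetical starting abundances (arbitrary units) spanning three orders of magnitude.
before = {"high": 1000.0, "mid": 10.0, "low": 1.0}
after = {name: single_stranded_remaining(c0, k=1.0, t=1.0) for name, c0 in before.items()}

ratio_before = before["high"] / before["low"]
ratio_after = after["high"] / after["low"]

print(after)
print(f"high:low ratio before = {ratio_before:.0f}, after = {ratio_after:.2f}")
```

In this toy model the dynamic range collapses from 1000-fold to about 2-fold while the rank order of the species is preserved.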
Total RNAseq
• Goal: Sequence every RNA molecule in the
cell
– Observe: unspliced RNAs, small RNAs, non-coding
RNAs, tRNAs
– Must remove rRNA!
– Variants: Nuclear only, cytoplasmic only, mRNA
removal
small RNA
• GOAL: Small RNAs are important for gene
regulation, synthesis, splicing, and immunity
(miRNA/miR, snRNA, snoRNAs, scaRNAs)
– Several protocols (e.g. Illumina, Morin et al 2010)
• All involve size selection, which can lead to bias
– Produce short sequences that are then mapped
back to the genome.
• Aside: small RNA counts seem more Poisson-like than other count data
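One quick way to judge how Poisson-like a set of counts is uses the index of dispersion (variance divided by mean), which is close to 1 for Poisson data and larger for overdispersed data. The sketch below uses made-up replicate counts and a hypothetical helper name. It is a rough diagnostic, not a formal test.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance / mean; about 1 for Poisson, above 1 if overdispersed."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; dispersion index is undefined")
    return variance(counts) / m


# Hypothetical counts for one feature across eight replicate libraries.
poisson_like = [2, 8, 5, 3, 7, 5, 3, 7]
overdispersed = [0, 20, 1, 15, 2, 0, 1, 1]

idx_poisson = dispersion_index(poisson_like)
idx_over = dispersion_index(overdispersed)

print(f"Poisson-like:  {idx_poisson:.2f}")
print(f"Overdispersed: {idx_over:.2f}")
```

Both example vectors have the same mean of 5, so the difference in the index comes entirely from the spread of the counts.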
RIPseq/CLIPseq/HITS-CLIP
• GOAL: Identify the sites on the RNA where
RNA binding proteins are bound.
– e.g. Components of the spliceosome
– protocol is similar to ChIPseq except there is a
random hexamer ds-cDNA synthesis step
– refs: Khalil et al 2009, Sanford 2009, Licatalosi
2008, Zhang and Darnell 2011