Sequencing technologies

Current Sequencing Technologies and
Data Generation
Corbin Jones & Piotr Mieczkowski
Department of Biology, College of Arts and Sciences, Carolina
Center for Genome Sciences Department of Genetics, School of
Medicine, University of North Carolina at Chapel Hill
NEXT-GENERATION SEQUENCING (DEEP SEQUENCING) PLATFORMS
o Short reads
  1. Genome Analyzer IIx (GAIIx), HiSeq2000, HiSeq2500, MiSeq – Illumina
  2. SOLiD 5500xl System – Applied Biosystems
  3. HeliScope™ Single Molecule Sequencer – Helicos
o Long reads
  1. Genome Sequencer FLX System (454) – Roche
  2. PacBio RS – Pacific Biosciences
  3. Personal Genome Machine, Ion Proton – Ion Torrent
  4. GridION – Oxford Nanopore
o Mapping sequences to large DNA fragments
  1. NABsys
  2. Bionanomatrix
UNC – HTSF
• 9 HiSeq 2000/2500
• 1 GA II
• PacBio
• Ion Torrent
• MiSeq (Jeff Dangl)
Liz Buda and Donghui Tan
Also on campus:
454 (Microbiome)
454 jr. (Viral genomics)
MiSeq – Kevin Weeks
What type of sequencing should I choose for an Illumina sequencing project?
HiSeq 2000/2500 – 100-160 million single-end sequencing reads per lane.
- ChIPseq – Single End 50 cycles (2-3 human samples per lane)
- RNAseq – Single End 50 cycles (2-3 human samples per lane)
If you are interested in splicing variants and fusion genes, both Single End 100 cycles and Paired End 2x50 cycles are better options.
- Whole Genome Sequencing – Paired End 2x100 cycles (2-3 lanes per genome)
- Exome Capture – Paired End 2x100 cycles (4 samples per lane)
MiSeq – 3-7 million single-end sequencing reads per lane. Custom projects, fast turnaround.
- Metagenomics (16S profile) – Paired End 2x150 cycles, up to 24 samples per lane
- Whole Microbial Genome Sequencing – Paired End 2x150 cycles
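The per-lane numbers above can be turned into rough planning figures. This is a minimal sketch, not a facility calculator: it assumes reads are split evenly across multiplexed samples, that a paired-end run yields two reads per counted fragment, and it uses a hypothetical 5 Mb microbial genome for illustration.

```python
def reads_per_sample(reads_per_lane, samples_per_lane):
    """Reads each sample receives if a lane is shared evenly."""
    return reads_per_lane / samples_per_lane


def mean_coverage(n_fragments, read_length, genome_size, paired=False):
    """Average depth = total sequenced bases / genome size."""
    reads_per_fragment = 2 if paired else 1
    return n_fragments * reads_per_fragment * read_length / genome_size


# HiSeq lane (100-160 million reads) shared by 3 ChIPseq or RNAseq samples
low = reads_per_sample(100_000_000, 3)
high = reads_per_sample(160_000_000, 3)
print(f"{low / 1e6:.1f}-{high / 1e6:.1f} million reads per sample")

# MiSeq PE 2x150 on a hypothetical 5 Mb microbial genome (assumed size)
GENOME_SIZE = 5_000_000
for fragments in (3_000_000, 7_000_000):
    depth = mean_coverage(fragments, 150, GENOME_SIZE, paired=True)
    print(f"{fragments / 1e6:.0f} million fragments -> {depth:.0f}x")
```

Real yields vary with library quality and index balance, so treat the output as an upper-level estimate only.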
SHORT READ PLATFORMS at UNC
HiSeq 2000
Initially capable of up to 600 Gb per run in 13 days.
Cost of resequencing one human genome (30x coverage):
- Now for a UNC PI: about $6,000
- Now for outside of UNC: about $9,000
HiSeq 2500
Initially capable of up to 100 Gb per run in 27 hours.
Cost per genome - ???
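To compare the two HiSeq run modes on the same footing, the quoted yields can be converted to Gb per day and to 30x human genomes per run. A rough sketch: the 3.1 Gb human genome size is an assumption added here, and the calculation counts raw yield only (no allowance for duplicates, filtering, or unmapped reads).

```python
HUMAN_GENOME_GB = 3.1  # assumed approximate haploid human genome size


def gb_per_day(run_yield_gb, run_hours):
    """Raw sequencing throughput normalised to a 24-hour day."""
    return run_yield_gb / run_hours * 24


def genomes_per_run(run_yield_gb, coverage=30, genome_gb=HUMAN_GENOME_GB):
    """How many genomes at the given depth fit into one run's raw yield."""
    return run_yield_gb / (coverage * genome_gb)


runs = {
    "HiSeq 2000 (600 Gb, 13 days)": (600, 13 * 24),
    "HiSeq 2500 (100 Gb, 27 hours)": (100, 27),
}
for name, (yield_gb, hours) in runs.items():
    print(name,
          f"{gb_per_day(yield_gb, hours):.1f} Gb/day,",
          f"{genomes_per_run(yield_gb):.1f} genomes at 30x per run")
```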
MiSeq
- Small capacity system. PE 2x150 cycles in 27 hours.
- PE 2x250 bp coming soon – error rate for read 1 less than 1%; read 2 about 1.2%.
- In preparation – PE 2x400 bp – error rate for read 1 about 2%; read 2 about 4%.
- In preparation – longer insert sizes possible (1.5 kb)
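The MiSeq error rates quoted above are easier to compare with base quality scores once they are put on the Phred scale, Q = -10 log10(p). A small sketch; the function name is illustrative and the conversion treats each quoted percentage as an average per-base error probability.

```python
import math


def error_rate_to_phred(p_error):
    """Convert a per-base error probability to a Phred quality score."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)


quoted = {
    "PE 2x250, read 1 (<1%)": 0.01,
    "PE 2x250, read 2 (~1.2%)": 0.012,
    "PE 2x400, read 1 (~2%)": 0.02,
    "PE 2x400, read 2 (~4%)": 0.04,
}
for label, p in quoted.items():
    print(f"{label}: about Q{error_rate_to_phred(p):.1f}")
```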
PacBio RS
Single molecule resolution in real time
• Short waiting time for result and simple workflow
  – Generate basecalls in <1 day
  – Polymerase speed ≥1 base per second
• No amplification required
  – Bias not introduced
  – More uniform coverage
• Direct observation
  – Distinguish heterogeneous samples
  – Simultaneous kinetic measurements
• Long reads
  – Identify repeats and structural variants
  – Less coverage required
• Information content
  – One assay, multiple applications
    • Genetic variation (SVs to SNPs)
    • Methylation
    • Enzymology
C2 chemistry – installed March 2012
- Long reads 6-10kb
- Median size of molecules 3kb
- Still 15% error rate
- No strobe sequencing
Software focus on:
- De novo assembly
- High-quality CCS consensus reads
In preparation
- Load long molecules by magnetic beads
- Modified nucleotide detection
PacBio RS – two sequencing modes
LS – long sequencing reads (standard sample preparation)
• Large insert sizes (2kb-10kb)
• Generates one pass on each molecule sequenced
CCS – high quality sequencing reads (circular consensus)
• Small insert sizes (500bp)
• Generates multiple passes on each molecule sequenced
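Why multiple passes help can be illustrated with a toy model. The sketch below is not PacBio's consensus algorithm: it assumes each pass errs independently at a position with the 15% raw error rate quoted for C2 chemistry, ignores adapter sequence when counting passes, and treats the consensus as wrong only when more than half of the passes are wrong.

```python
from math import comb


def expected_passes(read_length_bp, insert_size_bp):
    """Whole passes over the insert in one read (adapters ignored)."""
    return read_length_bp // insert_size_bp


def majority_vote_error(per_pass_error, n_passes):
    """P(more than half of n independent passes are wrong at one position)."""
    p = per_pass_error
    k_min = n_passes // 2 + 1
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range(k_min, n_passes + 1)
    )


# A hypothetical 3 kb read over a 500 bp insert
print(expected_passes(3000, 500), "passes")

# Toy consensus error for an odd number of passes at 15% raw error
for n in (1, 3, 5, 7):
    print(n, "passes ->", f"{majority_vote_error(0.15, n):.4f}")
```

Real errors are dominated by insertions and deletions and are not fully independent, so the actual gain from CCS differs from this binomial estimate.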
Example Data: 1 SMRT Cell
Pre-Filter # of Bases 180,320,136 bp
Pre-Filter # of Reads 75153
Pre-Filter Mean Readlength 2399 bp
Pre-Filter Mean Read Quality 0.624
% Adapter Dimer (0-10bp) 1.94 %
% Short Insert (11-100bp) 0.47 %
Post-Filter # of Bases 165,424,592 bp
Post-Filter # of Reads 52801
Post-Filter Mean Readlength 3133 bp
Post-Filter Mean Read Quality 0.827
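The example figures above are internally consistent, which can be checked by recomputing the mean read lengths and the fraction of data surviving the filter. A short sketch using only the numbers listed; the dictionary and function names are illustrative.

```python
pre_filter = {"bases": 180_320_136, "reads": 75_153}
post_filter = {"bases": 165_424_592, "reads": 52_801}


def mean_read_length(stats):
    """Mean read length in bp = total bases / number of reads."""
    return stats["bases"] / stats["reads"]


def fraction_retained(before, after, key):
    """Share of reads or bases that passed the filter."""
    return after[key] / before[key]


print(f"pre-filter mean length:  {mean_read_length(pre_filter):.0f} bp")
print(f"post-filter mean length: {mean_read_length(post_filter):.0f} bp")
print(f"reads retained: {fraction_retained(pre_filter, post_filter, 'reads'):.1%}")
print(f"bases retained: {fraction_retained(pre_filter, post_filter, 'bases'):.1%}")
```

Filtering removes about 30% of reads but under 10% of bases, which is what one would expect if the discarded reads are mostly short.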
Personal Genome Machine – Ion Torrent (Life Technologies)
Three types of semiconductor chips:
314 – 20Mb
316 – 200Mb
318 – 1Gb
Read length depends on base composition: 200-250bp (200 cycles)
System is enabled for Paired End 2x100 cycles
The fastest sequencing system on the market.
Recommendation:
Resequencing applications which require fast turnaround of samples
- Amplicons (PCR products)
- Small and medium size genomes
- Custom DNA capture applications
How it works:
An H+ ion is released during base incorporation. Individual polymerases attached to beads are positioned in tiny wells that rest on a tiny pH meter.
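The dependence of read length on base composition follows from the flow-based chemistry: one nucleotide species is flowed at a time, and the pH signal scales with how many bases are incorporated in that flow. A noise-free toy simulation, assuming a simple repeating T, A, C, G flow order (an assumption for illustration; the instrument's actual flow order may differ):

```python
from itertools import cycle


def simulate_flowgram(read_seq, flow_order="TACG", max_flows=1000):
    """Ideal signal per flow: number of bases incorporated in that flow."""
    if set(read_seq) - set(flow_order):
        raise ValueError("sequence contains bases missing from the flow order")
    signals = []
    pos = 0
    for base in cycle(flow_order):
        if pos >= len(read_seq) or len(signals) >= max_flows:
            break
        run = 0
        # A homopolymer run is incorporated entirely within one flow.
        while pos < len(read_seq) and read_seq[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
    return signals


def bases_read(read_seq, n_flows, flow_order="TACG"):
    """Bases sequenced after a fixed number of flows."""
    return sum(simulate_flowgram(read_seq, flow_order, max_flows=n_flows))


print(simulate_flowgram("TTACGGA"))
print(bases_read("TACG" * 10, 8), "bases vs", bases_read("GCAT" * 10, 8), "bases")
```

With the same number of flows, a template that matches the flow order is read much further than one that runs against it, and homopolymer length has to be inferred from signal height.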
PGM/Ion Torrent Data 316 chip
Thr.
Total Number of Bases [Mbp] 77.65
‣ Number of Q17 Bases [Mbp] 36.11
‣ Number of Q20 Bases [Mbp] 27.33
Total Number of Reads 368,860
Mean Length [bp] 211
Longest Read [bp] 380
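A few derived figures make the 316 chip run above easier to judge: the share of bases at Q17 and Q20 (roughly 2% and 1% error probability on the Phred scale), the mean read length, and the yield relative to the 200 Mb listed for the 316 chip. A sketch using only the numbers from these slides; variable names are illustrative.

```python
total_mbp = 77.65
q17_mbp = 36.11
q20_mbp = 27.33
n_reads = 368_860
chip_316_nominal_mbp = 200  # capacity listed for the 316 chip

frac_q17 = q17_mbp / total_mbp
frac_q20 = q20_mbp / total_mbp
mean_length_bp = total_mbp * 1e6 / n_reads
yield_vs_nominal = total_mbp / chip_316_nominal_mbp

print(f"Q17 bases: {frac_q17:.1%}")
print(f"Q20 bases: {frac_q20:.1%}")
print(f"mean read length: {mean_length_bp:.0f} bp")
print(f"yield vs nominal chip capacity: {yield_vs_nominal:.1%}")
```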
Library Preparation from Low Quantities of DNA or RNA
Microfluidics stationary and portable systems
Mondrian SP System – NuGEN Technologies
- Human libraries from 5 ng of total DNA. Only 10-15% duplicate reads.
- Ultralow DNA library systems
Soon:
- Ultralow RNA library systems
- Libraries from total RNA with rRNA depletion.
Advanced Liquid Logic from RTP
Emerging Sequencing Technologies
- Semiconductor sequencing chip
- Nanopore / Nanochannel sequencing
Ion Proton System
- Human genome in one day
- Cost of reagents $1000 per run
- Error rate around 1.2%
- Human Genome, RNAseq, ChIPseq
Ion Proton Chip I – 10Gb (whole exome capture experiments)
Ion Proton Chip II – 100Gb (whole human genome resequencing)
Oxford Nanopore – new view on sequencing
Hemolysin pore – inner diameter of 1nm, about 100,000 times smaller than that of a human hair.
Oxford Nanopore
DNA sequencing
Error rate 4%, prediction for end of the year 0.1 – 2%.
Nanopore array
Oxford Nanopore – new concepts
MinION
- 150Mb per run
- Tested 48kb read length
- $900 per instrument
- 500 pores per device
GridION
- XXXMb per run
- Tested 48kb read length
- $XXX per instrument
- 2000 pores per device, soon 8000 pores
- Cost per human genome $1500.
Oxford Nanopore – applications
- DNA sequencing
- Protein detection
- Protein DNA interaction
- Small molecule detection
- 96 well plates for 96 samples
- Controlled time of sequencing
Intelligent BioSystems Mini20 System (manufactured by Azco Biotech)
• Amplification by rolony method
• Sequencing by Synthesis with announced 100 base reads, but expect to compete with Sanger down the road
• Designed for clinical labs
• 20 independent flow cells, no queue for loading, run asynchronously
• 20M reads/flow cell, 4 GB/flow cell
• Potential problems with repeats
• System cost $120K, $150 flow cell (disposable), full costs per sample not clear yet.
• Entering early access now, expect commercial shipping late 2012
Genia Technologies
• Very early stage announcement – Backed by Life Technologies (at least 1 year away)
• Describe system as a cross between Ion Torrent and Oxford Nanopore
• Electronic “Active Control” technology enables highly efficient nanopore-membrane assembly and control of DNA movement through the channel
• Initially used α-Hemolysin and claimed 98% raw accuracy with that but now are using an undisclosed pore for further development.
• Claim sensitivity 1-2 orders of magnitude greater than Oxford Nanopore.
• Ramping up pore density to 100K pores/chip by end of 2012.
• Plan to market a mobile reader for <$1K and per sample costs <$100
• Plan early access in late 2012, commercial shipment 2013
Basic RNAseq
• Type 1: Description of transcriptome
– Assembly of transcripts/isoforms
– Annotation of genes
• Type 2: “Paired” e.g. treatment vs control
– Differential expression
– Differential transcription
• Type 3: Population
– Elements of 1 and 2, but “random effects”
– TCGA roughly falls into this category
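For the "paired" design (Type 2), the most basic quantity is a log2 fold change between depth-normalized counts. The sketch below is a toy illustration with hypothetical gene names and counts, not a substitute for a replicate-aware statistical method: it normalizes to counts per million (CPM) and adds a pseudocount to avoid division by zero.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so each library sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-gene log2(treated CPM / control CPM) with a pseudocount."""
    cpm_t = counts_per_million(treated)
    cpm_c = counts_per_million(control)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in cpm_c
    }


# Hypothetical counts for two genes in a control and a treated library
control = {"geneA": 100, "geneB": 900}
treated = {"geneA": 500, "geneB": 500}
for gene, lfc in log2_fold_change(treated, control).items():
    print(gene, f"{lfc:+.2f}")
```

Because CPM is compositional, a large increase in one gene pushes the others down; with only two genes, as here, that effect is extreme.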
Strand Specific RNAseq
• Perkins et al 2009, Levin et al 2010
• Goal: To mark the RNA molecules in order to know the direction of transcription.
– differentiate anti-sense transcripts, lncRNAs, mRNAs etc.
• Many methods; dUTP may be best; Illumina has a kit
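With a dUTP library, the direction of transcription can be read off the alignment: read 1 (or a single-end read) aligns to the strand opposite the transcript, and read 2 aligns to the transcript's own strand. A minimal sketch that applies this rule to SAM FLAG values; the function name is illustrative, and it assumes a standard dUTP second-strand-marking protocol.

```python
FLAG_REVERSE = 0x10  # read aligned to the reverse strand
FLAG_READ2 = 0x80    # read is the second in its pair


def transcript_strand_dutp(sam_flag):
    """Infer the strand of the originating transcript from a SAM FLAG."""
    read_strand = "-" if sam_flag & FLAG_REVERSE else "+"
    if sam_flag & FLAG_READ2:
        # Read 2 has the same orientation as the original RNA.
        return read_strand
    # Read 1 and single-end reads are antisense to the original RNA.
    return "+" if read_strand == "-" else "-"


# Both mates of a properly paired fragment should agree
for flag in (99, 147, 83, 163, 0, 16):
    print(flag, transcript_strand_dutp(flag))
```

Other strand-specific protocols use the opposite convention, so the rule must match the library kit actually used.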
End tagged RNAseq
• GOAL: Identify ends of transcripts by attaching adaptors to ends of mRNAs
– can be used in strand specific protocols
– can be used in annotation and assembly protocols
[Figure: mRNA with 5' cap (mG) and 3' poly(A) tail (AAAAAA)]
Normalized RNAseq
• GOAL: To even the distribution of transcripts sequenced
– Reduce the representation of high abundance transcripts and increase sensitivity to low abundance transcripts
Normalized RNAseq 2
• Methods
– Kinetic (Patanjali et al 1991, Bonaldo et al 1996)
– dsDNA nuclease (Zhulidov 2004)
– Cap-Trapper (Carninci et al 2000)
• Results
– Abundant transcripts reduced in proportion to their frequency
– Coverage still proportional to expression
• Problems: bias, contamination with ncRNA
Total RNAseq
• Goal: Sequence every RNA molecule in the cell
– Observe: unspliced RNAs, small RNAs, non-coding RNAs, tRNAs
– Must remove rRNA!
– Variants: Nuclear only, cytoplasmic only, mRNA removal
Small RNA
• GOAL: Small RNAs are important for gene regulation, synthesis, splicing, and immunity (miRNA/miR, snRNA, snoRNAs, scaRNAs)
– Several protocols (e.g. Illumina, Morin et al 2010)
• All involve size selection, which can lead to bias
– Produce short sequences that are then mapped back to the genome.
• Aside: small RNA counts seem more Poisson-like than other counts
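One simple way to ask whether counts look Poisson-like is the index of dispersion, variance divided by mean, across replicates: a Poisson variable has variance equal to its mean, so values near 1 are consistent with Poisson noise and much larger values indicate overdispersion. A sketch with hypothetical replicate counts (not data from these slides):

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance / mean; approximately 1 for Poisson-distributed counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; index is undefined")
    return variance(counts) / m


# Hypothetical counts for one feature across five replicates
tight_counts = [98, 105, 92, 110, 95]
overdispersed_counts = [40, 180, 75, 150, 55]

print(f"{dispersion_index(tight_counts):.2f}")
print(f"{dispersion_index(overdispersed_counts):.2f}")
```

With only a handful of replicates the estimate is noisy, so it is better read across many features than for a single one.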
RIPseq/CLIPseq/HITS-CLIP
• GOAL: Identify the sites on the RNA where RNA binding proteins are bound.
– e.g. Components of the spliceosome
– protocol is similar to ChIPseq except there is a random hexamer ds-cDNA synthesis step
– refs: Khalil et al 2009, Sanford 2009, Licatalosi 2008, Zhang and Darnell 2011