HTS technologies

High-Throughput Sequencing
Technologies
Biological Sequence Analysis
BNFO 691/602 Spring 2014
Mark Reimers
Outline
• What can we do with next-generation
sequencing?
– De novo sequencing of simple genomes
– Re-sequence individual variations
– Generate genome-wide quantitative data for a
variety of assays
• What technologies are now available and
which are up-and-coming?
– Roche, Illumina, SOLiD, Ion Torrent, etc…
What is High-Throughput
Sequencing?
• Generating many thousands or millions of
short (30 to 1,000 base) sequences by
sequencing parts of longer (200+ base)
DNA fragments
• Most research uses reads from one end of
a fragment (single-end), but most
technologies can be adapted to make
paired-end reads on opposite strands
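A minimal sketch of the difference between single-end and paired-end reads, assuming an idealized error-free fragment. The function names and the example fragment are illustrative only and do not come from any sequencing toolkit.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def single_end_read(fragment, read_len):
    """Read the first read_len bases from one end of the fragment."""
    return fragment[:read_len]

def paired_end_reads(fragment, read_len):
    """Read read_len bases inward from each end, on opposite strands."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment)[:read_len]
    return read1, read2

# Hypothetical 22-base fragment (real fragments are 200+ bases).
fragment = "ACGTTGCAAGGCTTAACCGGTA"
r1, r2 = paired_end_reads(fragment, 6)
print(r1, r2)
```

The second read is the reverse complement of the far end of the fragment, which is why a paired-end aligner expects the two mates to point toward each other.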
Full Genome Re-sequencing has been done
for many cancers and rare clinical disorders
Exome sequencing is a cost-effective way to
identify de novo protein coding mutations
Targeted re-sequencing of a few relevant
genes can identify diverse critical mutations
across a large number of cases
RNA-seq
ChIP-seq
DNA methylation profiling
[Figure: conversion scheme. mC → C and C → U; after PCR, C → C and U → T (PCR + Seq)]
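The conversion scheme on this slide (methylated C stays C, unmethylated C becomes U and is read as T after PCR) matches what bisulfite treatment does, although the slide does not name the chemistry. The sketch below assumes an error-free read aligned base-for-base to the reference on the same strand. The function names and the toy sequence are hypothetical.

```python
def convert_and_amplify(seq, methylated_positions):
    """Simulate conversion followed by PCR on one strand.

    Unmethylated C is read as T; methylated C is still read as C.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)

def call_methylation(reference, read):
    """Compare a converted read to the reference at every reference C.

    Returns {position: True if methylated, False if unmethylated}.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
        # Any other base is treated as a sequencing error and skipped.
    return calls

reference = "ACGTCCGA"   # toy reference, Cs at positions 1, 4 and 5
methylated = {1}         # assume only position 1 is methylated
read = convert_and_amplify(reference, methylated)
calls = call_methylation(reference, read)
print(read, calls)
```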
DNAse Hyper-sensitivity
• DNAse I enzyme cuts DNA
• Much more likely to cut at open chromatin
• Two approaches:
– Cut slowly then fragment and sequence ends
– Cut rapidly then sequence short fragments
Mapping of chromatin interactions (5C)
(courtesy Elemento lab)
HTS Technologies
• Roche-454 (will close 2016)
• Illumina
• SOLiD
• Ion Torrent
• Newer Technologies
• Outlook
Founded by Jonathan
Rothberg as a secret
project (code-named ‘454’)
within CuraGen
Roche 454 Sequencing
Metzker, NG 2010
Roche 454 Sequencing
GS FLX Data Analysis: Flowgram Generation
[Figure: Roche 454 flowgram. Peak heights (1-mer to 4-mer) plotted against flow order T, A, C, G for the example read TTCTGCGAA]
Key sequence = TCAG for signal calibration
CSB2008 August 2008, UCSC Sequencing Center
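A minimal sketch of how a flowgram could be turned into base calls, assuming the flow order T, A, C, G and the TCAG key shown on the slide. The raw signal values are invented for illustration, and rounding to the nearest integer is a simplification, not Roche's actual base caller.

```python
FLOW_ORDER = "TACG"
KEY = "TCAG"
# With flow order T, A, C, G the key TCAG produces a single incorporation
# in flows 0 (T), 2 (C), 5 (A) and 7 (G); these define the 1-mer signal.
# This assumes the first base after the key is not G.
KEY_ONE_MER_FLOWS = (0, 2, 5, 7)

def calibrate(raw_signals):
    """Scale raw peak heights so that the key's 1-mer peaks average 1.0."""
    unit = sum(raw_signals[i] for i in KEY_ONE_MER_FLOWS) / len(KEY_ONE_MER_FLOWS)
    return [s / unit for s in raw_signals]

def call_bases(signals, flow_order=FLOW_ORDER):
    """Round each calibrated peak to a homopolymer length and emit bases."""
    bases = []
    for i, signal in enumerate(signals):
        run_length = max(0, int(signal + 0.5))
        bases.append(flow_order[i % len(flow_order)] * run_length)
    return "".join(bases)

# Invented raw peak heights (arbitrary units) for key + TTCTGCGAA.
raw = [204, 10, 194, 16, 6, 208, 12, 190, 420, 8, 196,
       14, 206, 4, 18, 192, 10, 6, 202, 188, 12, 376]

called = call_bases(calibrate(raw))
read = called[len(KEY):]   # strip the key to get the insert sequence
print(called, read)
```

Because a 4-mer peak must be told apart from a 3-mer or 5-mer by height alone, small signal errors matter more as the run gets longer.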
Advantages & Drawbacks
• PRO
– Long reads are uniquely identifiable
– Relatively quick: ~20 hours total
• CON
– Cost is relatively high
– Frequent errors in runs of bases
– Frequent G-A transitions
Best Uses of Roche 454
• De novo small genome (prokaryote or
small eukaryote genome) sequencing
• Metagenomics by 16S profiling
• Used to be best for metagenomics by
random sequencing
– new long reads from Illumina are competitive
• Targeted re-sequencing of small samples
Illumina (Solexa) Genome
Analyzer and Flow Cell
Illumina On-Chip Amplification
Illumina (Solexa) Sequencing
Paired-End Illumina Method
Paired-end reads are easy on
Illumina because the clusters
are generated by ligated
linkers.
Different linkers and primers
are attached to each end
Advantages & Drawbacks
• PRO
– Very high throughput
– Most widespread technology so that
comparisons seem easier
• CON
– Sequencing representation biases, especially
at beginning
– Slow – up to a week for a run
Best Uses of Illumina
• Expression analysis (RNA-Seq)
• Chromatin Immunoprecipitation (ChIP-Seq)
• Metagenomics by random sequencing
SOLiD
Sequencing by Oligonucleotide
Ligation and Detection
SOLiD History
• George Church licensed his ‘polony’
technique to Agencourt Personal
Genomics
• ABI acquired the SOLiD technology from
Agencourt in 2006
SOLiD Preparation Steps
• Prepare either single or ‘mate-pair’ library from DNA fragments
• Attach library molecules to beads; amplify library by emulsion PCR
• Modify 3’ ends of clones; attach beads to surface
Emulsion PCR
• Emulsion PCR isolates individual DNA molecules
along with primer-coated beads in aqueous
droplets within an oil phase. A polymerase chain
reaction (PCR) then coats each bead with clonal
copies of the DNA molecule. The bead is
immobilized for sequencing.
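One way to see why emulsion PCR is run at a low template-to-droplet ratio is to model droplet loading as a Poisson process. That model is an assumption made here, not something stated on the slide, and it ignores how many beads land in each droplet. The function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k template molecules in a droplet."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def droplet_fractions(lam):
    """Fractions of droplets by template count, given mean templates lam.

    clonal_share is the fraction of template-containing droplets that
    hold exactly one molecule and so would give a clonal bead.
    """
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        "clonal_share": single / (1.0 - empty),
    }

for lam in (0.1, 0.5, 1.0):
    f = droplet_fractions(lam)
    print(lam, {name: round(value, 3) for name, value in f.items()})
```

Under this model a dilute emulsion leaves most droplets empty but keeps almost all productive droplets clonal, while a concentrated one mixes templates on many beads.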
ABI SOLiD Sequencing Cycle
SOLiD Reads Each Base Twice
Most bases are matched by two primers in different ligation cycles
SOLiD Color Coding Scheme
If you translate color reads directly into base reads, then a single error in
the color calls corrupts every base call downstream of it, much like a
frame-shift. It is best to convert the reference sequence into color-space
and align there. There is one unambiguous conversion of a base reference
sequence into color-space, but there are four possible conversions of a
color string into base strings, one for each choice of first base.
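The points above can be checked with a small sketch of the standard two-base color code, in which each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). The helper names and the toy reference are illustrative.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def to_color_space(bases):
    """Encode a base string as colors, one per adjacent pair of bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(bases, bases[1:])]

def to_base_space(first_base, colors):
    """Decode colors to bases, given the base that precedes the first color."""
    out = [first_base]
    for color in colors:
        out.append(BASES[CODE[out[-1]] ^ color])
    return "".join(out)

reference = "ATGGCA"
colors = to_color_space(reference)        # one unambiguous encoding
print(colors)

# Four possible decodings, one per choice of first base.
decodings = [to_base_space(b, colors) for b in BASES]
print(decodings)

# A single miscalled color changes every base after it.
bad_colors = list(colors)
bad_colors[1] = 0                         # color 1 miscalled as 0
corrupted = to_base_space("A", bad_colors)
print(reference, corrupted)
```

Comparing reads and reference in color-space keeps a miscalled color as a single mismatch instead of a run of wrong bases.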
Advantages & Drawbacks
• PRO
– Very high throughput
– Di-base ligation ensures built-in accuracy check
• Low error rate for low-coverage
– Can handle repetitive regions easily
• CON
– Strong cycle-dependent biases (can be modeled and
partly overcome – see Wu et al, Nature Methods,
2011)
– Low quality color calls (Phred < 20) are common
– Reported problems with paired ends – most mapped
tags don’t map to the same chromosome
Ion Torrent Sample Prep
• Emulsion PCR loads copies of unique
sequences onto beads
• One bead is deposited in each well of a
micro-machined plate
An Ion Torrent Chip
From Ion Torrent promotional material
When a nucleotide is incorporated into a strand of
DNA by a polymerase, a hydrogen ion is released
From Ion Torrent promotional material
Ion Torrent Sequencing Process
From Ion Torrent promotional material
As in 454, nucleotides are washed over the nascent strand in a
prescribed sequence. Each time a nucleotide is incorporated,
hydrogen ions are released and detected.
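A minimal sketch of the flow process described above, assuming a simple repeating T, A, C, G flow order (the instrument's actual flow order may differ) and an ideal signal proportional to the number of bases incorporated per flow. The function name and target sequence are illustrative.

```python
def simulate_flows(target, flow_order="TACG", max_flows=100):
    """Return (nucleotide, bases incorporated) for each flow.

    target is the sequence being synthesized on the nascent strand. In
    each flow, the polymerase extends through every consecutive base that
    matches the flowed nucleotide, releasing one hydrogen ion per base.
    """
    flows = []
    pos = 0
    for flow in range(max_flows):
        if pos >= len(target):
            break
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        while pos < len(target) and target[pos] == nucleotide:
            incorporated += 1
            pos += 1
        flows.append((nucleotide, incorporated))
    return flows

flows = simulate_flows("TTCTGCGAA")
counts = [n for _, n in flows]
print(flows)
```

A homopolymer run such as TT or AA is incorporated in a single flow, so its length has to be read from the size of one pH signal.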
Newest Machine – Ion Proton
• $150K per machine
• Ion Proton I chip has 165 million sensors
– Intended for exomes
• Ion Proton II chip has 660 million sensors
– 50X more than 318 chip
– Claim $1K genome this year
Ion Torrent Signals
• Like 454, a series of pH signals over time
as different nucleotides are added
From promotional literature
Ion Torrent Signals
• Like 454, the signals are not always integer
multiples of the single-base signal, so some
guessing is needed to call homopolymer lengths
Ion Torrent Advantages & Drawbacks
• PRO
Homopolymer error rates
– Very high throughput potential
– Very fast (an afternoon)
• CON
– Homopolymer run errors are still a
problem, but less so recently
– Very uneven loading of sequences wastes
a lot of real estate on the chips
– No prospect of paired-end reads
Loading Density
Newer Technologies
• Complete Genomics
• Pacific Biosciences
• Oxford Nanopore
Complete Genomics
• Service company only – no equipment
sales
• ~$4,000 per human genome (2011 price)
• DNA Nanoball technology generates
paired-end sequences plated at high
density
• Sequenced by ligation
Pacific Biosciences
• Single-molecule real-time (SMRT) sequencing by circular strand technology using semiconductor technology
• Long reads promised at under $200 per genome
• High random error rates reported early
• Seems better
Signals from Pac Bio Can Detect mC
From Agarwal et al, Nature Methods
Oxford Nanopore
• Single-molecule sequencing by threading
DNA through a protein nanopore
• GridION is a general technology for
sequencing polymers by measuring
current – can do polypeptides also