Illumina (Solexa) sequencing

Sequencing
tutorial
Peter HANTZ
EMBL Heidelberg
Dideoxy (Sanger) sequencing
Principle:
Gel electrophoresis: resolves 1 bp length differences for fragments below ~1000 bp
Synthesis: starts with a DNA oligo, stops after incorporating a (labeled) ddNTP
First ~60 bp uncertain (high relative mass of the fluorescent dye)
Radiolabeling:
4 reactions
Dye-termination:
4 fluorescent dyes,
one reaction
Uni Osnabruck
M. Waterman
Pyrosequencing (Roche / 454)
ds
Bead I.
Streptavidin coated
Library construction
A,B: short DNA oligos fused with
genomic DNA segments
B is biotinylated
Selection of dsDNA: streptavidin-coated magnetic beads
denaturation: AB strands collected
www.454.com
wiki
Pyrosequencing (Roche / 454)
Bead II.
Simple agarose beads coated with B oligos
Single sstDNA (single-stranded template DNA)
with cA and cB oligos, immobilized one per bead
Bead-bound library emulsified (water-in-oil)
PCR reaction:
One strand will be covalently bound to the bead
www.454.com
wiki
Pyrosequencing (Roche / 454)
denaturation, one strand is released
Following the selection of DNA-positive beads (enrichment),
beads + reactants are loaded into wells with a diameter of ca. 40 µm
www.454.com
wiki
Pyrosequencing (Roche / 454)
The reaction:
-addition of dNTPs:
incorporation releases pyrophosphate
(only one phosphate is needed for the backbone)
-ATP sulfurylase converts PPi to ATP
-luciferase: acts in the presence of ATP
-Unincorporated nucleotides
and ATP are degraded by the apyrase
-400,000 reads in parallel
-multiple consecutive incorporations (homopolymer runs):
>higher signal intensity
>problematic...
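The homopolymer effect can be illustrated with a toy flowgram. This is a minimal sketch, assuming an idealized, noise-free signal that equals the number of bases incorporated per flow, and a hypothetical cyclic flow order T, A, C, G. The function name `flowgram` is illustrative only, and for simplicity each flowed nucleotide is compared directly with the sequence being read, not with its complement.

```python
def flowgram(sequence, flow_order="TACG"):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    Assumption: the light signal of one flow equals the number of identical
    bases incorporated in that flow (no noise, no incomplete extension).
    """
    unknown = set(sequence) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")

    flows = []
    pos = 0          # next position of the sequence to be read
    flow_index = 0   # which nucleotide is flowed next
    while pos < len(sequence):
        base = flow_order[flow_index % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated completely within a single flow.
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
        flow_index += 1
    return flows


if __name__ == "__main__":
    for base, signal in flowgram("TTAGGG"):
        print(base, signal)
```

In a real instrument the signal is noisy, so telling a run of 7 identical bases from a run of 8 is much harder than telling 1 from 2. This is why long homopolymers are problematic.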
www.454.com
wiki
Illumina (Solexa) sequencing
-making DNA library (~300bp fragments)
-ligation of adapters A and B to the fragments
-binding the ssDNA randomly to the flow cell surface
-complementary primers are ligated to the surface
Illumina-Fasteris
Illumina (Solexa) sequencing
Bridge amplification:
initiation
On the surface: complementary oligos
GeneCore
Illumina (Solexa) sequencing
EMBL Gene Core
Illumina (Solexa) sequencing
Data acquisition:
sequencing by synthesis:
“reversible terminator” nucleotides
blocked + fluorescently labeled
de-blocking to enable the synthesis
dye cleavage+elimination
wash step+repeat
TGCA
illumina.com
Illumina (Solexa)
sequencing
Mate-pair sequencing
Single Molecule Real Time Sequencing
Principle:
fluorescent label on the terminal phosphate of NTPs
DNA polymerase:
cleaves this
incorporation lasts ~ ms
Detection:
"Zero-Mode Waveguide" holes:
near-field standing waves
(~Total Internal Reflection )
Present performance:
1,500 bp in read lengths
Wiki
Pacific Biosciences
Assembling
Shotgun sequencing
The genome is fragmented randomly (sonication)
No positional or orientation information is available
The fragments are sequenced
The results have to be assembled
Merging reads into contigs
www.bioalgorithms.info
Graphs
set of edges that connect pairs of nodes
used to model pairwise relations between certain objects
Bridges of Königsberg
Leonhard Euler, 1735
Find a path that visits each bridge (=edge) once!
Eulerian path problem:
visit each edge once and only once:
linear-time algorithm
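Whether an Eulerian path exists can be decided by counting degrees: a connected undirected graph has one exactly when zero or two of its nodes have an odd number of edges. A minimal sketch of this check for the Königsberg bridges follows. The node labels A to D (two river banks, two islands) and the function names are assumptions made for illustration, and connectivity is assumed rather than tested.

```python
from collections import Counter


def odd_degree_nodes(edges):
    """Return the sorted list of nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_eulerian_path(edges):
    """Euler's degree criterion; the graph is assumed to be connected."""
    return len(odd_degree_nodes(edges)) in (0, 2)


# Seven bridges: A, B = river banks, C, D = islands (labels are illustrative).
KONIGSBERG = [
    ("A", "C"), ("A", "C"),
    ("B", "C"), ("B", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

if __name__ == "__main__":
    print(odd_degree_nodes(KONIGSBERG))
    print(has_eulerian_path(KONIGSBERG))
```

The check visits each edge once, which is consistent with the linear running time mentioned above.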
www.ams.org
Hamiltonian Path Problem
Find a route that visits
each node (=each airport)
exactly once
This is an NP-complete problem (NP = nondeterministic polynomial time):
the amount of computation necessary,
using the most efficient algorithms known at present,
grows exponentially with the size of the route map
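To get a feeling for this growth, here is a brute-force sketch that simply tries every ordering of the nodes. It is deliberately naive and is not the most efficient known algorithm: for n nodes it may test n! orderings. The function name and the tiny example graphs are hypothetical.

```python
from itertools import permutations


def hamiltonian_path(nodes, edges):
    """Return one ordering that visits every node exactly once, or None.

    Brute force over all orderings of the nodes; undirected edges.
    """
    adjacent = set()
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))

    for order in permutations(nodes):
        # Every consecutive pair in the ordering must be joined by an edge.
        if all((a, b) in adjacent for a, b in zip(order, order[1:])):
            return list(order)
    return None


if __name__ == "__main__":
    chain = [("A", "B"), ("B", "C"), ("C", "D")]
    print(hamiltonian_path(["A", "B", "C", "D"], chain))

    star = [("X", "A"), ("X", "B"), ("X", "C")]
    print(hamiltonian_path(["X", "A", "B", "C"], star))
```

With 10 nodes there are already 3,628,800 orderings to try, and with 20 nodes about 2.4 × 10^18.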
www.wolfram.com
Traveling Salesman Problem
Find the shortest path which visits every vertex exactly once.
That is: the shortest Hamiltonian pathway
This is also an NP-hard problem...
The Shortest Superstring Problem
Problem:
Given a set of strings,
find a shortest string that contains all of them
Input:
Strings s1, s2,…., sn
Output: A string s that contains all strings s1, s2,…., sn as substrings,
such that the length of s is minimized
Equivalent of:
-finding the shortest Hamiltonian pathway
-TSP
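Because the exact problem is hard, a common workaround is a greedy heuristic: repeatedly merge the two strings with the longest suffix-prefix overlap. The sketch below is such a heuristic, under the assumption that an approximate answer is acceptable. It always returns a superstring, but not necessarily a shortest one. Function names are illustrative.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedy merge heuristic: a superstring, not guaranteed to be shortest."""
    items = sorted(set(strings))
    # Strings contained in another string need no separate treatment.
    items = [s for s in items if not any(s != t and s in t for t in items)]

    while len(items) > 1:
        best_k, best_i, best_j = -1, None, None
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the best pair, writing the shared part only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)
    return items[0] if items else ""


if __name__ == "__main__":
    print(greedy_superstring(["AAT", "ATG", "TGC"]))
```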
Graph Theory helps DNA assembly
University of Maryland
"Translation" of the problem: a model
Nodes: reads
Edges: connect two nodes if the corresponding reads overlap
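This model can be written down in a few lines. The sketch below builds a directed overlap graph, assuming exact (error-free) suffix-prefix overlaps and a hypothetical minimum overlap length `min_len`. Real assemblers tolerate sequencing errors and use faster data structures than this all-against-all comparison.

```python
def suffix_prefix_overlap(a, b, min_len=1):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads, min_len=2):
    """Map each read to a list of (following_read, overlap_length) edges."""
    graph = {read: [] for read in reads}
    for a in reads:
        for b in reads:
            if a != b:
                k = suffix_prefix_overlap(a, b, min_len)
                if k:
                    graph[a].append((b, k))
    return graph


if __name__ == "__main__":
    for read, edges in overlap_graph(["ATGGC", "GGCGT", "CGTGC"]).items():
        print(read, "->", edges)
```

An assembly then corresponds to a path through this graph that uses every read once, which is the Hamiltonian path formulation.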
Example: assembling a bacterial genome
Red lines - wrong assembly
Bold Black lines - good assembly
Assembling the reads = finding the shortest Hamiltonian pathway = TSP = SSP
NP...impossible...?
The Way Out: Constructing and analyzing de Bruijn Graphs
Finding Eulerian paths in the de Bruijn graph can lead to sequence reconstruction
Linear problem!
J. Kaptcianos
Thank You for Your attention!
Second-generation DNA sequencing
"Sequencing by synthesis" methods
(Solexa)
300bp [normal] - 10kb [mate-pair]
(454)
1-10 kb, and 20 kb in expt. stage
DNA Colonies amplified by PCR: “Polonies”
(Solexa)
isothermal extension "bridge PCR"
note: even PCR-free!
(454)
emulsion PCR
fluorescent imaging of the entire array
Reads:
(Solexa): ~50-80
(454): ~200-300
Nature Biotech, vol. 26
Illumina (Solexa) sequencing
Paired-end sequencing
flow cell:
EMBL GeneCore
ABI: capillary electrophoresis sequencing and SOLiD
Directed graphs
We assign a direction to each edge
The Eulerian Path Problem can be re-formulated accordingly:
Visit each edge exactly once, passing along the edges only in their direction
Note: Eulerian path might not exist!
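For directed graphs the existence test uses in- and out-degrees: either every node is balanced (as many edges in as out), or exactly one node has one extra outgoing edge (the start) and exactly one has one extra incoming edge (the end). A minimal sketch of this degree test follows. The function name is illustrative, and the graph is assumed to be connected, which the code does not verify.

```python
from collections import Counter


def degree_condition_holds(edges):
    """Check the in/out-degree condition for a directed Eulerian path.

    Assumption: all edges belong to one connected component.
    """
    balance = Counter()  # out-degree minus in-degree
    for u, v in edges:
        balance[u] += 1
        balance[v] -= 1

    unbalanced = sorted(b for b in balance.values() if b != 0)
    # []      -> every node balanced: an Eulerian cycle exists
    # [-1, 1] -> one end node and one start node: an Eulerian path exists
    return unbalanced in ([], [-1, 1])


if __name__ == "__main__":
    print(degree_condition_holds([("A", "B"), ("B", "C")]))
    print(degree_condition_holds([("A", "B"), ("A", "C")]))
```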
Examples:
M. Waterman
start
is it really the shortest?
Red: repeats
(also known as Overlap-Layout-Consensus method)
The Way Out: Constructing and analyzing de Bruijn Graphs
directed graph representing overlaps between sequences of symbols
Given sequences of symbols (~reads): ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT
"k-length fragments" (k=3)
Nodes: fragments of k-1 (k-1=2)
Edges: k-length fragments connecting overlapping vertices
Finding Eulerian paths in the de Bruijn graph can lead to sequence reconstruction
(Superpath problem, Merging transformation, etc.)
Linear problem!
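The construction can be tried on the eight 3-mers listed above. The sketch below builds the de Bruijn graph (nodes are 2-mers, each 3-mer is one edge) and walks an Eulerian path with Hierholzer's algorithm, which handles each edge once. It assumes that an Eulerian path exists and that the k-mers are error-free; function names are illustrative. Note that a graph may contain more than one Eulerian path, so the reconstruction is not always unique.

```python
from collections import defaultdict


def de_bruijn_graph(kmers):
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes that an Eulerian path exists."""
    out_edges = {u: list(vs) for u, vs in graph.items()}
    balance = defaultdict(int)  # out-degree minus in-degree
    for u, vs in out_edges.items():
        for v in vs:
            balance[u] += 1
            balance[v] -= 1
    # Start at the node with one surplus outgoing edge, if there is one.
    starts = [node for node, b in balance.items() if b == 1]
    start = starts[0] if starts else next(iter(out_edges))

    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges.get(node):
            stack.append(out_edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())             # dead end: emit the node
    return path[::-1]


def reconstruct(kmers):
    """Spell the sequence along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn_graph(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


KMERS = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]

if __name__ == "__main__":
    print(reconstruct(KMERS))
```

For these k-mers two Eulerian paths exist, spelling ATGGCGTGCA and ATGCGTGGCA. Both contain all eight 3-mers, which shows why repeats leave the assembly ambiguous without further information.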
J. Kaptcianos