DNA

DNA sequencing: Basic idea
Background: test tube DNA synthesis
• DNA polymerase (a natural enzyme) extends
2-stranded DNA over a 1-stranded template
primer extension by polymerase →
Primer:   5’ TTACAGGTCCATACTA
Template: 3’ AATGTCCAGGTATGATACATAGG 5’
• Can buy DNA polymerase and do this in a
tube.
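The primer extension on this slide can be mimicked in a few lines of Python. This is only an illustrative sketch: the function name `extend_primer` and the `PAIR` table are made up for this example, and real polymerase chemistry is of course not modeled. It uses the slide's primer and template.

```python
# Watson-Crick pairing for DNA bases.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_primer(primer_5to3: str, template_3to5: str) -> str:
    """Return the fully extended new strand, written 5'->3'.

    The template is written 3'->5', so position i of the new strand
    pairs with position i of the template string.
    """
    # The primer must already be base-paired with the start of the template.
    for i, base in enumerate(primer_5to3):
        if PAIR[template_3to5[i]] != base:
            raise ValueError(f"primer does not pair with template at position {i}")
    # The polymerase adds the complement of each remaining template base.
    extension = "".join(PAIR[b] for b in template_3to5[len(primer_5to3):])
    return primer_5to3 + extension


primer = "TTACAGGTCCATACTA"
template = "AATGTCCAGGTATGATACATAGG"
print(extend_primer(primer, template))  # TTACAGGTCCATACTATGTATCC
```

The last seven bases (TGTATCC) are the part synthesized over the single-stranded region of the template.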
cse587A/Bio 5747: L2 1/19/06
DNA sequencing, cont
Modern Sanger sequencing
Dye terminator sequencing
• Fluorescent label on terminator, not primer
• Different colors for ddA, ddC, ddG, ddT
• Run all 4 reactions in a single lane
• Image under 4 colors of laser
Capillary electrophoresis
• Each sequence is sized thru a separate, thin
tube (capillary)
• Avoids lane tracking errors
Automated readout -- Phred
Limitations of technology
• Error prone, especially at beginning & end
–But Phred estimates error probability
• Not useful beyond 500-800 bp
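Phred reports each base's estimated error probability as a quality value, using the standard definition Q = -10 log10(P). A minimal sketch of the conversion follows; the function names are chosen for this example only.

```python
import math


def error_prob_to_quality(p: float) -> float:
    """Phred quality value for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def quality_to_error_prob(q: float) -> float:
    """Estimated error probability for a Phred quality value q."""
    return 10 ** (-q / 10)


# Q20 means a 1-in-100 chance the base call is wrong; Q30 means 1 in 1000.
for q in (10, 20, 30, 40):
    print(q, quality_to_error_prob(q))
```

Bases near the beginning and end of a read typically get low quality values, which is how the error-prone regions are flagged.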
Whole chromatogram (trace)
Start of trace
End of trace
Base calling, assembly, editing
Software tools
• PHRED calls bases from traces, producing reads
–Estimates error probability for each base
(quality values)
• PHRAP assembles reads into a longer sequence
–Uses quality values
–Not intended for whole-genome assembly
• Research on assembly algorithms is ongoing
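To give a feel for how quality values can help when overlapping reads disagree, here is a toy sketch that picks the base with the most total quality support at one aligned position. This is an assumption-laden illustration, not PHRAP's actual algorithm; the function name `consensus_base` is invented for this example.

```python
from collections import defaultdict


def consensus_base(calls):
    """Pick a consensus base for one aligned position.

    calls: list of (base, quality) pairs, one per read covering the position.
    The base with the largest summed quality wins (ties go to the base seen first).
    """
    support = defaultdict(int)
    for base, quality in calls:
        support[base] += quality
    return max(support, key=support.get)


# Two low-quality A calls are outvoted by one high-quality G call.
print(consensus_base([("A", 10), ("A", 12), ("G", 40)]))  # G
```

A simple majority vote would have chosen A here; weighting by quality lets one confident call override several doubtful ones.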
Sequencing Genomes
Michael Brent
Dept. of Computer Science
Washington University
Why sequence a genome?
Cool technology
Infrastructure for molecular science
• E.g. Cloning & studying a gene of interest
• “Parts list for the human body”
Genome science
• Evolution and dynamics of genomes
Medicine
• Genomic causes of disease and health
Which genomes?
How can I sequence a genome?
Shotgun sequencing: simple version
1. Cut your DNA at random locations
2. Get ~700-800 bp of sequence from the end
of each fragment: AAGTCGTGGG….
3. Use overlapping sequences to reassemble
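Steps 1 and 2 can be simulated on a toy genome. The sketch below is a simplification under stated assumptions: fragments are sampled at random start positions with a fixed length, reads are error-free, and only one end of each fragment is read. The function name and parameters are hypothetical.

```python
import random


def shotgun_reads(genome, n_fragments, frag_len, read_len, seed=0):
    """Sample random fragments and return a read from one end of each."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_fragments):
        # Step 1: a fragment starting at a random location.
        start = rng.randrange(0, len(genome) - frag_len + 1)
        fragment = genome[start:start + frag_len]
        # Step 2: sequence only the first read_len bases of the fragment.
        reads.append(fragment[:read_len])
    return reads


# Toy random "genome" of 5000 bp; real genomes are far longer and not random.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
reads = shotgun_reads(genome, n_fragments=50, frag_len=2000, read_len=750)
print(len(reads), "reads of", len(reads[0]), "bp")
```

Step 3, reassembling the reads from their overlaps, is the hard part and is the subject of the following slides.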
Step 1: cutting & cloning
A. Cut/break the DNA
• Physical shear – put it in a blender, or
• Restriction digest
B. Separate fragments by size & select
1C. Clone select fragments
2. Sequence random clones
• Pick a clone containing copies of 1 insert
from the plate
• Separate the plasmids from the cells
• Sequence the inserts using primers
complementary to the vector
3. Assemble fragments
Idea
• Common end sequences may indicate
overlap in original sequence
overlapping shotgun sequences
…CTGACTAAGTCATGTTACAG
               TTACAGCAGGTATGATA…
assembled sequence
…CTGACTAAGTCATGTTACAGCAGGTATGATA…
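The overlap idea can be sketched as a longest suffix-prefix match followed by a merge. The code uses the two example reads from this slide, written in the DNA alphabet A, C, G, T. The function names and the `min_overlap` threshold are assumptions for illustration; real assemblers also tolerate sequencing errors, which this exact-match sketch does not.

```python
def overlap_len(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge(left, right, min_overlap=3):
    """Join two reads on their longest overlap, or return None if there is none."""
    k = overlap_len(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


read1 = "CTGACTAAGTCATGTTACAG"
read2 = "TTACAGCAGGTATGATA"
print(overlap_len(read1, read2))  # 6 (the shared TTACAG)
print(merge(read1, read2))        # CTGACTAAGTCATGTTACAGCAGGTATGATA
```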
3. Assemble fragments
Problems
• Sequencing error may obscure true overlap
• Common end sequences can occur by chance
• Repeats: DNA of higher eukaryotes contains
many copies of nearly identical sequences
–This means overlaps are often from different
copies of the same repeat element
–Repeats are the major issue in sequencing
• Polymorphism
Genome assembly
Challenge
• Can’t assemble sequencing reads based on
overlapping ends in long repeats
…CTGACTAAGTCATGTTACAG
               TTACAGCAGGTATGATA
• Overlaps may be from different repeat copies
• Leading to large-scale misassembly
• Polymorphic mismatches may prevent good
joins
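A small sketch shows why repeats defeat overlap-based joining. The sequences below are invented for illustration: one read ends inside a repeat copy, and two other reads begin with the same repeat but come from different places in the genome. Both overlap the first read equally well, so the overlap alone cannot tell which join is correct.

```python
def overlap_len(left, right):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def candidate_joins(read, others, min_overlap):
    """Return (name, overlap) for every read that could follow `read`."""
    joins = []
    for name, seq in others.items():
        k = overlap_len(read, seq)
        if k >= min_overlap:
            joins.append((name, k))
    return joins


REPEAT = "TTACAGCAGG"  # hypothetical repeat element present in two copies

ends_in_copy1 = "CTGAC" + REPEAT  # read ending in the first copy
others = {
    "true_neighbor": REPEAT + "TATGA",    # continues after the first copy
    "other_copy": REPEAT + "GCCTA",       # starts in the second copy, elsewhere
}

print(candidate_joins(ends_in_copy1, others, min_overlap=8))
```

Both candidates overlap by the full length of the repeat. Choosing the wrong one would splice together distant parts of the genome, which is the large-scale misassembly the slide warns about.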
Single-molecule sequencing
• Since ~2007, we can sequence
individual molecules without cloning
1. Many molecules are attached to a
surface and copied, forming a cluster
of identical templates
2. Reversible dye terminators are incorporated according to templates (1 bp)
3. Slide is imaged sequentially under 4
color lasers, showing which dye was
incorporated at each cluster
4. Terminator is cleaved off and 2nd-strand
synthesis continues for next cycle
• Each cycle is one position in the sequence
• 10^8 50 nt reads / 2-day run (Solexa)
• 10^6 400 nt reads / 5-day run (454)
• For Sanger, ~10^3 700 nt reads / day
• Read-length vs. throughput tradeoff
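The tradeoff is easier to see as bases per day. The arithmetic below reads the slide's figures as 10^8, 10^6, and roughly 10^3 reads; treating the Sanger figure as per instrument per day is an assumption, and the function name is chosen for this example.

```python
def bases_per_day(n_reads, read_len, run_days):
    """Average sequenced bases per day over one run."""
    return n_reads * read_len / run_days


# (number of reads, read length in nt, run length in days), from the slide.
platforms = {
    "Solexa": (10**8, 50, 2),
    "454": (10**6, 400, 5),
    "Sanger": (10**3, 700, 1),
}

for name, (n_reads, read_len, run_days) in platforms.items():
    print(f"{name}: {bases_per_day(n_reads, read_len, run_days):.1e} nt/day")
```

This gives about 2.5e9 nt/day for Solexa, 8e7 for 454, and 7e5 for Sanger: the shortest reads come with the highest throughput.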