Sanger Sequencing and Quality Assurance
Zbigniew Rudzki
Department of Pathology
University of Melbourne
Sanger DNA sequencing
• The era of DNA sequencing essentially started with the publication of the enzymatic dideoxynucleotide terminator technique by Sanger et al. in 1977
– Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74: 5463-5467, 1977
• It remains a very valuable technique even in the era of next generation sequencing
Evolution of Sanger DNA sequencing
• There have been three distinct stages in the evolution of Sanger sequencing over the past few decades:
– Manual sequencing using polyacrylamide gels (PAG), radiolabels and reading of gels by eye
– First generation of gel (PAG) based DNA sequencers using fluorescently labelled terminators and automated base calling
– Latest generation of capillary based DNA sequencers with improved cycle sequencing chemistry and more advanced software for mutation calling
• The quality control issues changed as sequencing evolved
Principle of Sanger sequencing
• Sanger sequencing comprises several steps.
– The first step is primer extension across the region to be sequenced
– Only a single primer is used, so amplification is arithmetic (linear) rather than exponential.
– The key is that a mixture of nucleotides (majority) and di‐deoxynucleotides (minority) is used in the reaction.
– When a di‐deoxynucleotide analogue is incorporated into the growing strand, it terminates further extension
– The incorporation of a di‐deoxynucleotide is a random process
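The arithmetic nature of single-primer extension can be made concrete with a small calculation. The sketch below is idealised: it assumes every template is copied once per round of extension with perfect efficiency, and the function names are illustrative only.

```python
def linear_products(templates: int, rounds: int) -> int:
    """Single primer: only the original templates are copied in each round,
    so the number of extension products grows arithmetically."""
    return templates * rounds


def exponential_molecules(templates: int, rounds: int) -> int:
    """Two primers (PCR-like, for comparison): products also serve as
    templates, so the number of molecules doubles in each round."""
    return templates * 2 ** rounds


if __name__ == "__main__":
    for rounds in (1, 5, 10, 25):
        print(rounds,
              linear_products(1000, rounds),
              exponential_molecules(1000, rounds))
```

Under these assumptions, 25 rounds of single-primer extension give 25 products per template, whereas two-primer amplification would give more than 33 million molecules per template.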
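Chain terminating inhibitors
[Figure: a deoxyribonucleotide and a di-deoxyribonucleotide, each drawn as base, sugar and phosphate. The sugar of the deoxyribonucleotide is labelled with one H and the sugar of the di-deoxyribonucleotide with two.]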
Principle of Sanger sequencing
• An aliquot of DNA contains a large number of
templates (individual DNA molecules)
• Thus the result of a sequencing reaction is a
mixture of extension products of various lengths
depending on when a di-deoxy analogue was
incorporated.
• The extension products are separated according
to size by PAG electrophoresis
• Reading the terminal base of each product in order of size gives the sequence
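The principle above can be sketched as a small simulation. This is a simplified model and not a description of any real instrument or kit: it assumes the same termination probability at every position, gives the template in the order in which the polymerase reads it, and uses hypothetical names (`extend_once`, `sanger_read`, `dd_fraction`).

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template, dd_fraction, rng):
    """Extend one primer along one template molecule.

    Returns the extension product if a di-deoxy analogue was incorporated,
    or None if the end of the template was reached without termination.
    """
    product = []
    for base in template:
        product.append(COMPLEMENT[base])
        # Random event: the base just added was a di-deoxy analogue,
        # so no further extension is possible.
        if rng.random() < dd_fraction:
            return "".join(product)
    return None


def sanger_read(template, n_templates=20000, dd_fraction=0.1, seed=0):
    """Simulate many templates, separate products by size, read the sequence."""
    rng = random.Random(seed)
    terminal_base_by_length = {}
    for _ in range(n_templates):
        product = extend_once(template, dd_fraction, rng)
        if product is not None:
            # Products of the same length always end in the same base.
            terminal_base_by_length[len(product)] = product[-1]
    # "Electrophoresis": order products by size and read each terminal base.
    return "".join(terminal_base_by_length[n]
                   for n in sorted(terminal_base_by_length))


if __name__ == "__main__":
    print(sanger_read("GTACCTGATTACCTA"))
```

Because each aliquot contains many templates, every possible product length is represented, and the terminal bases read in size order reproduce the strand complementary to the template.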
Principle of Manual Sanger sequencing
[Figure: one template strand, GTACCTGATTACCTA, shown with four primer extension products of different lengths, each ending in a di-deoxy (dd) terminator]
CATGGACTAATGGA
CATGGACTAA
CATGGACTA
CATGGA
Components of Manual DNA sequencing
[Figure: four separate reactions, each containing ssDNA, enzyme, primer and the four normal nucleotides (A, C, G, T), plus one di-deoxy terminator per reaction: ddA, ddG, ddT or ddC]
Manual DNA Sequencing
• After the sequencing reaction was complete the products were run on a polyacrylamide gel capable of resolving fragments differing by a single base
• Four lanes were required for each sequence
• The gel plates were removed and the gel was transferred to filter paper and dried
• The fragments were then visualised by autoradiography
• Finally the sequence was read manually by eye with a ruler
Manual DNA sequencing
Only about 80-120 bases could be read per sequencing reaction
Automated Gel based DNA sequencing
• Automated sequencing became possible because of the development of fluorescent dyes that allowed the simultaneous use of all 4 terminators as they could be distinguished by colour.
• This meant that all 4 terminator reactions could take place in one tube and be run in one lane of an automated DNA sequencer
• An automated DNA sequencer was essentially a PAG electrophoresis apparatus with a laser at the bottom which could detect the fluorescently labelled fragments as they migrated past
Automated DNA sequencing
[Figure: primer extension in a single reaction containing the normal nucleotides (A, G, C, T) and di-deoxy analogues (C, A, T, G) with a fluorescent label. Two example extension products, CATGGACTAATG and CATGGA, are shown on the template GTACCTGATTACC.]
Automated DNA sequencing
[Figure: a primer shown with the complementary base sequences CTATCGAG and GATAGCTC]
Gel for automated sequencing
Gel Tracks
Gel sequencing trace
Automated Gel based DNA sequencing
• Because the results from an automated sequencer were in an electronic format, they could be analysed by computer
• Automated sequencers all came with computer programs that called the bases automatically, rather than relying on reading by eye
Problems with first generation automated Gel based DNA sequencing
Variable peak height
• A problem with early automated sequencing was variable peak height and thus signal strength
• This was largely due to the sequencing chemistry used and the dyes that were coupled to the terminators
• Each dye had different absorbance and mobility characteristics
• Software was used to correct for this, thus giving the appearance of evenly spaced peaks
• However, this led to some incorrect base calling
Structure of (dRhodamine) Dye Terminators
dR6G-ddATP
dTamra-EO-ddCTP
dR110-EO-ddGTP
dRox-EO-ddTTP
Problems with base calling
• Because of problems with base call errors in the early first generation automated sequencers, third party software was developed to improve the accuracy of base calling.
• The best known of these was a program called Phred, which was developed in the early 1990s by Phil Green
• Phred analysed the raw data and assigned a quality score to each base in the sequence
Phred Quality Scores
These are logarithmically linked to error probabilities

Phred Quality Score    Probability of incorrect base call    Base call accuracy
10                     1 in 10                               90%
20                     1 in 100                              99%
30                     1 in 1000                             99.9%
40                     1 in 10000                            99.99%
50                     1 in 100000                           99.999%
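The values in the table follow from the standard Phred definition, Q = -10 x log10(P), where P is the probability that a base call is incorrect. A minimal sketch of the conversion follows. The function names are illustrative and are not part of the Phred program itself.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base with quality score q is called incorrectly."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Quality score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def expected_errors(quality_scores) -> float:
    """Expected number of wrong base calls in a read, given per-base scores."""
    return sum(phred_to_error_probability(q) for q in quality_scores)


if __name__ == "__main__":
    for q in (10, 20, 30, 40, 50):
        p = phred_to_error_probability(q)
        print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):g}%")
```

Summing the per-base error probabilities, as in `expected_errors`, gives a rough idea of how many incorrect calls to expect across a whole read.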
Third generation Sanger sequencing
• The third generation of Sanger sequencing involved the development of
– Capillary instruments which eliminated the need to pour gels as the capillaries were filled with the separation matrix automatically
– New dyes and chemistries which largely solved the variable peak height issue.
– Cycle sequencing was introduced
Capillary sequencing
• In these instruments, the separation takes place within a tiny glass capillary
• The advantages over gel based sequencing are
– No need to pour gels: the separation matrix is loaded robotically
– The samples are loaded automatically via a robotic system
– No lane tracking is required
Cycle sequencing
• Cycle sequencing was also introduced.
• This is somewhat equivalent to PCR in that the sequencing reaction was cycled 20‐25 times which greatly improved signal strength
• This meant that good sequencing could be obtained from a smaller amount of sample
Improvements in DNA sequencing chemistry
• Over the years there has also been considerable improvement in the sequencing chemistry formulation
– The dyes have been improved to minimise the difference in mobility and peak height between bases
– The enzymes have been improved to provide longer reads with a wider variety of templates
Structure of the BigDyes
(dR110/Fam)
(dR6G/Fam)
(dTamra/Fam)
(dRox/Fam)
Applied Biosystems 3730 DNA analyser
Second generation capillary sequencer
A capillary array
Samples on capillary sequencer
www.appliedbiosystems.com
INCREMENTAL IMPROVEMENT IN SEQUENCING CHEMISTRY
[Figure: comparison of successive sequencing chemistries: Classic, BDTv1, BDTv3, BDTv3.1]
Improved base calling
• In addition, the instrument manufacturers greatly improved their base calling software and incorporated the Phred programs into it
• This greatly assisted base calling, improving confidence in the resulting sequence and in the ability to detect sequence variants.
Current state of Sanger Sequencing
• The evolution of Sanger sequencing over some 20‐30 years has resulted in very reliable sequencing.
• The Human Genome Project was completed using Sanger sequencing.
• If an individual is having trouble with their sequencing, the problem will most likely rest with the sample
Important factors in Sanger sequencing
• Cleanliness of the DNA template and reaction product is paramount
– The template should be free of proteins, RNA, polysaccharides and genomic DNA. This can best be achieved by using a commercial miniprep
– Plasmids, BACs or cosmids are best sequenced from overlapping PCR fragments
– After the sequencing reaction, it is very important to remove any unincorporated dyes. If using a precipitation method, ensure you don't lose your reaction product. Spin columns tend to be more reliable
Important factors in Sanger sequencing
• Use an appropriate amount of template
Sample                        Template quantity
PCR product, 100-200 bp       1-3 ng
PCR product, 200-500 bp       3-10 ng
PCR product, 500-1000 bp      5-20 ng
PCR product, 1000-2000 bp     10-40 ng
SS plasmid                    25-50 ng
DS plasmid                    150-300 ng
Cosmids and BACs              0.5-1.0 ug
Genomic DNA                   2-3 ug
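For PCR products, the table above can be turned into a simple lookup. This is only a sketch. The function names are hypothetical, the ranges are copied from the table, and where ranges share a boundary (for example 200 bp) the first matching row is used, which is an arbitrary choice.

```python
# (min length bp, max length bp, min ng, max ng), copied from the table above
PCR_PRODUCT_RANGES = [
    (100, 200, 1, 3),
    (200, 500, 3, 10),
    (500, 1000, 5, 20),
    (1000, 2000, 10, 40),
]


def pcr_product_quantity_ng(length_bp: int):
    """Return the (min, max) template quantity in ng for a PCR product."""
    for min_bp, max_bp, min_ng, max_ng in PCR_PRODUCT_RANGES:
        if min_bp <= length_bp <= max_bp:
            return (min_ng, max_ng)
    raise ValueError(f"No entry in the table for a {length_bp} bp PCR product")


def volume_needed_ul(target_ng: float, concentration_ng_per_ul: float) -> float:
    """Volume of template to add to reach the target quantity."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("Concentration must be positive")
    return target_ng / concentration_ng_per_ul


if __name__ == "__main__":
    low, high = pcr_product_quantity_ng(350)
    print(f"350 bp product: {low}-{high} ng")
    print(f"Volume for {high} ng at 5 ng/ul: {volume_needed_ul(high, 5):.1f} ul")
```

A 350 bp product at 5 ng/ul would therefore need between 0.6 and 2 ul of template to fall within the 3-10 ng range.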
Important factors in Sanger sequencing
• Using smaller amounts of ready reaction mix
– ABI still recommend the use of 8 ul of the mix in a 20 ul total volume
– As the mix is relatively expensive, most people use less than the 8 ul: 4 ul and even 2 ul are commonly used.
– However, if you want to use even less reagent, it would be better to reduce the total reaction volume proportionally
– The automated instrument uses only a small fraction of most sequencing reactions
Example of good sequencing
Reaction failed completely
• Insufficient or poor quality template and/or primer
• Primer binding site absent, deleted or mutated
Weak signals
• Insufficient or poor quality template and/or primer
• Primer mutated or poor primer design
Mixed sequences
• Possible causes for mixed sequences
– Mixed plasmid preparations
– Multiple PCR products
– Multiple priming sites
– Multiple primers in mix, e.g. failure to remove PCR primers
– Primer dimer
– Frame shift
– Degraded primer
– Slippage due to homopolymer or repeat regions in template
• There are many web based sites that offer advice on sequencing problems
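When checking a trace for mixed sequence, one simple approach is to flag positions where a second peak is nearly as high as the main one. The sketch below illustrates the idea only. The peak heights are made-up example values, the 0.3 ratio is an arbitrary threshold chosen for illustration, and real base calling software works on the full trace rather than on a single height per base.

```python
# Hypothetical peak heights per position for the four dye channels
EXAMPLE_PEAKS = [
    {"A": 900, "C": 20, "G": 15, "T": 10},    # clean A
    {"A": 30, "C": 850, "G": 25, "T": 12},    # clean C
    {"A": 500, "C": 10, "G": 420, "T": 8},    # A and G overlap
    {"A": 15, "C": 300, "G": 20, "T": 610},   # T and C overlap
]


def mixed_positions(peaks, min_ratio=0.3):
    """Return the indices of positions where the second-highest peak is at
    least min_ratio times the height of the highest peak."""
    flagged = []
    for index, heights in enumerate(peaks):
        ordered = sorted(heights.values(), reverse=True)
        primary, secondary = ordered[0], ordered[1]
        if primary > 0 and secondary / primary >= min_ratio:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    print(mixed_positions(EXAMPLE_PEAKS))
```

Where the flagged positions fall can hint at the cause. Mixed signal from the very start of the read is more consistent with mixed templates or multiple primers, while mixed signal that begins part-way through is more consistent with a frame shift or slippage, though this is a rule of thumb and not a diagnosis.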
Troubleshooting DNA sequencing
• Talk to a colleague who is experienced in sequencing. Practice definitely does make perfect (almost) in sequencing.
• Talk to your sequencing service if you use one. They see all sorts of failed reactions all the time and generally know what the problem was with yours
• Don't blame the sequencing service. Most of the samples that ran with yours were fine, so it is rarely the instrument's fault.
• Supervisors – please train your students/staff. Too often a supervisor tells the student to find out how to sequence from a colleague along the same bench. The colleague was told the same thing the previous year.
Summary
• The instrumentation and chemistry currently used for Sanger sequencing are highly reliable and reproducible
• At present, the major causes of failures in DNA sequencing rest with the sample and the person doing the sequencing and not the instrument or sequencing service
• Close adherence to recommended protocols should solve most sequencing problems