Sanger Sequencing and Quality Assurance
Zbigniew Rudzki
Department of Pathology
University of Melbourne

Sanger DNA sequencing
• The era of DNA sequencing essentially started with the publication of the enzymatic dideoxynucleotide terminator technique by Sanger et al. in 1977
  – Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74: 5463, 1977
• It is still a very valuable technique even in the era of next generation sequencing

Evolution of Sanger DNA sequencing
• There have been three distinct stages in the evolution of Sanger sequencing over the past few decades:
  – Manual sequencing using polyacrylamide gels (PAG), radiolabels and eyeball reading of gels
  – First generation of gel (PAG) based DNA sequencers using fluorescently labelled terminators and automated base calling
  – Latest generation of capillary based DNA sequencers with improved cycle sequencing chemistry and more advanced software for mutation calling
• The quality control issues changed as sequencing evolved

Principle of Sanger sequencing
• Sanger sequencing comprises several steps.
  – The first step is primer extension across the region to be sequenced
  – Only a single primer is used, so amplification is arithmetic (linear) rather than exponential
  – The key is that a mixture of nucleotides (majority) and di-deoxynucleotides (minority) is used in the reaction
  – When a di-deoxynucleotide analogue is incorporated into the growing strand, it terminates further extension
  – The incorporation of a di-deoxynucleotide is a random process

Chain terminating inhibitors
[Figure: a deoxyribonucleotide and a di-deoxyribonucleotide, each drawn as base, phosphate and sugar]

Principle of Sanger sequencing
• An aliquot of DNA contains a large number of templates (individual DNA molecules)
• Thus the result of a sequencing reaction is a mixture of extension products of various lengths depending on when a di-deoxy analogue was incorporated.
• The extension products are separated according to size by PAG electrophoresis
• The order of separated bases is the sequence

Principle of Manual Sanger sequencing
[Figure: a primer annealed to the template, with extension products of different lengths each terminated by a di-deoxy (dd) nucleotide]

Components of Manual DNA sequencing
[Figure: four separate reactions, each containing ssDNA, enzyme, primer, the four nucleotides (A, C, G, T) and one di-deoxy terminator (ddA, ddG, ddT or ddC)]

Manual DNA Sequencing
• After the sequencing reaction was complete, the products were run on a polyacrylamide gel capable of resolving fragments differing by a single base
• Four lanes were required for each sequence
• The gel plates were removed and the gel was transferred to filter paper and dried
• The fragments were then visualised by autoradiography
• Finally the sequence was read manually using the eye and a ruler

Manual DNA sequencing
Only about 80-120 bases could be read per sequencing reaction

Automated Gel based DNA sequencing
• Automated sequencing became possible because of the development of fluorescent dyes that allowed the simultaneous use of all 4 terminators as they could be distinguished by colour.
• This meant that all 4 terminator reactions could take place in one tube and be run in one lane of an automated DNA sequencer
• An automated DNA sequencer was essentially a PAG electrophoresis apparatus with a laser at the bottom which could scan the fluorescent bases as they migrated past

Automated DNA sequencing
[Figure: primer extension using normal nucleotides (A, G, C, T) and di-deoxy analogues with a fluorescent label]

Automated DNA sequencing
[Figure: primer and template sequence]

[Figure: gel for automated sequencing]
[Figure: gel tracks]
[Figure: gel sequencing trace]

Automated Gel based DNA sequencing
• Because the results from an automated sequencer were in an electronic format, they could be analysed by computer
• Automated sequencers all came with computer programs that called the bases automatically rather than by eyeball

Problems with first generation automated Gel based DNA sequencing
Variable peak height
• A problem with early automated sequencing was variable peak height and thus signal strength
• This was largely due to the sequencing chemistry used and the dyes that were coupled to the terminators
• Each dye had different absorbance and mobility characteristics
• Software was used to correct for this, thus giving the appearance of evenly spaced peaks
• However this led to some incorrect base calling

Structure of (dRhodamine) Dye Terminators
[Figure: structures of dR6G-ddATP, dTamra-EO-ddCTP, dR110-EO-ddGTP and dRox-EO-ddTTP]

Problems with base calling
• Because of base call errors in the early first generation automated sequencers, third party software was developed to improve the accuracy of base calling.
• The best known of these was a program called Phred, which was developed in the early 1990s by Phil Green
• Phred analysed the raw data and assigned a quality score to each base in the sequence

Phred Quality Scores
• These are logarithmically linked to error probabilities: Q = -10 log10(P), where P is the probability of an incorrect base call

Phred Quality Score   Probability of incorrect base call   Base call accuracy
10                    1 in 10                              90%
20                    1 in 100                             99%
30                    1 in 1000                            99.9%
40                    1 in 10000                           99.99%
50                    1 in 100000                          99.999%

Third generation Sanger sequencing
• The third generation of Sanger sequencing involved the development of
  – Capillary instruments, which eliminated the need to pour gels as the capillaries were filled with the separation matrix automatically
  – New dyes and chemistries, which largely solved the variable peak height issue
  – Cycle sequencing

Capillary sequencing
• In these instruments, the separation takes place within a tiny glass capillary
• The advantages over gel based sequencing are
  – No need to pour gels: the separation matrix is loaded robotically
  – The samples are loaded automatically via a robotic system
  – No lane tracking is required

Cycle sequencing
• Cycle sequencing was also introduced.
• This is somewhat equivalent to PCR in that the sequencing reaction was cycled 20-25 times, which greatly improved signal strength
• This meant that good sequencing could be obtained from a smaller amount of sample

Improvements in DNA sequencing chemistry
• Over the years there has also been considerable improvement in the sequencing chemistry formulation
  – The dyes have been improved to minimise the difference in mobility and peak height between bases
  – The enzymes have been improved to provide longer reads with a wider variety of templates

Structure of the BigDyes
[Figure: structures of the four BigDyes: dR110/Fam, dR6G/Fam, dTamra/Fam and dRox/Fam]

Applied Biosystems 3730 DNA analyser
• Second generation capillary sequencer
[Figure: a capillary array; samples on the capillary sequencer. Source: www.appliedbiosystems.com]

Incremental improvement in sequencing chemistry
[Figure: comparison of the Classic, BDTv1, BDTv3 and BDTv3.1 chemistries]

Improved base calling
• In addition, the instrument manufacturers greatly improved their base calling software and incorporated the Phred programs into it
• This greatly assisted base calling, automatically improving confidence in the resulting sequence and in the ability to detect sequence variants

Current state of Sanger Sequencing
• The evolution of Sanger sequencing over some 20-30 years has resulted in very reliable sequencing.
• The Human Genome Project was completed using Sanger sequencing.
• If an individual is having trouble with their sequencing, the problem will most likely rest with the sample

Important factors in Sanger sequencing
• Cleanliness of the DNA template and reaction product is paramount
  – The template should be free of proteins, RNA, polysaccharides and genomic DNA. This can best be achieved by using a commercial miniprep
  – Plasmids, BACs or cosmids are best sequenced from overlapping PCR fragments
  – After the sequencing reaction, it is very important to remove any unincorporated dyes. If using a precipitation method, ensure you don't lose your template.
    Spin columns tend to be more reliable

Important factors in Sanger sequencing
• Use an appropriate amount of template

Sample                       Amount of template
PCR product, 100-200 bp      1-3 ng
PCR product, 200-500 bp      3-10 ng
PCR product, 500-1000 bp     5-20 ng
PCR product, 1000-2000 bp    10-40 ng
Single-stranded plasmid      25-50 ng
Double-stranded plasmid      150-300 ng
Cosmids and BACs             0.5-1.0 µg
Genomic DNA                  2-3 µg

Important factors in Sanger sequencing
• Using smaller amounts of ready reaction mix
  – ABI still recommend the use of 8 µl of the mix in a 20 µl total volume
  – As the mix is relatively expensive, most people use less than 8 µl
  – 4 µl and even 2 µl are commonly used
  – However, if you want to use even less reagent, it would be better to reduce the total reaction volume proportionally
  – The automated instrument uses only a small fraction of most sequencing reactions

Example of good sequencing
[Figure: example of good sequencing]

Reaction failed completely
• Insufficient or poor quality template and/or primer
• Primer binding site absent, deleted or mutated

Weak signals
• Insufficient or poor quality template and/or primer
• Primer mutated or poor primer design

Mixed sequences
• Possible causes for mixed sequences
  – Mixed plasmid preparations
  – Multiple PCR products
  – Multiple priming sites
  – Multiple primers in mix, e.g. failure to remove PCR primers
  – Primer dimer
  – Frame shift
  – Degraded primer
  – Slippage due to homopolymer or repeat regions in template
• Many websites offer advice on sequencing problems

Troubleshooting DNA sequencing
• Talk to a colleague who is experienced in sequencing. Practice definitely does make perfect (almost) in sequencing.
• Talk to your sequencing service if you use one. They see all sorts of failed reactions all the time and generally know what the problem was with yours
• Don't blame the sequencing service. Most of the samples that ran with yours were fine, so it is rarely the instrument's fault.
• Supervisors: please train your students/staff.
  Too often a supervisor tells the student to find out how to sequence from a colleague along the same bench. The colleague was told the same thing the previous year.

Summary
• The instrumentation and chemistry currently used for Sanger sequencing are highly reliable and reproducible
• At present, the major causes of failures in DNA sequencing rest with the sample and the person doing the sequencing and not the instrument or sequencing service
• Close adherence to recommended protocols should solve most sequencing problems
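
Worked example: Phred quality scores
As a rough illustration of the Phred quality score table shown earlier, the sketch below converts between a quality score Q and the probability P of an incorrect base call, using the standard relationship Q = -10 log10(P). The function names and the example read are hypothetical, made up for this illustration; they are not part of Phred or of any instrument software.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Return the probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Return the Phred score for an error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


def expected_errors(quality_scores) -> float:
    """Sum the per-base error probabilities to estimate miscalls in a read."""
    return sum(phred_to_error_probability(q) for q in quality_scores)


if __name__ == "__main__":
    # Reproduce the rows of the quality score table.
    for q in (10, 20, 30, 40, 50):
        p = phred_to_error_probability(q)
        print(f"Q{q}: 1 in {round(1 / p):,} miscalled, accuracy {100 * (1 - p):.3f}%")

    # Hypothetical read: 450 good bases (Q40) flanked by 50 poorer bases (Q15).
    read_scores = [15] * 25 + [40] * 450 + [15] * 25
    print(f"Expected miscalls in read: {expected_errors(read_scores):.2f}")
```

Summing the per-base error probabilities in this way is one simple means of judging how much of a read to trust, for instance when deciding where to trim low quality ends before calling variants.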