How Accurate is Heterozygote Base Calling in Dye-Terminator Sequencing?

GR Taylor (1,2), LA Ellis (1), MD Robinson (1), RS Charlton (1), RF Mueller (1), MA Knowles (2), DT Bishop (2)
(1) Regional DNA Laboratory and (2) ICRF Mutation Detection Facility, St James's University Hospital, Leeds LS9 7TF, UK

Introduction

Recent developments in dye terminator chemistry [1] have led to the suggestion that dye terminators, with their reported improvement in accuracy and inherent simplicity of use, could render four-colour dye primer sequencing obsolete. This would have a significant impact on mutation detection strategies for both candidate gene and diagnostic applications. At present, quality control criteria for diagnostic sequencing are lacking, but given the increasing medical use of DNA sequencing this is likely to become a medico-legal issue of great importance.

In this study we investigated dye terminator sequencing with the new Big-Dye terminators using a set of cloned mismatch templates as well as amplicons from human genomic DNA (p53, VHL, CFTR and a series of cloned sequence variants). Plasmids were either sequenced directly from minipreps or re-amplified prior to sequencing. Sequence runs of over 800 bases were achieved. The best resolution of longer fragments was obtained with LongRanger gels. After 500 bases the reliability of base calling began to fall. We conclude that reliable heterozygote detection by dye terminator sequencing is possible in sequence runs of up to 500 bases, but that the sequence should be generated from both strands for the detection of heterozygous point mutations.

Aims

Explore the limits of diagnostic quality sequencing using cloned control and diagnostic examples.
Determine the maximum read length for reliable heterozygote calling using slab and capillary gels.
Verify the sequence of the ATCC mismatch clone series.
Investigate the detection of heterozygotes when present as a minority using reconstruction experiments.
Identify quality control criteria for diagnostic DNA sequence.
Methods

Sequencing used the ABI Big-Dye terminator kit according to the manufacturer's protocol, except that volumes were halved and "Half-term Big-Dye" diluent (Genpak Ltd) was added at an equal volume. Cycle sequencing was 30 cycles of 95°C (5 seconds), 53°C (5 seconds) and 72°C (4 minutes). After a final extension time of 10 minutes the samples were cooled to 4°C for up to 15 hours. Samples were prepared for sequencing by ethanol precipitation at room temperature in the presence of sodium acetate. The precipitated DNA was redissolved in 95% formamide, 1% dextran blue in TBE and loaded onto 32 cm well-to-read gels. Gels were either 4% 19:1 acrylamide:bis or Hydrolink 4.25% “Singels”, both in TBE with urea. Electrophoresis used 1x run speeds with the temperature set to either 51 or 45°C. For capillary sequencing, samples were resuspended in template suppressant buffer (TSB) and loaded without dextran blue.

The following sequences were analysed: PCR products derived from the plasmid pJD series of cloned mismatches (ATCC Accession No 87584); exons from the human p53, CFTR and VHL genes. Sequence analysis used the ABI analysis software, and sequences were aligned using “Align” in DNAstar according to the Wilbur and Lipman algorithm or by Clustal in Sequence Navigator.

Mutation detection using Big Dye terminators

VHL: wild type; 522 delTG.
p53: T>A called correctly in one direction, but missed on the reverse strand.
Deletions are easy to detect because of the downstream effects.
CFTR: M470V missed on the forward strand, but called on the reverse strand.

Setting base calling to detect all putative heterozygotes generates more miscalls because it is not sensitive to the context of the sequence mismatches.

Mismatch control plasmids

When the plasmid series was grown and amplified, one of the insertion clones (pJDTid2), labeled above, gave PCR products that were larger than expected.
The sequence of 500 bases around the mismatch is therefore being checked in all of our ATCC stocks. So far the C and A clones have been confirmed to have the correct sequence.

Row 1: A; row 2: C; row 3: A/C heteroduplex mismatch, on forward and reverse strands. Forward sequence (nt121) called as C, reverse sequence (nt313) called as T.

Sequencing heteroduplexes

Mismatch detection in control plasmid amplicons
[Figure panels: pJDC, mixture, pJDA]

The reverse strand was more difficult for the software to call: the G at this position was weak even in the homozygote, so reliable detection of this mutation requires sequencing the reverse complement. The mismatch was visible (though not called by the software) in the mixture when the primer was 50 bases away, but at 340 bases away the weak G signal was lost. Sequencing in only one direction would have missed this heterozygote.

The neighboring sequence profile was affected by the base change, making reliable automated base calling difficult. In this case, although a cytosine peak is seen in the mixture, it is not reported by the software. There is a trade-off in adjusting the signal:noise ratio to give useful base calls without missing minor peaks in heterozygotes.

PCR amplicons generated from plasmids identical except at one base position enable sequencing performance characteristics to be evaluated systematically.

Sequence read length is limited by gel resolution

Apart from miscalls right at the beginning of the sequence, sequencing an amplicon of 1,000 bases revealed only 2 errors in the first 650 bases. The first error was an “A” called as “N” because of a background “T” peak. Background “T” peaks have been seen on several occasions. The second error was a missed “C” peak, weak after 2 “G”s. The error rate increased rapidly after 690 bases, mostly due to the poor resolution of peaks in this region of the gel.
[Figure panels: 100 bases, 390 bases, 650 bases, 1000 bases]

Discussion

Big Dye terminator chemistry lacks the precision of dye primer sequencing.
However, the greater convenience of terminator chemistry makes it attractive for routine use, particularly for fragments of 500 bases or less. Although heterozygotes are visible by eye, it is very difficult to set the base-calling sensitivity to call heterozygotes without miscalling background peaks. Reliable heterozygote detection by dye terminator sequencing is possible in sequence runs of up to 500 bases, but the sequence should be generated from both strands for the detection of heterozygous point mutations, and the sequence should still be visually inspected.

Whilst it is possible to sequence plasmid DNA beyond 800 bases using Big Dye terminators, we found that PCR products rarely read beyond 600 bases, after which base calling errors started to appear. Reading beyond 600 bases loses accuracy for two reasons: weaker signals and poor gel resolution. Signal strength could be improved by increasing the cycle number and loading more sample. Hydrolink gel was superior in our hands to 19:1 acrylamide:bis, giving increased base separation and longer reads. Further improvements may be possible using 48 cm gels instead of the 36 cm gels used in this study. Gel integrity was better preserved by setting the temperature to 45°C rather than the default 51°C. We found little advantage in column purification to remove dye terminators, although unincorporated dye terminator peaks were reduced through the use of half-term diluent in the sequencing mix. Sequencing on the 310 with a 61 cm capillary gave resolution at 500 bases equivalent to that of the 377 with Hydrolink gels; larger fragments were not analysed in this study.

Summary

The progress in sequencing the human genome means that in the immediate future there will be an increasing demand to re-sequence genes for diagnostic (mutation identification), epidemiological and candidate gene studies. There is a clear need for quality assessment of sequencing systems, particularly for diagnostic applications.
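As one illustration of what a quality criterion for heterozygote calling might look like, the Python sketch below applies a secondary-to-primary peak ratio threshold at a single position and then requires the call to be confirmed on the complementary strand. It is not the ABI base caller. The peak heights, the threshold values and the function names call_base and strands_agree are assumptions invented for this example; none of them come from the study or from any sequencing software.

```python
from typing import Dict

# IUPAC ambiguity codes for two-base mixtures.
IUPAC = {
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
}

# Complement table covering plain bases and the two-base ambiguity codes.
COMPLEMENT = str.maketrans("ACGTMRWSYKN", "TGCAKYWSRMN")


def call_base(peaks: Dict[str, float], min_ratio: float) -> str:
    """Call one position from four channel peak heights (hypothetical model).

    A heterozygote is reported when the second-highest peak reaches at least
    min_ratio of the highest peak; otherwise the highest peak is called.
    """
    ranked = sorted(peaks, key=peaks.get, reverse=True)
    primary, secondary = ranked[0], ranked[1]
    if peaks[primary] <= 0:
        return "N"
    if peaks[secondary] / peaks[primary] >= min_ratio:
        return IUPAC[frozenset((primary, secondary))]
    return primary


def strands_agree(forward_call: str, reverse_call: str) -> bool:
    """True when the complemented reverse-strand call matches the forward call."""
    return forward_call == reverse_call.translate(COMPLEMENT)


# Invented peak heights: a weak A/C heterozygote and a homozygous A
# carrying a background T peak.
true_het = {"A": 900.0, "C": 300.0, "G": 20.0, "T": 30.0}
background = {"A": 1000.0, "C": 15.0, "G": 10.0, "T": 250.0}

for ratio in (0.5, 0.3, 0.2):
    print(ratio, call_base(true_het, ratio), call_base(background, ratio))
```

With these invented numbers a threshold of 0.5 misses the heterozygote, 0.3 calls it while ignoring the background peak, and 0.2 turns the background T into a false A/T call. Checking a forward call against the reverse strand, for example strands_agree("M", "K"), is one simple way to express the requirement that a heterozygote be seen on both strands.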
The detection of heterozygotes has been recognized as a special case in point [3]. We observed a number of context-dependent sequencing errors, which would be difficult to check manually on a large scale and which may be difficult to deal with using the default settings of Factura or Phred. One solution is to compare sequences directly with a known standard using a peak subtraction algorithm [4]. This may quickly identify sequence anomalies, increasing the throughput of sequence checking.

References

1. Rosenblum BB, Lee LG, Spurgeon SL, Khan SH, Menchen SM, Heiner CR, Chen SM (1997) New dye-labeled terminators for improved DNA sequencing patterns. Nucleic Acids Research 25, 4500-4504.
2. Deeble VJ, Roberts E, Robinson MD, Woods CG, Bishop DT, Taylor GR (1999) Comparison of enzyme mismatch cleavage and chemical cleavage of mismatch on a defined set of heteroduplexes. Genetic Testing 1, 1-8.
3. Kronick MN (1997) Heterozygote sequencing using automated DNA sequencing technology. In: Laboratory Methods for the Detection of Mutations and Polymorphisms in DNA (Ed GR Taylor). CRC Press.
4. Bonfield JK, Rada C, Staden R (1998) Automated detection of point mutations using fluorescent sequence trace subtraction. Nucleic Acids Research 26, 3404-3409.