King's Next-Generation Sequencing Meeting
Michelle Lupton

Additional references
Linnarsson S. Recent advances in DNA sequencing methods - general principles of sample preparation. Exp Cell Res. 2010 May 1;316(8):1339-43. Epub 2010 Mar 6. Review. PubMed PMID: 20211618.
Buehler B, Hogrefe HH, Scott G, Ravi H, Pabón-Peña C, O'Brien S, Formosa R, Happe S. Rapid quantification of DNA libraries for next-generation sequencing. Methods. 2010 Apr;50(4):S15-8. Review. PubMed PMID: 20215015. (Covers the use of real-time PCR.)

Stages in the library preparation
Steps accompanied by numbers are those for which we suggest alternatives to the standard Illumina protocols. Numbers correspond to those given in Supplementary Protocols.

Fragmentation
Nebulization
- Uneconomical distribution of fragment sizes.
- Approximately half of the DNA vaporises.
Adaptive Focused Acoustics (Covaris)
- Acoustic energy is controllably focused into the aqueous DNA sample by a dish-shaped transducer, resulting in cavitation events within the sample.
- 17% of the sample is in the 200bp size range, with little DNA loss.

Fragmentation
Enzymatic digestion (Linnarsson 2010)
Two commercial enzymatic fragmentation kits were recently introduced.
- Fragmentase (New England Biolabs): based on a V. vulnificus nuclease that generates random nicks and a modified T7 endonuclease that recognises the nicks and cleaves the opposite strand.
- Nextera (Epicentre): based on random transposon insertion. Also introduces adapter sequences simultaneously with fragmentation.

A-tailing, ligation and size selection
Artefacts from standard library prep:
1. Bias in base composition
2. High frequency of chimeric sequences
3. Imperfect insert size distribution
Overcome by:
1. Paired-end oligos
2. Gel extraction: melting the gel slice at room temperature reduces GC bias
3. Improved efficiency of the end repair and A-tailing
4. Double size selection
5. Paired-end size selection: only excise a 2mm gel slice

Figure 3
A-tailing, ligation and size selection
GC plots before (a) and after (b) optimisation of gel extraction. The figures show the total area in which reads with a particular GC content are distributed, with the mean and standard deviation. The greater width of the shaded area in plot a) indicates a wider dispersion of coverage for all values of GC content for which sequences were obtained.
Agilent Bioanalyzer 2100 traces for two suboptimal libraries: c) 60bp insert library with optimised PCR; d) the same 60bp library with excess DNA in PCR; e) 200bp insert library, showing a shoulder of small fragments.
Insert size distribution from sequenced human DNA using f) the standard and g) the modified paired-end library prep protocols.

PCR
- Template quality: use optimised quantities of DNA template.
- Use high-fidelity polymerases in an optimised reaction.
- Solid phase reversible immobilization (SPRI) technology removes a higher proportion of primers and adapter dimers than spin columns.
- Reduce the number of PCR cycles: 3ng DNA and 14 cycles of PCR amplification for single-end libraries, 25ng DNA and 12 cycles for high complexity libraries, and 10ng DNA and 18 cycles for lower complexity samples. These quantities give the optimal compromise between clean libraries and a low frequency of duplicate sequences.
- It is possible to eliminate the PCR step by ligating on appropriate adaptors after A-tailing.
- Direct sequencing of short amplicons.

Figure 4
PCR
a) A ~200bp fragment library was prepared, and 10ng was amplified for 18 cycles using standard Illumina conditions and using more optimal PCR conditions.
b) After PCR we divided the library into two: half was purified following the standard Illumina protocol, through a Qiaquick PCR cleanup column, whereas the other half was purified using SPRI technology. Each was then run on an agarose gel alongside a 100bp ladder to view the DNA species that remained.
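
The PERCENT DUPLICATION values in the PCR duplication example that follows can be reproduced from the duplicate counts in the same table. The slides do not state the formula, so the sketch below is an assumption: it counts each read pair as two reads, which is how duplication metrics with these column names are commonly defined, and it happens to match the tabulated values for the two libraries tried. The function name `percent_duplication` is hypothetical.

```python
def percent_duplication(unpaired_examined, pairs_examined,
                        unpaired_duplicates, pair_duplicates):
    """Fraction of examined reads flagged as duplicates.

    Assumption: every read pair contributes two reads to both the
    numerator and the denominator.
    """
    total_reads = unpaired_examined + 2 * pairs_examined
    duplicate_reads = unpaired_duplicates + 2 * pair_duplicates
    return duplicate_reads / total_reads


# Counts copied from the table for two libraries.
kpoa0006 = percent_duplication(62519, 5464702, 47544, 2657808)
kpoc0005 = percent_duplication(54337, 3648741, 47073, 2802655)

print(f"KPOA0006: {kpoa0006:.6f}")  # table reports 0.487918
print(f"KPOC0005: {kpoc0005:.6f}")  # table reports 0.768841
```

KPOC0005 is the one library with six pre-hybridisation cycles, and it shows both the highest duplication and the smallest estimated library size in the table.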

PCR
PCR duplication example

LIBRARY                         KPOA0006  KPOC0002  KPOA0005  KPOC0001   NA18507  KPOC0005  KPOA0010  KPOA0008  KPOC0004  KPOC0003  KPOA0017  KPOA0014
UNPAIRED READS EXAMINED            62519     57432     45961     51485     21791     54337     25997     48848     38812    201355     74130     52506
READ PAIRS EXAMINED              5464702   3448312   3143590   2884859   2187848   3648741   2449286   3474580   2350763   4528357   3097782   3493530
UNMAPPED READS                    986509    624078    562785    547339    369343    681663    410855    628848    396988   1027539    600782    618672
UNPAIRED READ DUPLICATES           47544     33414     21530     32763     15030     47073     17482     33125     24710    137213     48333     37310
READ PAIR DUPLICATES             2657808    542256    351362    596494    707778   2802655    711382   1017855    528626   2028173   1114520   1206707
READ PAIR OPTICAL DUPLICATES       15731     16503      9954     10627      7613     14976      8187     10306      8809     23747     15594     11793
PERCENT DUPLICATION             0.487918  0.160759  0.114359  0.210567  0.325319  0.768841  0.292461  0.295632  0.228246  0.452963  0.363235  0.348136
ESTIMATED LIBRARY SIZE           3598454  10024617  13316441   6055683   2620318    858549   3376826   4734070   4462098   3410538   3218521   3829677
Pre hybridisation PCR cycles           5         5         5         5         5         6         5         5         5         5         5         5
Post hybridisation PCR cycles         16        12        16        12        12        12        12        12        12        12        12        12

Quantification
Goal: find the concentration range of DNA that will yield clusters in the optimal density range.
- Spectrophotometry is not accurate.

Bioanalyzer region table (example):
From [bp]                      200
To [bp]                        1,000
Corr. Area                     232.6
% of Total                     79
Average Size [bp]              375
Size distribution in CV [%]    24.1
Conc. [pg/µl]                  165.15
Molarity [pmol/l]              714.3

- Quantitative PCR: quantify unknown libraries against standard libraries that have been sequenced previously and for which the cluster number is known.
- Electrophoresis with the Agilent Bioanalyzer
  - Gives a check of size distribution.
  - Can be inaccurate for a small proportion of libraries, possibly because single-stranded DNA is not easily quantified when mixed with double-stranded DNA.
  - Can use the Bioanalyzer to check size distribution and fluorometry to determine the concentration more accurately (e.g.
Qubit dsDNA BR Assay).

Quantification
a) Cluster throughput as a function of total clusters for 200 and 500bp inserts. The 500bp inserts underwent fewer cycles of cluster amplification (28, compared to 35 for the 200bp libraries), resulting in smaller clusters, and so a cluster density of 40-44k / tile (GA1) will produce the maximum yield from either insert size.
b) Standardisation of cluster density with qPCR quantification. Runs were grouped into 25-run bins and a boxplot plotted. After some initial problems with degradation of standards, cluster number has levelled out at ~35-40k / tile.

Denaturation
- For low concentrations of double-stranded DNA, denaturation by heating can damage DNA and introduce G+C bias.
- Use modified hybridisation buffers; prefer 0.1M NaOH to heating.
1. Subnanomolar libraries require an alternative buffer. Addition of Tris to the Illumina buffer prevents a rise in pH.
2. Diluting the supplied 2M NaOH and using a greater volume reduces fluctuation caused by pipetting error.

Denaturation
a) pH titration of hybridisation buffers. The concentration of NaOH in DNA templates is 0.1M. Adding more than 8μl of this denatured template to the 1ml of Hybridisation Buffer prior to loading DNA onto the flowcell increases the pH to above 10. This prevents efficient hybridisation, and thus the cluster density falls. The addition of Tris-HCl pH 7.3 to the supplied bottles of Hybridisation Buffer dramatically increases buffering capacity, making template hybridisation more robust.
b) The addition of 5mM Tris-HCl pH 7.3 to Illumina Hybridisation Buffer allows a greater volume of denatured template to be added before high pH prevents effective annealing of templates to the oligos on the flowcell surface. This increases the robustness of cluster generation by counteracting pipetting errors in the denaturation step.
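
The two denaturation points above rest on simple dilution arithmetic, sketched below. The first function gives the NaOH concentration carried into the Hybridisation Buffer by a given template volume; it does not model pH, which depends on buffer composition not given in the slides. The second illustrates why pipetting a larger volume of diluted NaOH is less error-prone. The function names and the 0.1 µl absolute pipetting error are hypothetical values chosen for illustration.

```python
def naoh_after_mixing_mM(template_ul, naoh_molar=0.1, buffer_ul=1000.0):
    """Final NaOH concentration (mM) once denatured template is mixed into buffer."""
    return naoh_molar * 1000.0 * template_ul / (buffer_ul + template_ul)


def relative_volume_error(volume_ul, abs_error_ul=0.1):
    """Relative error in delivered volume for a fixed absolute pipetting error.

    abs_error_ul is an assumed figure, not a measured one.
    """
    return abs_error_ul / volume_ul


# NaOH carried into 1 ml of Hybridisation Buffer by 8 µl of 0.1 M template,
# the volume above which the caption reports the pH rising past 10.
at_threshold_mM = naoh_after_mixing_mM(8.0)

# The same amount of NaOH delivered two ways (M x µl = µmol).
umol_from_stock = 2.0 * 1.0       # 1 µl of the supplied 2 M NaOH
umol_from_dilution = 0.1 * 20.0   # 20 µl of a 0.1 M dilution

error_stock = relative_volume_error(1.0)
error_dilution = relative_volume_error(20.0)

print(f"NaOH after adding 8 µl template: {at_threshold_mM:.2f} mM")
print(f"Relative error, 1 µl of 2 M:    {error_stock:.1%}")
print(f"Relative error, 20 µl of 0.1 M: {error_dilution:.1%}")
```

Under these assumptions the same absolute pipetting error is twenty times smaller, in relative terms, when the diluted NaOH is used.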

Amplification
Quality control
- After cluster amplification, double-stranded DNA on the flow cell can be stained with an intercalating dye and detected with a fluorescence microscope.
- Use on flow cells before linearization and blocking to confirm that the cluster density is appropriate.

Additions to the method
- Careful DNA quantification before fragmentation, and checking for degraded DNA.
- Use of low-adsorption plasticware (Linnarsson 2010), e.g. Beckman Coulter "non stick" or equivalent. It is also advisable to add some detergent (e.g. 0.02% Tween-20) to reduce adsorption to tube walls.
- The implementation of SPRI XP beads for all purification steps.
- The use of the Bioanalyzer to check concentration and size distribution after fragmentation.
- Cheaper alternatives to Illumina kits, e.g. NEB kits, or making your own adapters and primers.

Conclusion
The Genome Analyzer is a powerful sequencing technology. Here the authors describe a number of modifications that allow for more efficient library preparation and enable a stable workflow in a production environment. At the Sanger Institute, they have several teams for every stage of sequencing. All steps in the process are recorded using custom-written lab-tracking and run-tracking database software. Combined with improvements to the image analysis software and a faster run time, they predicted that by Christmas 2008 their output would reach 6-10 terabases of high-quality sequence per year, equivalent to 180 human genomes at 15-fold coverage, or approximately 200,000 bases per second. The improved workflow and high yield should maintain the Genome Analyzer as their next-generation sequencing platform of choice for the immediate future. How long this remains true depends upon the performance of existing rival technologies and of those on the horizon, for example Oxford Nanopore Technologies and Pacific Biosciences' Single Molecule Real Time technology, which promise to bring us closer to the eagerly anticipated $1,000 genome.
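
The throughput figures quoted in the conclusion can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming a human genome of about 3 gigabases and a 365-day year; neither figure is stated in the slides, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
HUMAN_GENOME_BASES = 3.0e9  # assumed approximate genome size


def bases_per_second(terabases_per_year):
    """Average sequencing rate implied by a yearly output in terabases."""
    return terabases_per_year * 1e12 / SECONDS_PER_YEAR


def terabases_for_genomes(n_genomes, coverage):
    """Total sequence (terabases) needed for n genomes at a given fold coverage."""
    return n_genomes * coverage * HUMAN_GENOME_BASES / 1e12


low_rate = bases_per_second(6)
high_rate = bases_per_second(10)
needed_tb = terabases_for_genomes(180, 15)

print(f"6 Tb/year  is about {low_rate:,.0f} bases per second")
print(f"10 Tb/year is about {high_rate:,.0f} bases per second")
print(f"180 genomes at 15-fold need about {needed_tb:.1f} Tb")
```

With these assumptions, 180 genomes at 15-fold coverage come to roughly 8 terabases, inside the predicted 6-10 terabase range, and the lower end of that range corresponds to a little under 200,000 bases per second.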