Molecular Biology: DNA sequencing
Author: Prof Marinda Oosthuizen
Licensed under a Creative Commons Attribution license.

SEQUENCING OF LARGE TEMPLATES

As we have seen, we can obtain up to 800 nucleotides from a single sequencing reaction. But how do we obtain the sequence of a template that is longer than 800 bp? The following will be discussed:

Primer walking - for sequencing a relatively small piece of DNA (1 - 4 kb)
Subcloning - when you need to sequence larger templates (> 4 kb)
Genome sequencing

Primer walking

Sequencing by primer walking is an effective strategy for obtaining the sequence of DNA templates ranging in size from 1 - 4 kb. Primer walking is illustrated in Figure 5.

For a cloned template, the initial sequence data are obtained using a standard primer which hybridizes to the vector sequence upstream of the insert DNA. For a PCR product, one of the amplification primers can be used to obtain the initial sequence data. A new sequencing primer is then designed near the 3’ end of this initial sequence. This oligonucleotide is used to prime a second sequencing reaction on the same template DNA. The data obtained from the second reaction will overlap with the initial data and extend the sequence further downstream. By repeated cycles of oligonucleotide design and DNA sequencing, the cloned insert is sequenced completely in one direction. The same strategy is used to sequence the other strand, beginning with a second standard primer which primes in the opposite direction. The complete DNA sequence can then be compiled from all the smaller sequences.

Figure 5: Primer walking to obtain the full length sequence of a template that is too large to sequence in a single sequencing reaction.

Subcloning

It is not efficient to sequence larger templates (> 4 kb) by primer walking.
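A rough count of the reactions involved illustrates why. The sketch below is only an estimate under stated assumptions: each read yields about 800 usable bases (the figure used in these notes), and each new primer is placed so that consecutive reads overlap by 100 bases. The overlap value and the function name walking_reads_per_strand are hypothetical and chosen purely for illustration.

```python
import math

def walking_reads_per_strand(template_bp, read_bp=800, overlap_bp=100):
    """Estimate the sequential reads needed to walk along one strand.

    read_bp and overlap_bp are illustrative assumptions, not measured values.
    """
    if template_bp <= read_bp:
        return 1  # a single reaction covers the whole template
    step = read_bp - overlap_bp  # new bases gained by each additional read
    return 1 + math.ceil((template_bp - read_bp) / step)

for size in (800, 2000, 4000, 10000):
    print(size, walking_reads_per_strand(size))
```

Because each primer can only be designed once the previous read is available, these reactions have to be run one after another, and the count doubles when both strands are sequenced.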
Such larger DNA templates must first be broken up into smaller fragments, which can be cloned into a vector for sequencing. These subclones should overlap, so that the individual DNA sequences will themselves overlap. The overlaps can then be located, either by eye or using a computer, and the master sequence gradually built up.

There are several ways of producing the subclones. The DNA could be cleaved with two different restriction endonucleases, producing one set of fragments with, say, Sau3A and another with AluI. However, this method suffers from the drawback that the restriction enzyme sites may be inconveniently placed and individual fragments may be too long to be completely sequenced. Often four or five different enzymes will have to be used to close all the gaps in the master sequence. (For more details on cloning see the Cloning techniques course notes.)

An alternative method is to use shotgun cloning (Figure 6), where the DNA template is randomly sheared and cloned into a vector, producing a shotgun library. Clones from this library are sequenced and the individual sequences assembled, until the final finished sequence of the original template is obtained. One drawback of this technique is that some regions of the DNA template may be easier to clone than others, so that some areas of the final sequence assembly are over-represented while other regions are under-represented or represented by reads on only one strand. In fact, there may be regions for which there are no subclones at all, and this would result in gaps in the master sequence. Such regions would be targeted for further sequencing. Numerous strategies, including the use of PCR, have been developed for the closure of gaps in the sequence.

Figure 6: Shotgun sequencing strategy.
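The idea of locating overlaps between subclone sequences and building up a master sequence can be sketched in a few lines of Python. This is a toy illustration only, not the method used by real assembly software: it assumes error-free reads that all come from the same strand, ignores repeats, and uses a simple greedy rule (merge the pair with the longest overlap first). The function names, the min_len threshold and the example sequence are hypothetical.

```python
def overlap_len(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs

# Three overlapping reads taken from a made-up 31-base template
reads = ["TAGGATCCGATTACA", "ATGGCGTACGTT", "CGTTAGCCTAGG"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCCTAGGATCCGATTACA']
```

If a region is missing from the library, no overlap joins the reads on either side of it, and the function returns more than one contig. This corresponds to a gap in the master sequence that would have to be closed by further sequencing.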
Genome sequencing

New automated sequencing technologies and improved computer-assisted methods for sequence assembly have greatly increased the speed of sequencing, and have made sequencing of whole genomes a reality. By sequencing the entire genome of an organism, all of its genes can be identified. Scientists hope that by deciphering the genome sequence of an organism they will gain understanding of how the organism functions, and how its genes work together to direct its growth and development. In the case of disease-causing organisms, genome sequences may reveal genes responsible for virulence and pathogenesis, and it may be possible to identify novel vaccine candidate genes or new drug targets. It is hoped that the human genome sequence may shed light on diseases with a genetic component.

Most genomes are far too big to be inserted into a vector and must be broken into smaller pieces in order to be cloned. For example, bacteria have genomes that range from 1 Mb up to 10 Mb, and some are even larger. How do we sequence such a huge piece of DNA? There are two strategies for sequencing whole genomes: the hierarchical approach and the whole genome shotgun approach. See Figure 7 for a schematic comparison.

In HIERARCHICAL SEQUENCING, the genomic DNA is first broken down into manageably sized pieces (approximately 100 kb), and these pieces are cloned separately into a cloning vector that will accept large inserts of 100 kb to 3 Mb (e.g. BAC, P1 or YAC). It is then possible to physically order these large clones, resulting in a set of clones which, between them, span the entire length of the genome. This is known as a “tiling path” (a minimal subset of overlapping clones that span the whole genome). Ideally, the fewest possible clones are identified to make up the tiling path. (There are numerous ways of physically ordering the clones, which are beyond the scope of this course; you can read about these methods elsewhere if you are interested.)

Although the genomic DNA has been fragmented, the piece of DNA in each clone is still too large to sequence directly (remember that it is only possible to sequence around 800 bp at a time). So each individual large clone is then sequenced separately, usually by breaking it up into small pieces (approximately 2 kb) and cloning these small fragments of DNA into a sequencing vector. This process is known as sub-cloning (cloning a cloned piece of DNA!). The sequences of many small 2 kb subclones are obtained and assembled to give the sequence of the original large (approximately 100 kb) clone. Since the order of the large clones in the tiling path is known, the complete genome sequence is obtained by assembling the overlapping sequences of these clones. Hierarchical sequencing was the basis of the publicly funded Human Genome Project.

The WHOLE GENOME SHOTGUN APPROACH does not make use of an ordered set of large clones. Instead, the entire genome is fragmented into small clones which are sequenced and assembled. Shotgun sequencing was the basis of the privately funded effort to sequence the human genome, and today most small genomes (e.g. bacterial genomes) are sequenced using this strategy.

Figure 7: Strategies for sequencing whole genomes: hierarchical sequencing versus the whole genome shotgun approach.
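To get a feel for the scale of a whole genome shotgun project, the following Python sketch estimates how many reads are needed for a chosen depth of coverage and what fraction of the genome would still be expected to lie in gaps. It rests on assumptions that are not part of these notes: reads are taken to fall uniformly at random along the genome (an idealised Poisson model, which real libraries only approximate because some regions clone better than others), the read length is the 800 bp figure used above, and the 8-fold coverage and 5 Mb genome size are arbitrary example values. The function name shotgun_estimate is hypothetical.

```python
import math

def shotgun_estimate(genome_bp, read_bp=800, coverage=8.0):
    """Return (reads needed, expected fraction of bases with no read).

    Assumes reads land uniformly at random (idealised Poisson model).
    """
    n_reads = math.ceil(coverage * genome_bp / read_bp)
    uncovered = math.exp(-coverage)  # Poisson probability that a base is hit by zero reads
    return n_reads, uncovered

genome = 5_000_000  # an example 5 Mb bacterial genome
reads, gap_fraction = shotgun_estimate(genome)
print(reads)                         # number of reads for 8-fold coverage
print(round(gap_fraction * genome))  # expected number of unsequenced bases
```

Even under this optimistic model a small number of bases are expected to remain unsequenced, which is consistent with the need for gap closure described in the subcloning section.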