
Molecular Biology: DNA sequencing
Author: Prof Marinda Oosthuizen
Licensed under a Creative Commons Attribution license.
SEQUENCING OF LARGE TEMPLATES
As we have seen, we can obtain up to about 800 nucleotides from a single sequencing reaction. But how do we obtain
the sequence of a template that is longer than 800 bp? The following strategies will be discussed:

Primer walking – for sequencing a relatively small piece of DNA (1 – 4 kb)
Subcloning – when you need to sequence larger templates (> 4 kb)
Genome sequencing
Primer walking
Sequencing by primer walking is an effective strategy for obtaining the sequence of DNA templates
ranging in size from 1 - 4 kb. Primer walking is illustrated in Figure 5.
For a cloned template, the initial sequence data is obtained using a standard primer which hybridizes
to the vector sequence upstream of the insert DNA. For a PCR product, one of the amplification
primers can be used to obtain the initial sequence data. A new sequencing primer is then designed
from the sequence near the 3’ end of this initial read. This oligonucleotide is used to prime a second
sequencing reaction on the same template DNA. The data obtained from the second reaction will
overlap with the initial data and extend the sequence further downstream.
By repeated cycles of oligonucleotide design and DNA sequencing, the cloned insert is sequenced
completely in one direction. The same strategy is used to sequence the other strand beginning with a
second standard primer which primes in the other direction. The complete DNA sequence can then be
compiled from all the smaller sequences.
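The walking cycle described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 800-base read length, the 100-base primer offset and the 20-base primer length are assumptions chosen for the example, and real primer design would also check properties such as melting temperature and secondary structure.

```python
def primer_walk(template_length, read_length=800, primer_offset=100):
    """Plan the reads needed to walk along one strand of a template.

    Returns a list of (start, end) positions, one per sequencing reaction.
    read_length and primer_offset are illustrative assumptions.
    """
    if not 0 <= primer_offset < read_length:
        raise ValueError("primer_offset must be smaller than read_length")
    reads = []
    start = 0
    while True:
        end = min(start + read_length, template_length)
        reads.append((start, end))
        if end >= template_length:
            break
        # Place the next primer a little upstream of the end of this read,
        # so that consecutive reads overlap.
        start = end - primer_offset
    return reads


def next_primer(read_sequence, primer_length=20, primer_offset=100):
    """Pick the next walking primer from near the 3' end of a read."""
    begin = len(read_sequence) - primer_offset
    return read_sequence[begin:begin + primer_length]


for number, (start, end) in enumerate(primer_walk(3000), start=1):
    print(f"reaction {number}: bases {start}-{end}")
```

Under these assumptions a 3 kb insert needs five reactions per strand, each overlapping the previous one by 100 bases.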
Figure 5: Primer walking to obtain the full length sequence of a template that is too large to sequence in
a single sequencing reaction.
Subcloning
It is not efficient to sequence larger templates (> 4 kb) by primer walking. Such larger DNA templates
must first be broken up into smaller overlapping fragments which can be cloned into a vector for
sequencing. These subclones should overlap, so that the individual DNA sequences will themselves
overlap. The overlaps can then be located, either by eye or using a computer, and the master
sequence gradually built up. There are several ways of producing the subclones. The DNA could be
cleaved with two different restriction endonucleases, producing one set of fragments with, say, Sau3A
and another with AluI. However, this method suffers from the drawback that the restriction enzyme
sites may be inconveniently placed and individual fragments may be too long to be completely
sequenced. Often four or five different enzymes will have to be used to clear up all the gaps in the
master sequence. (For more details on cloning, see the Cloning techniques course notes.)
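To see why two different enzymes give overlapping fragment sets, the digestion step can be sketched in Python. This is a simplified illustration: the `digest` function and the short example sequence are made up for this sketch, the DNA is treated as linear, and only the top strand is considered. The recognition sites used are GATC for Sau3A (cut before the G) and AGCT for AluI (cut between G and C).

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the position of the cut within the site on the top strand
    (0 cuts just before the site, 2 cuts after its second base).
    """
    cuts = []
    position = sequence.find(site)
    while position != -1:
        cuts.append(position + cut_offset)
        position = sequence.find(site, position + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Skip empty fragments, e.g. when a site sits at the very start.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


template = "TTGATCAAAGCTCCGATCGG"  # made-up example sequence

sau3a_fragments = digest(template, "GATC", 0)  # Sau3A: ^GATC
alui_fragments = digest(template, "AGCT", 2)   # AluI: AG^CT

print(sau3a_fragments)
print(alui_fragments)
```

Because the two enzymes cut at different positions, each AluI fragment spans a Sau3A cut site (and vice versa), which is what allows the individual sequences to be joined into a master sequence.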
An alternative method is to use shotgun cloning (Figure 6), where the DNA template is randomly
sheared and cloned into a vector, producing a shotgun library. Clones from this library are sequenced
and the individual sequences assembled, until the final finished sequence of the original template is
obtained. One drawback of this technique is that some regions of the DNA template may be easier to
clone than others, resulting in areas in the final sequence assembly which are over-represented while
other regions are under-represented or represented by reads on only one strand. In fact, there may
be regions where there are no subclones at all, and this would result in gaps in the master sequence.
Such regions would be targeted for further sequencing. Numerous strategies, including the use of
PCR, have been developed for the closure of gaps in the sequence.
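Uneven representation and gaps can be made concrete with a small Python sketch. It assumes that the position of each shotgun read on the template is already known, which in practice only becomes clear after assembly; the function name and the example coordinates are invented for illustration.

```python
def coverage_and_gaps(template_length, reads):
    """Count how many reads cover each base and list uncovered regions.

    reads is a list of (start, end) positions with end excluded.
    Returns (depth, gaps), where gaps is a list of (start, end) regions
    that no read covers.
    """
    depth = [0] * template_length
    for start, end in reads:
        for position in range(max(start, 0), min(end, template_length)):
            depth[position] += 1

    gaps = []
    gap_start = None
    for position, count in enumerate(depth):
        if count == 0 and gap_start is None:
            gap_start = position              # a gap opens here
        elif count > 0 and gap_start is not None:
            gaps.append((gap_start, position))  # the gap closes here
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, template_length))
    return depth, gaps


# Made-up read positions on a 20-base template.
depth, gaps = coverage_and_gaps(20, [(0, 8), (5, 12), (15, 20)])
print("maximum depth:", max(depth))
print("gaps to close:", gaps)
```

Regions reported in `gaps` correspond to the parts of the template that would be targeted for further sequencing, for example by PCR across the gap.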
Figure 6: Shotgun sequencing strategy.
Genome sequencing
New automated sequencing technologies and improved computer assisted methods for sequence
assembly have greatly increased the speed of sequencing, and have made sequencing of whole
genomes a reality. By sequencing the entire genome of an organism, all of its genes can be identified.
Scientists hope that by deciphering the genome sequence of an organism they will gain
understanding of how the organism functions, and how its genes work together to direct its growth
and development. In the case of disease-causing organisms, genome sequences may reveal genes
responsible for virulence and pathogenesis and it may be possible to identify novel vaccine candidate
genes or new drug targets. It is hoped that the human genome sequence may shed light on
genetically-related diseases.
Most genomes are far too big to be inserted into a vector and must be broken into smaller bits in order
to be cloned. For example, bacteria have genomes that range from 1 Mb up to 10 Mb, and some are
even larger. How do we sequence such a huge piece of DNA? There are two strategies for
sequencing whole genomes: the hierarchical approach and the whole genome shotgun
approach. See Figure 7 for a schematic comparison.
In HIERARCHICAL SEQUENCING, the genomic DNA is first broken down into manageable-sized
pieces (approximately 100 kb), and these pieces are cloned separately into a cloning vector that will
accept large inserts of 100 kb to 3 Mb (e.g. BAC, P1 or YAC). It is then possible to physically order
these large clones, resulting in several clones which, between them, span the entire length of the
genome. This is known as a “tiling path” (a minimal subset of overlapping clones that span the whole
genome). Ideally the fewest possible clones are identified to make up the tiling path (there are
numerous ways of physically ordering the clones, which are beyond the scope of this course. You can
read about these methods elsewhere if you are interested). Although the genomic DNA has been
fragmented, the size of the piece of DNA in each clone is still too large to sequence directly
(remember it is only possible to sequence around 800 bp at a time). So each individual large clone is
then sequenced separately, usually by breaking it up into little pieces (approximately 2 kb) and
cloning these small fragments of DNA into a sequencing vector. This process is known as subcloning
(cloning a cloned piece of DNA!). The sequences of many small 2 kb subclones are obtained and
assembled to give the sequence of the original large (approximately 100 kb) clone. Since the order of
the large clones in the tiling path is known, the complete genome sequence is obtained by assembling
the overlapping sequences of these clones. Hierarchical sequencing was the basis of the publicly
funded Human Genome Project.
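Choosing the fewest clones for a tiling path can be sketched in Python once the clones have been physically ordered. The sketch assumes that the start and end position of every large clone on the genome is already known from the physical map; the clone names and coordinates (in kb) are invented, and the greedy selection shown here is one simple way to pick a minimal set, not necessarily the method used by any particular sequencing project.

```python
def tiling_path(clones, genome_length):
    """Choose a minimal set of clones that spans the genome.

    clones maps a clone name to its (start, end) position on the genome.
    At each step, pick the clone that begins within the region already
    covered and reaches furthest beyond it.
    """
    path = []
    covered = 0
    while covered < genome_length:
        best = None
        for name, (start, end) in clones.items():
            if start <= covered < end:
                if best is None or end > clones[best][1]:
                    best = name
        if best is None:
            raise ValueError(f"no clone covers position {covered}")
        path.append(best)
        covered = clones[best][1]
    return path


# Hypothetical BAC clones with positions in kb.
bac_clones = {
    "A": (0, 100),
    "B": (50, 180),
    "C": (90, 200),
    "D": (150, 300),
    "E": (190, 320),
}
print(tiling_path(bac_clones, 320))
```

In this example three of the five clones (A, C and E) are enough to span the 320 kb region, so only those three would be subcloned and sequenced.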
The WHOLE GENOME SHOTGUN APPROACH does not make use of ordered subclones. Instead,
the entire genome is fragmented into small pieces which are cloned, sequenced and assembled. Shotgun
sequencing was the basis of the privately funded human genome sequencing effort (Celera Genomics), and
today most small genomes (e.g. bacterial genomes) are sequenced using this strategy.
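The assembly of unordered shotgun reads can be illustrated with a toy greedy assembler in Python. This is a deliberately simplified sketch: it assumes error-free reads that all come from the same strand, uses made-up function names and a made-up 14-base "genome", and repeatedly merges the two reads with the longest overlap. Real assembly software has to deal with sequencing errors, both strands and repeated sequences.

```python
def overlap(left, right, min_length=3):
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge reads pairwise, always joining the best-overlapping pair first.

    Returns the list of contigs left when no more overlaps are found.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_length)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # remaining contigs do not overlap: gaps in the sequence
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping reads from the made-up sequence ATGCGTACGTTAGC.
print(greedy_assemble(["GTACGTT", "ATGCGTA", "GTTAGC"]))
```

If the reads do not overlap, the function returns more than one contig, which corresponds to gaps remaining in the assembled sequence.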
Figure 7: Strategies for sequencing whole genomes: hierarchical sequencing versus the whole genome
shotgun approach.