DNA MOLECULAR TECHNIQUES H92A 35
Outcome 4: Describe the molecular technology of DNA sequencing, microarray analysis, and whole genome sequencing
C. Kyne; GCC; 2020

Sequencing DNA

DNA sequencing refers to methods that determine the order (i.e. sequence) of nucleotides in a molecule of DNA. Many methods for DNA sequencing exist, but ‘Sanger sequencing’ was one of the first techniques developed for sequencing short DNA fragments (< 600 bp). It is also known as the ‘chain termination’ method, and variants of this technology are still used today for:
• Single gene studies e.g. identifying disease-causing mutant genes
• Genotyping microsatellite markers
• Validating results from other/new sequencing methods
• Verifying plasmid sequences, inserts, mutations
• Human leukocyte antigen (HLA) typing

During Sanger sequencing, the DNA to be sequenced acts as the template for DNA synthesis. In fact, the requirements and procedure for Sanger sequencing are similar to those for PCR (Outcome 3). For instance, Sanger sequencing requires that a short region (within ~1,000 base pairs) that flanks the 3’ end of the sequence of interest is known before the experiment. From this, a synthetic DNA primer of complementary sequence to one of the strands can be made. The primer can anneal by complementary base pairing to its recognition site, and DNA polymerase will extend the 3’ end of the primer to form a new single DNA strand.

From the above description, it is apparent that the following are necessary for Sanger sequencing:
• template DNA
• DNA polymerase
• primer
• dNTPs

Another key component of the mixture is:
• ddNTPs, which stands for dideoxynucleotide triphosphates and includes ddATP, ddTTP, ddGTP and ddCTP (Figure 1). These are analogues of dNTPs that terminate chain elongation.
They differ from dNTPs in that the 3'-hydroxyl is substituted with a hydrogen (Figure 1A). Since the 3'-hydroxyl is essential for phosphodiester bond formation and chain elongation, DNA polymerase cannot add further nucleotides to a ddNTP because it lacks a 3'-hydroxyl group. In this way, DNA synthesis is terminated as soon as a ddNTP becomes incorporated into the chain. Sanger sequencing takes advantage of this process and tends to use ddNTPs modified to contain a fluorescent dye (Figure 1B). Each ddNTP has unique excitation and emission wavelengths, making the corresponding nucleotide easy to detect.

During the synthesis step of Sanger sequencing, one of the target DNA strands is incubated with: a labelled primer (complementary to the known 3’ end of the template), DNA polymerase, dNTPs, and ddNTPs. The mixture is then subjected to the denaturation, annealing, and extension steps as per a standard PCR experiment. Note that the concentration of dNTPs is much higher than the concentration of ddNTPs. This means that there is a much higher probability of a dNTP molecule being incorporated into a newly synthesized chain than a ddNTP. DNA synthesis of a new strand will therefore proceed until a ddNTP becomes incorporated. Given enough time, reagents, and a suitable ratio of dNTPs to ddNTPs, at least one DNA strand of every possible length will be produced with a fluorescent ddNTP at the end.

Figure 1. (A) Comparison of the general structures of ddNTPs (top) and dNTPs (bottom), where the former lacks a 3’ hydroxyl group. (B) Examples of fluorescently labelled ddGTP, ddATP, ddCTP, and ddTTP that could be used for Sanger sequencing.

After the synthesis step, each strand is separated on the basis of size using capillary gel electrophoresis. This method uses an electric field to drive DNA through a capillary fibre filled with a gel matrix.
This method is sensitive enough to separate DNA fragments that differ in length by a single nucleotide. The smallest fragment will reach the end of the capillary fibre first. A fluorescence-detecting laser, built into the machine, then shoots through the fibre and will excite the fluorescently labelled terminal ddNTP of the smallest fragment. The terminal ddNTP can be identified by measuring the wavelength of the emitted fluorescence. The next shortest fragment will then reach the fluorescence-detecting laser and will undergo the same analysis, followed by the next shortest fragment etc. Through this process the sequence of a full length DNA strand is determined.

The sequence of the detected fluorescence is then converted computationally into an electropherogram (Figure 2). This is a graph of the intensity of fluorescence emission over time, with the corresponding nucleotide sequence shown above the peaks.

Figure 2. A high quality electropherogram. The data for the first 30 bases is unreliable because of the small fragment size.

An animation of automated Sanger sequencing can be found here

Mutation Detection by Sanger Sequencing

Sanger sequencing is a widely used method for the detection of mutations, especially for single nucleotide variants (SNVs), which occur when a single nucleotide is altered in a DNA sequence (Figure 3). The majority of known disease-causing mutations occur as a result of SNVs. Studies of these SNVs by sequencing can therefore indicate differences in susceptibility to many diseases (e.g. sickle-cell anaemia, cystic fibrosis, and β-thalassemia). Knowledge of these genetic variations can indicate the severity of the illness that an individual experiences as well as the way the body responds to treatments.

Sanger sequencing analysis for mutation detection is based on a comparison: a patient’s electropherogram is compared to that obtained from a DNA sample without a mutation (Figure 3). Any differences between the two sequences are analysed for their potential phenotypic effect. Historically a visual comparison was made for each nucleotide peak in the two traces, but this is time consuming and prone to error. Nowadays, software is used to perform sequencing analysis. Tens of thousands of nucleotides can be analysed in seconds. The software automatically and accurately detects mutations and provides a description of the mutation at the DNA and protein level.

Figure 3. (A) Electropherogram portion from a healthy individual compared to (B) a congenital glaucoma patient. The black arrows highlight SNVs. Note that the mutations are heterozygous.

Mutation Detection by Microarray

Microarray technologies can also be used for the detection of gene mutations. A microarray is also known as a “DNA chip” or a “biochip”. It is a small glass plate/slide encased in plastic (Figure 4). Each chip contains thousands of microscopic spots arranged in ordered rows and columns. The precise location and composition of each spot is recorded in a database. Each spot contains multiple strands of identical, single-stranded DNA (called “probe” DNA) that represents a portion of a gene. This means that the composition of each spot is unique. Together, the portions add up to the normal gene in question. Other spots may contain regions of the gene that contain previously identified mutations.

Figure 4. A DNA microarray

To identify whether a patient carries a mutation for a specific disease, a sample of DNA from the patient's blood is collected as well as a control sample that does not contain a mutation in the gene of interest. Both samples are prepared separately but undergo identical processes. The DNA is denatured in order to separate the two complementary strands of DNA.
The single-stranded molecules are cut into smaller fragments and subsequently labelled with a fluorescent dye. Different labels are used for the patient and control DNA. For instance, the patient’s DNA could be labelled with green dye and the control DNA could be labelled with red dye. Both sets of labelled DNA are then mixed and inserted into the chip. They are incubated to allow hybridization to occur between complementary strands in the sample and on the chip. After this, excess sample DNA is washed off the chip. DNA hybridization on the chip can be identified by fluorescence measurements (Figure 5), which are performed using a microarray scanner. This instrument is equipped with an autoloader and can analyse large numbers of chips automatically.

Figure 5. Comparison of microarray data where a ☼ symbol indicates fluorescence detection due to binding/hybridization between control/patient and probe DNA. Note that the patient and control samples were not premixed. Instead, both samples were individually introduced to separate, but identical chips.

If the patient lacks a mutation for the gene then both the red (control) and green (patient) samples will bind to the sequences on the chip that represent the normal sequence (without the mutation). Because the sample DNA is fluorescently labelled, both the red and the green dye can be independently measured, even though they are found in the same spots. No fluorescence will be detected in the spots that contain mutated fragments.

If the patient possesses a mutation, their DNA will not bind to the DNA sequences on the chip that represent the "normal" DNA and no green fluorescence will therefore be detected in those spots. Instead the mutant DNA will bind to the sequence on the chip that represents the mutated DNA sequence (if known), and green fluorescence will be observed in the corresponding spot.
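The interpretation rules in the two paragraphs above can be written out as a small decision function. This is only an illustrative sketch in Python: the function name, the spot labels, and the readings are hypothetical, and each signal is simplified to "detected" or "not detected" (real scanners report intensities).

```python
def interpret_spot(probe_type, red_control, green_patient):
    """Interpret one spot on a two-colour mutation-detection chip.

    probe_type    -- "normal" or "mutant" (what the immobilised probe represents)
    red_control   -- True if red (control) fluorescence was detected
    green_patient -- True if green (patient) fluorescence was detected
    """
    if probe_type == "normal":
        if red_control and green_patient:
            return "patient matches normal sequence"
        if red_control and not green_patient:
            return "patient mutated in this gene portion"
    elif probe_type == "mutant":
        if green_patient and not red_control:
            return "patient carries this known mutation"
        if not green_patient and not red_control:
            return "known mutation not detected"
    # Any other combination is not covered by the rules in the notes.
    return "unexpected pattern, repeat the measurement"


# Hypothetical readings: spot -> (probe type, red detected, green detected)
readings = {
    "spot 1": ("normal", True, True),
    "spot 2": ("normal", True, False),
    "spot 3": ("mutant", False, True),
    "spot 4": ("mutant", False, False),
}
for spot, (probe_type, red, green) in readings.items():
    print(spot, "->", interpret_spot(probe_type, red, green))
```

In this made-up data set, spots 2 and 3 together would point to a patient whose DNA fails to bind one normal probe and binds a probe for a known mutation instead.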
Figure 5 shows microarray data relating to a specific gene from normal (control) and patient DNA. Note that the patient and control samples were not premixed in this experiment. Instead, both samples were individually incubated on separate, but identical chips. Fluorescence was observed in positions D3 and A4 of the control chip but not the patient chip. The presence of binding between the probe DNA (immobilized on the chip) and control DNA suggests that the ‘normal’ gene contains a portion that is complementary to the probe DNA in D3 and A4. The lack of binding in positions D3 and A4 of the patient sample indicates that there is a mutation in the corresponding gene portion of the patient. Fluorescence was not observed in position E7 of the control microarray but was observed in the corresponding patient microarray. If E7 contains mutant DNA then this would confirm the presence of that mutation in the patient.

Microarrays for Studies of Gene Expression

Microarrays can also be used to study the expression levels of genes in cell and tissue samples, making them especially useful in the field of clinical diagnostics. This process is called “gene expression profiling”. Because DNA is transcribed into mRNA, mRNA can be used as an indicator of gene expression. The isolation of mRNA from tissue is therefore the first step in gene expression profiling. One suitable method of mRNA isolation from cell lysates would be to use affinity chromatography with oligo dT beads. mRNA cannot be used directly for microarray studies because microarrays contain single-stranded DNA probes. Thus, reverse transcription is necessary to produce cDNA from the mRNA (Outcome 2). After amplification by PCR and fluorescent labelling, the cDNA can be hybridized on the microarray and analysed by fluorescence.
Note that the probe DNA immobilized in the microarray spots consists of single-stranded cDNA molecules that correspond to a large number of different mRNAs. Thus each spot represents a particular gene.

Figure 6. Experimental workflow for microarray analysis of control “healthy” cells versus cancer cells. In this example, both control and cancer cDNA were labelled with identical fluorophores and analysed on separate, identical microarray chips. Alternatively, control and cancer cDNA could be labelled with different fluorophores before mixing and analysing using a single microarray chip.

Consider the microarray data below (Figure 7), obtained from healthy brain cells and brain tumour cells. From the data, it is apparent that the genes in positions D2, G4, and F6 are expressed only in healthy cells while the genes in B1, A2, and I7 are expressed only in tumour cells.

Figure 7. Comparison of microarray data where a ☼ symbol indicates fluorescence detection due to binding/hybridization between control or tumour and probe DNA.

The method of analysis is very similar to that used for the data in Figure 5. The difference is that the fluorescence data in Figure 5 arose from the hybridization of short DNA fragments, while the data in Figure 7 arose from the hybridization of much longer cDNA molecules corresponding to expressed genes. The former is used to detect mutations (Figure 5), the latter is used to detect gene expression (Figure 7).

An informative video on gene expression analysis using microarray technology can be found here.

Gene Linkage and Recombination

Human chromosomes contain hundreds to thousands of genes each. When genes are located close together on the same chromosome we call them “linked” (Figure 8).
Linkage therefore refers to the situation in which alleles located on the same chromosome will be inherited together as a unit more frequently than not. Alleles located on the same chromosome are not always inherited together as a unit, however. Recombination occurs during the first phase of meiosis, when portions of DNA in a homologous chromosome pair cross over, break, and recombine to produce new combinations of alleles (Figure 8). Recombination can therefore disrupt linked genes.

Crossovers occur at random positions along the chromosome. The frequency of crossovers between two genes is therefore dependent on the distance between them. A short distance between alleles constitutes a very small target for crossover events, so very few of these events will take place. The further apart alleles are on the same chromosome, the greater the likelihood of them undergoing recombination. These alleles would have a greater recombination frequency than alleles located close together on the same chromosome.

Figure 8. A, B, and C genes are partially linked. Examples of two possible recombination events are presented. During prophase I (meiosis) crossing over occurs (at points called ‘chiasmata’) between the maternal and paternal versions of the same chromosome, resulting in the physical exchange of chromosome parts. This is recombination.

Recombination contributes to genetic diversity. Additionally, because recombination frequency is related to the physical distance between alleles, quantitative studies of recombination frequencies are also used to estimate the distance between genes and to produce a type of genetic map. This process is called “genetic mapping” or “linkage mapping” and is crucially important in genetics and, as we will find later, genome sequencing.

Consider a simple genetic mapping study of the fruit fly Drosophila melanogaster, which is a useful model for genetic studies.
This process would begin with linkage analysis, an experiment that quantifies the recombination frequency between a series of gene pairs (or other genetic markers) to identify whether they are linked. This experiment would involve analysing the frequency at which the corresponding phenotypes occur in a population of offspring of specific genetic crosses. High recombination frequencies indicate a greater distance between the genes.

Let’s consider two Drosophila genes:
• the black gene, with a dominant b+ allele that specifies normal, yellow-brown body colour and a recessive b allele that specifies a black body
• the vestigial gene, which has a dominant vg+ allele that specifies normal, long wings and a recessive vg allele that specifies short, crumpled "vestigial" wings

In order to measure recombination frequency between these two genes, a fly must be constructed in which gene recombination can be observed. The researcher must know specifically which alleles are together on the chromosome. A good way to begin is by crossing two homozygous flies i.e. each fly has two of the same allele for body colour and wing type (Figure 9). The resulting double heterozygote offspring has a normal appearance and is a useful starting point for recombination studies because it provides knowledge of which alleles are located on the same chromosome.

Figure 9. The generation of a test fly that is heterozygous for the black and vestigial genes (F1) by crossing homozygous parents (P).

The double heterozygote (Figure 9; F1) is then crossed with a fly that is homozygous recessive for body colour and wing type (it contains only b and vg alleles). This is known as a “test cross” because it guarantees that the alleles provided by the double heterozygote fully determine the phenotype of the offspring. This is because the tester fly can only provide recessive alleles.
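The logic of the test cross can be checked with a short Python sketch. It is an illustration only: the function and variable names are hypothetical, and it assumes (as the notes describe) that b+ and vg+ sit on one chromosome of the double heterozygote and b and vg on the other, so that those two combinations are the parental gametes.

```python
# Trait shown by each allele; a trailing "+" marks the dominant allele.
PHENOTYPES = {
    "b+": "yellow-brown body", "b": "black body",
    "vg+": "normal wings", "vg": "vestigial wings",
}


def offspring_phenotype(het_gamete, tester_gamete=("b", "vg")):
    """Phenotype of one offspring from a gamete of each parent."""
    traits = []
    for from_het, from_tester in zip(het_gamete, tester_gamete):
        # A dominant allele is expressed whenever it is present; otherwise
        # the offspring shows whatever the other parent contributed.
        shown = from_het if from_het.endswith("+") else from_tester
        traits.append(PHENOTYPES[shown])
    return ", ".join(traits)


# The four gamete types the double heterozygote can produce.
GAMETES = {
    ("b+", "vg+"): "parental",
    ("b", "vg"): "parental",
    ("b+", "vg"): "recombinant",
    ("b", "vg+"): "recombinant",
}
for gamete, kind in GAMETES.items():
    print(kind, gamete, "->", offspring_phenotype(gamete))
```

Because the tester always contributes b and vg, each of the four gamete types gives a different, recognisable phenotype, which is why recombinant offspring can simply be counted.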
Figure 10 shows all possible offspring phenotypes produced by the test cross, including new allele combinations resulting from all possible recombination events. The recombination frequency (RF) between these two genes can be calculated by adding the number of individuals in the population that have non-parental phenotypes (those with black bodies and normal wings plus those with yellow-brown bodies and vestigial wings), dividing the answer by the total number of individuals in the population, and multiplying by 100. In this case, the recombination frequency is 17%. This means that 17% of the offspring carry a recombinant combination of these two genes.

Figure 10. Results of the test cross between the double heterozygous fly and a homozygous recessive fly. The non-parental phenotypes, which occur due to recombination, are highlighted.

Other test crosses can be performed to identify recombination frequencies for other gene pairs. For instance, the cinnabar gene, cn, for eye pigmentation has a recombination frequency of 8% with respect to the vestigial gene. If this information were to be presented in a map, there are two possibilities for the position of the cn gene (Figure 11). This is because recombination frequencies are roughly additive, which reflects the fact that genes are arranged in a linear order on a chromosome.

Figure 11. Using knowledge of the recombination frequencies of the black and cinnabar genes with respect to the vestigial gene, two potential maps, A and B, can be generated.

The only way to confirm which of the above maps is correct is to measure the recombination frequency for the black and cinnabar genes. Map A is correct if the recombination frequency is ~ 25%, while map B is correct if the recombination frequency is ~ 9%. Through experimentation, the recombination frequency for the black and cinnabar genes was determined to be ~ 9%, making map B correct (Figure 11).
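The calculation and the choice between the two candidate maps can be sketched in Python. This is a minimal illustration rather than a standard tool: the function names are hypothetical, the offspring counts are invented (the actual counts from Figure 10 are not reproduced here), and the ordering step simply tries every gene order and keeps the one whose neighbour-to-neighbour distances best add up to the measured frequencies.

```python
from itertools import permutations


def recombination_frequency(recombinants, total):
    """RF (%) = non-parental offspring / all offspring x 100."""
    return 100 * recombinants / total


def order_misfit(order, rf):
    """How badly a gene order disagrees with the measured RFs (0 = perfect)."""
    position = {order[0]: 0.0}
    for left, right in zip(order, order[1:]):
        pair = frozenset((left, right))
        if pair not in rf:
            return float("inf")  # cannot place genes without this measurement
        position[right] = position[left] + rf[pair]
    misfit = 0.0
    for pair, measured in rf.items():
        a, b = tuple(pair)
        misfit += abs(abs(position[a] - position[b]) - measured)
    return misfit


def best_order(rf):
    """Gene order that is most consistent with additive RFs."""
    genes = sorted({gene for pair in rf for gene in pair})
    return min(permutations(genes), key=lambda order: order_misfit(order, rf))


# Invented counts chosen only to reproduce the 17% quoted in the notes.
print(recombination_frequency(recombinants=170, total=1000))

# RFs from the notes: black-vestigial 17%, cinnabar-vestigial 8%,
# black-cinnabar 9%.
measured = {
    frozenset(("b", "vg")): 17,
    frozenset(("cn", "vg")): 8,
    frozenset(("b", "cn")): 9,
}
print(best_order(measured))
```

With the measured 9% for black and cinnabar, the sketch places cinnabar between black and vestigial; substituting 25% instead places cinnabar on the far side of vestigial, which corresponds to the other candidate map. A gene order and its mirror image fit equally well, so only one of the two is returned.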
Similarly, the lobe mutation, l (which affects eye structure), was found to have a 5% recombination frequency with the vestigial gene and a 22% recombination frequency with the black gene. Because recombination frequencies are additive, and we can see that 5% + 17% = 22%, this suggests that the l gene does not lie between the black and vestigial genes; it is found on the “far” side of the vestigial gene. Thus, the map can be updated (Figure 12).

Figure 12. Map of a Drosophila chromosome which identifies the relative position of the black, cinnabar, vestigial, and lobe genes.

It is worth noting that linkage maps display the relative position of genes using standard map units or percentages (1% = 1 map unit or m.u.). A map unit is not a direct measure of the physical distance between genes; it merely provides an approximation of physical distance. Linkage maps do not specify the exact loci on a chromosome or which specific chromosome they are on. Sequencing is necessary in order to produce a physical map.

This video presents a worked example of linkage mapping.

Uses of Linkage Mapping

Linkage mapping is primarily used to understand the genetic basis of disease. For instance, linkage mapping could be used to identify a well-known gene or marker that is linked to a disease-causing allele. If a marker is identified, then knowledge of the chromosome on which the disease-causing allele is located will be obtained. Its precise location on the chromosome may also be identified. This information will form the basis for further studies e.g. sequencing.

Linkage maps are also crucially important in genome sequencing. Draft genomes produced through sequencing are actually composed of thousands of individual sequences that must be pieced together to yield the final genome. Often, sequencing alone does not provide information on how these pieces are assembled into chromosomes.
At this point, linkage maps act as frameworks for genome assembly. Linkage mapping is also used to validate/correct mistakes in genome sequencing efforts.

Microsatellite Markers

A genetic marker, or marker, is a gene or other DNA sequence with a known location on a chromosome. It can therefore act as a “landmark” and be used for identification purposes. Genes are not the only class of marker, however. Microsatellites are tracts of DNA in which a unit of 1-10 nucleotides is repeated 5-50 times in a row, making them an example of “tandem repeats” (Figure 13).

Figure 13. Portion of an electropherogram highlighting a microsatellite.

Microsatellites occur at thousands of locations within a genome (frequently in noncoding DNA, making them biologically silent) and the physical locations of many microsatellites are precisely known. This property makes microsatellites important markers and analytical tools. Specific applications of microsatellite markers include:
• paternity tests
• forensics
• linkage mapping

Paternity tests: A child is likely to have similar microsatellites to their mother and father but distinctly different microsatellites to non-relatives. This is because microsatellites have a much higher mutation rate compared to coding DNA sequences and therefore tend to show high variation between individuals. In particular, the number of unit repeats tends to vary, meaning that some individuals have longer tandem repeats than others. This makes microsatellites excellent markers for paternity identification. In a typical paternity test, hair or saliva samples are taken from the mother, child, and proposed fathers. From the DNA in each sample, multiple microsatellites are amplified by PCR for separation and detection by agarose gel or capillary gel electrophoresis. The data are compared to identify paternity (Figure 14).

Figure 14.
Agarose gel electrophoresis of selected, amplified microsatellites prepared from DNA samples of a mother, child and two potential fathers: Charles and Morgan. Morgan is the father.

Forensics: DNA profiles such as those used in paternity testing can be generated by amplifying and separating a set of microsatellites from DNA gathered at a crime scene. DNA from suspects can be processed in the same way and compared in order to help identify the perpetrator.

Linkage mapping: See notes above on “Uses of Linkage Mapping”.

The Human Genome Project

The Human Genome Project (HGP) was an international project that produced a fully sequenced human genome that is freely available in online databases. This sequence is a prototypical or “composite” genome of several individuals. Today, the HGP is still the world's largest collaborative biological project. The main aims of the HGP were to:
• determine an accurate sequence of the 3 billion DNA base pairs that comprise the human genome
• develop new tools for data acquisition and analysis
• catalogue all of the estimated 20,000 to 25,000 genes in the human genome
• sequence the genomes of other model organisms of medical relevance, e.g. fruit fly and mouse
• investigate the consequences of genomic research through its Ethical, Legal, and Social Implications (ELSI) programme.

The project officially began in 1990 and by 2003 essentially all of the aims had been met. For more information on the Human Genome Project and its biological and technological implications, read this article.

Sequencing in the HGP

One major goal of the HGP was to develop new technologies and analytical tools. In particular, Sanger sequencing, which was limited to small DNA fragments, was considered too cumbersome, expensive, and inefficient for use in such a large scale and complex project.
However, a modified version of Sanger sequencing was deployed in the HGP. It is called BAC-to-BAC sequencing (Figure 15).

The first step of BAC-to-BAC sequencing involves generating a physical map of each chromosome. A physical map requires breaking chromosomes into large fragments and deciphering the order of these fragments in the intact chromosome. No sequencing is required in this step. Instead, many copies of the chromosome are randomly cut into fragments of ~150,000 bp in length (Figure 15A). The fragments are then inserted into bacterial artificial chromosomes (BACs). This allows the fragments to be amplified and fingerprinted i.e. analysed for the presence of markers (in this case restriction recognition sites) that could later be used for identification and genome assembly. For instance, some fragments contain common portions (recall that many copies of the same chromosome are randomly fragmented before insertion into BACs), and identification of the restriction sites either side of a common marker can help to piece fragments together in the correct order.

Each BAC fragment is then further fragmented into 1,500 bp pieces, and each piece is placed in another vector called M13 (Figure 15B). Each of these clones is then sequenced. Since the sequence of the M13 vector is already known, the fragment sequence is easily identified. The fragment sequences are assembled in the correct order using a computer program which identifies common sequences i.e. regions of overlap (Figure 15C). Once each 150,000 bp fragment is sequenced fully, these fragments can be correctly ordered using the physical map.

Figure 15. Schematic of BAC-to-BAC sequencing, an example of a clone-by-clone sequencing technique.

Whole Genome Shotgun Sequencing

During the course of the HGP advances were made in another method of sequencing: whole genome shotgun sequencing.
This method is much quicker than the BAC-to-BAC approach because it bypasses the need to insert fragments into BAC vectors in order to build a physical map. Whole genome shotgun sequencing is also less expensive and requires much less starting material.

During whole genome shotgun sequencing, multiple copies of the genome are randomly fragmented into pieces of ~ 2,000 base pairs (bp). This is performed by squeezing DNA through a pressurized syringe. Using a similar approach, a fresh sample is fragmented into 10,000 bp pieces. Each fragment (both 2,000 and 10,000 bp types) is inserted into a plasmid. Both resulting 2,000 and 10,000 bp libraries are sequenced. The millions of sequenced fragments are assembled computationally into longer fragments called “contigs” and finally into a continuous stretch of DNA corresponding to each chromosome.

To understand how the Human Genome Project has changed genetic research, read the following article.

Outcome 4: Revision Questions

1. Give two uses for Sanger sequencing.
Single gene studies and verifying plasmid sequences.
(2 marks)

2. List the components used in a Sanger sequencing reaction. Which component is not required in a typical PCR reaction? Describe the role of this component.
Template DNA, DNA polymerase, primer, dNTPs and ddNTPs. ddNTPs are not used in a typical PCR reaction. In a ddNTP, the 3’-hydroxyl group is substituted by hydrogen. Because the 3’-hydroxyl is essential for the phosphodiester bond that DNA polymerase forms during chain elongation, no further nucleotides can be added once a ddNTP is incorporated, so synthesis of that chain is terminated. The ddNTPs used in Sanger sequencing are modified to contain a fluorescent dye, which allows the terminal nucleotide of each fragment to be detected.
(7 marks)

3. Describe the Sanger method of DNA sequencing.
One of the DNA strands is incubated with a labelled primer (complementary to the 3’ end of the template), DNA polymerase, dNTPs, and ddNTPs. The mixture is then exposed to the steps of PCR: denaturation, annealing and extension. The concentration of dNTPs is much higher than the concentration of ddNTPs, so there is a higher chance of a dNTP being incorporated into the newly synthesised chain. DNA synthesis of the new strand continues until a ddNTP is incorporated. Given enough time, reagents, and a suitable ratio of dNTPs to ddNTPs, at least one DNA strand of every possible length will be produced with a fluorescent ddNTP at its end.
(6 marks)

4. Explain how the newly synthesized DNA fragments are separated and detected during Sanger sequencing.
The new strands are separated on the basis of size using capillary gel electrophoresis. An electric field is used to drive DNA through a capillary fibre filled with a gel matrix. The method is sensitive enough to separate DNA fragments that differ in length by a single nucleotide. The smallest fragment reaches the end of the capillary fibre first, and a laser built into the machine shoots through the fibre to excite the fluorescent label on the terminal ddNTP of that fragment. The ddNTP is identified by measuring the wavelength of the emitted fluorescence. By repeating this process for each successively longer fragment, the full sequence of the DNA strand can be determined.
(2 marks)

5. What is an electropherogram?
A graph of the intensity of fluorescence emitted by the terminal ddNTPs over time, generated computationally from the detected fluorescence, with the nucleotide sequence shown above the peaks.
(1 mark)

6. Analyse the electropherogram below. Describe the type of mutation that is apparent.
Most of the bases in the two traces are the same; however, slight variations in peak size can be seen at the C bases. The most prominent difference is at the blue C peak, which is much shorter in one trace than in the other. The mutation that is apparent is a point mutation.
(2 marks)

7. DNA microarrays can be used for the identification of mutations and for studies of gene expression. Explain the features of a DNA microarray.
DNA microarrays are also known as DNA chips. They are small glass plates/slides encased in plastic. Each chip contains thousands of microscopic spots arranged in ordered rows and columns. The precise location and composition of each spot is recorded in a database. Each spot contains multiple strands of identical, single-stranded probe DNA that represents a portion of a gene. The composition of each spot is unique. Together, the spots add up to the normal gene.
(4 marks)

8. Explain how a tissue sample must be prepared/processed for a microarray-based study of gene expression.
mRNA is first isolated from the tissue sample, for example from cell lysates by affinity chromatography with oligo dT beads. Because microarrays contain single-stranded DNA probes, the mRNA cannot be used directly and is reverse transcribed to produce cDNA. The cDNA is amplified by PCR and labelled with a fluorescent dye. A control sample is prepared in the same way. The labelled cDNA is then incubated on the chip to allow hybridisation with complementary probe DNA, and excess sample is washed off the chip. Hybridisation is identified by fluorescence measurements performed using a microarray scanner.
(4 marks)

9.
The scheme below shows some of the genes expressed in healthy pancreatic tissue and in pancreatic cancer cells.
(a) Which is the control experiment?
The healthy pancreatic tissue sample. (1 mark)
(b) Identify a well/spot that contains a gene that is only expressed in healthy pancreatic cells.
A2 (1 mark)
(c) Identify a well/spot that contains a gene that is only expressed in pancreatic cancer cells.
B3 (1 mark)

10. What process is responsible for disrupting linked genes during meiosis?
Recombination (crossing over). (1 mark)

11. What is meant by the term “recombination frequency”?
The recombination frequency is the percentage of offspring in which two linked genes have been separated by recombination. The closer together two genes are on a chromosome, the lower the chance of recombination between them; genes that are further apart recombine more often. (1 mark)

12. What is a linkage map and how does linkage mapping work?
A linkage map shows the order of genes on a chromosome and the relative distances between them. Recombination frequency increases with the distance between two genes, so measuring the recombination frequencies between pairs of genes allows their relative positions to be plotted as a linkage map. (2 marks)

13. Describe two applications of linkage mapping in genomic science.
Linkage maps can be used to locate a disease-causing allele through its linkage to a known gene or marker. They are also important in genome sequencing. (2 marks)

14. What is the difference between a linkage map and a physical map?
A linkage map gives the relative positions of genes based on recombination frequencies, whilst a physical map gives actual distances in base pairs and is used to identify the order of large fragments before any sequencing can occur. (1 mark)

15. Genes A, B, C, and D are located on the same chromosome. The recombination frequencies (RF) are as follows:

Genes   RF (%)
A-B     10
A-C     25
A-D     23
B-C     15
C-D     48

What is the most likely order of genes in the chromosome?
C, B, A, D. The distances are additive in this order: C-B (15) + B-A (10) = C-A (25), and C-A (25) + A-D (23) = C-D (48). (2 marks)

16.
Genes A, B, G, and H are located on the same chromosome. The distances between the genes are below:

Genes   RF (%)
A-H     18
A-B     10
B-H     8
A-G     2
H-G     20

What is the most likely order of genes in the chromosome?
H, B, A, G. The distances are additive in this order: H-B (8) + B-A (10) = H-A (18), and H-A (18) + A-G (2) = H-G (20). (2 marks)

17. What is a microsatellite marker?
Microsatellites are tracts of repetitive DNA in which a short motif of 1-10 nucleotides is repeated many times in tandem. They occur at thousands of locations within the genome, and the physical locations of many of them are known, so they are used as genomic markers for identification purposes. (2 marks)

18. Describe how microsatellites are used in either:
• paternity testing; or
• forensics
Paternity test: A child's microsatellites will be more similar to those of their parents than to those of unrelated individuals. Microsatellites have a higher mutation rate than coding DNA sequences, so they vary more between individuals: the number of repeat units differs, giving some individuals longer tandem repeats than others. This makes microsatellites well suited to paternity testing. A sample from the child is compared with samples from the mother and the proposed father. Multiple microsatellites are amplified by PCR from the DNA in each sample, then separated and detected using either agarose gel or capillary gel electrophoresis.
Forensics: DNA profiles can be generated by amplifying and separating a set of microsatellites from DNA gathered at the crime scene. DNA from the suspects is profiled by the same process and compared with the profile of the perpetrator. (3 marks)

19. State 4 aims of the Human Genome Project (HGP).
• Determine an accurate sequence of the 3 billion DNA base pairs that comprise the human genome
• Develop new tools for data acquisition and analysis
• Catalogue the genomes of other model organisms of medical relevance
• Study the ethical, legal, and social implications of genome research
(4 marks)

20. Describe the method of BAC-to-BAC sequencing used in the HGP.
• Generate a physical map by breaking the chromosomes into large fragments and working out the order of the fragments in the intact chromosome. No sequencing is required in this step. Instead, many copies of the chromosome are randomly cut into fragments.
• The fragments are inserted into bacterial artificial chromosomes (BACs), where they are amplified and fingerprinted. Fingerprinting looks for the presence of markers, which can later be used for identification and genome assembly.
• Each BAC fragment is then broken into pieces of about 1,500 bp, which are placed in another vector called M13. These clones are then sequenced. Since the sequence of the M13 vector is already known, the fragment sequence is easily identified.
• The fragment sequences are assembled into the correct order using a computer program, guided by the physical map.
(3 marks)

21. Describe how whole genome shotgun sequencing works.
Multiple copies of the genome are randomly fragmented into pieces of about 2,000 base pairs. This is done by squeezing the DNA through a pressurized syringe. Using a similar approach, a fresh sample is fragmented into 10,000 bp pieces. This gives 2,000 bp and 10,000 bp libraries, both of which are sequenced. The millions of fragment sequences are assembled computationally into longer stretches called contigs. The contigs are then assembled into a continuous stretch of DNA corresponding to each chromosome. (3 marks)

22. Give two advantages of whole genome shotgun sequencing over BAC-to-BAC sequencing.
Whole genome shotgun sequencing is less expensive overall and requires much less starting material than BAC-to-BAC sequencing. (2 marks)
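
The computational assembly step described in question 21 can be illustrated with a toy example. The sketch below is a minimal greedy overlap assembler in Python, not the software used in any real genome project. It assumes error-free reads that all come from the same strand and contain no long repeats. The function names (`overlap`, `greedy_assemble`), the `min_len` parameter, and the short sequences are all made up for illustration.

```python
from __future__ import annotations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` (a hypothetical cut-off) are ignored and give 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of sequences until no overlaps remain.

    Returns the resulting contigs (a single contig if every read links up).
    """
    # Remove exact duplicates and reads wholly contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]

    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to join
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Made-up 25-base "genome" and four overlapping fragments cut from it.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = ["ATGGCGTACG", "GTACGTTAGCC", "TAGCCTAGGA", "TAGGATCCA"]
print(greedy_assemble(reads))
```

With these four fragments, each read shares five bases with the next one, so the reads join into a single contig equal to the starting sequence. Real genomes contain repeats and sequencing errors, which is one reason libraries of more than one fragment size are used.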