Biotechnology Homework 1 Fall 2009 Answers

1. (i) (a) On one chromosome there is only one copy of a specific piece of DNA, so you must detect just a single DNA molecule.

(b) You need to recall how to convert between mass, moles and molecules, and to approximate some values. A whole human genome has a molecular weight of roughly $3 \times 10^{9} \times 700 \approx 2 \times 10^{12}$. Hence, 5 µg contains $5 \times 10^{-6} / (2 \times 10^{12})$ moles, or $5 \times 10^{-6} \times 6 \times 10^{23} / (2 \times 10^{12}) = 1.5 \times 10^{6}$ copies, or molecules if we consider the whole human genome as one molecule. No matter what the size of the DNA in question, there would be roughly a million molecules in a single band that might be illuminated by hybridization to a probe.

(c) Similar to the calculation above, the number of template molecules is roughly $1 \times 10^{-7} \times 6 \times 10^{23} / (700 \times 10^{3}) \approx 1 \times 10^{11}$. In each round of synthesis one template molecule can produce only one product molecule (and in fact the efficiency of template use will be somewhat lower than 100%, but still pretty high). Although the procedure is called cycle sequencing, there is no template amplification, just repeated rounds of copying the same 100 ng of original template. Hence, the maximum number of product molecules from 20 cycles would be $2 \times 10^{12}$. These products must be distributed amongst about 1,000 bands (whenever you are attempting to read 700 nt or so), and in fact there is bound to be an excess of full-length copies. So, again, the maximum number of molecules in a single band will be about $2 \times 10^{9}$ (and in reality that figure might be as much as ten-fold lower), something like a billion.

Very few answers acknowledged that important products were spread over about 1,000 bands. Hence, any one crucial piece of information (a band or peak) will only represent about 0.1% of the total product.

(ii) The most striking disparity is between FISH and sequencing, but FISH and Southerns are in some ways more similar procedures and still a million-fold apart in required sensitivity.
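The three figures being compared here (one molecule for FISH, about a million for a Southern band, about a billion for a sequencing band) follow from the arithmetic in (i). A minimal Python sketch of that arithmetic is given below. It uses the same rounded constants as the text (700 per bp, $6 \times 10^{23}$ molecules per mole). The 1 kb template length is an assumption read off the $700 \times 10^{3}$ molecular weight, and the function and variable names are illustrative only. Because nothing is rounded along the way, the results come out slightly below the rounded figures quoted above.

```python
AVOGADRO = 6e23   # molecules per mole, rounded as in the text
MW_PER_BP = 700   # approximate molecular weight per base pair, as in the text


def copies(mass_g, length_bp):
    """Number of DNA molecules in mass_g grams of DNA of length_bp base pairs."""
    return mass_g * AVOGADRO / (length_bp * MW_PER_BP)


# Southern blot: 5 micrograms of human genomic DNA (about 3e9 bp)
southern_copies = copies(5e-6, 3e9)

# Cycle sequencing: 100 ng of template; 1 kb length is an assumption
# inferred from the 700 x 10^3 molecular weight used in the text
template_copies = copies(1e-7, 1_000)

CYCLES = 20     # rounds of copying, no amplification of template
BANDS = 1_000   # products are spread over roughly this many bands

max_products = template_copies * CYCLES   # upper bound at 100% efficiency
per_band = max_products / BANDS

print(f"Southern band:   {southern_copies:.1e} molecules")
print(f"Template:        {template_copies:.1e} molecules")
print(f"Sequencing band: {per_band:.1e} molecules (upper bound)")
print(f"Sequencing / Southern: {per_band / southern_copies:.0f}-fold")
```

The upper bound per sequencing band assumes every template is copied in every cycle; as noted above, the real figure could be as much as ten-fold lower.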
The most important difference is that the FISH signal is confined to a very small area (perhaps something like 0.1 µm x 0.1 µm in real space; a bright signal makes it appear larger) compared to the others (perhaps 0.1 mm x 0.5 mm), a difference (very roughly estimated) in the neighborhood of a million. Thus, a signal can be much lower in FISH to achieve the same intensity. When taking this kind of argument to the extremes one has to consider also the absolute amount of fluorescence and photon detection efficiency, but here one can use a probe that has a very large number of fluorophores to hybridize to a single target molecule.

(iii) The longer the region of target DNA for which you have labeled probe, the stronger the signal. Of course, the probe must also be labeled to high specific activity to increase the signal. In practice, whether one or several DNA templates are used to generate a probe, the probe will be made as, or broken down into, relatively short sections. Thus, each fragment has some chance of hybridizing to a given chromosome during FISH, and the signals you see come from the hybridization of many probes to nearby regions of the chromosome. If you had one long probe you would often see nothing, and occasionally see a jackpot hybridization event with a strong signal.

I am personally a little surprised at how many molecules are in a sequencing band (at least according to these calculations assuming very efficient template usage). Probes for Southern blots are often around 1kb or more and are made to high specific activity by using a labeled dNTP. Hence, there may be approaching 200-300 labels per probe molecule, compared to only one labeled ddNTP in a sequencing product molecule. That accounts for a difference of a hundred-fold or more in the sensitivity of detection between these two procedures (but not a million-fold).
When Southern blots were popular they generally used radioactivity, for which the sensitivity of detection (for bands) may be greater than fluorescence, but I do not know the relevant numbers. I do know that when old-fashioned sequencing transitioned from incorporating radioactive label (for example, from 32P-dATP at several positions in one sequencing product) to using one fluorescent ddNTP, it was necessary to introduce multiple cycles of sequencing to produce enough product to be readily detectable.

Many answers did not emphasize, explain or mention the importance of a long probe simply to generate more signal. Instead, there was a lot of discussion about specificity of hybridization. There are regions in a human genome that are exact or near-exact repeat sequences. If those repeats come from a variety of chromosomal positions and you include the repeat sequence in your probe, you will indeed see hybridization at more than one site. The longer your probe, the more chance of including a repeat. However, the solution to this is not to use short probes but to find and remove repeat sequence (i.e., use multiple pieces of DNA from the region in question, but none with repeats). Ignoring repeats, there is no problem with specificity. 300bp is long enough to provide absolute specificity in hybridization (so is even 50bp) even in the complex human genome at high stringency, and the specificity remains the same (exceedingly high) even if the probes collectively include 200kb.

Also, in case anyone was worried, it is equally good to talk about annealing or hybridization of two nucleic acid strands to each other.

2. (i) (a) After PCR the tube will contain the starting genomic DNA, the PCR product, excess primers that were inevitably used, Taq or similar polymerase, dNTPs and buffer. The main components to consider are nucleic acids. Genomic DNA should not be a problem.
A sequencing primer is typically at least 20nt in order that it would have several mis-matches with random (or in this case genomic) DNA. Thus, it is easy to find an annealing temperature where only a very small proportion of hybridizing primer hybridizes to inappropriate sites. Even if there is, by chance, a really good match at one genomic location, there are so few copies of that locus in the reaction tube that the specific products from there will be negligible. Hence, genomic DNA would be expected to contribute only a very small background of bands, roughly equally distributed among ddA, G, C and T colors at each size.

The primers will be a problem. Sequencing must be from only one site. Even if you chose to use one of the PCR primers for sequencing you must get rid of the other (a reasonably strong second sequence imposed on a primary sequence makes a mess; although theoretically possible to resolve, the demand for top quality sequence makes that objective unattractive). Hence, PCR products are generally cleaned up prior to sequencing by passing them through some form of sizing column, where large DNAs pass through and are collected, while primers and low molecular weight compounds are still traveling through.

The polymerase, dNTPs and buffer components are not the main issue here. All fresh solutions and enzymes are nice and undoubtedly have some impact on product quality, but a portion of the PCR products could easily be diluted straight into sequencing reactions with more polymerase and suitable quantities of dNTPs and ddNTPs if this were the only issue.

Many answers did not acknowledge the presence of genomic DNA and address whether this was a problem or not. Also, some answers talked generically about contaminants. While it is not necessary to enumerate every “contaminant”, it is important to think about what you include under that heading and which contaminants pose what kinds of problems. In other words, be precise in your thoughts and writings.
(b) The problem (as made very clear in Q1) is that you won’t have anything like enough of the desired template. Basically, for the human genome, making 1 µg of a PCR product of 3kb from the same mass of a genome of $3 \times 10^{6}$ kb is a million-fold enrichment. It would not be feasible to use 100 mg of genomic DNA in a volume like 10 µl, as used for sequencing: far too sticky. If it were feasible, a second problem would now emerge of considerable background priming from primer mis-matches to the, now, huge concentration of genomic DNA segments.

(ii) If you had 8 fluorophores you could use two tubes (4 fluorophores each) for two different templates and then combine into one lane. The idea that might occur for 5 fluorophores is to use the 5th fluorophore by attaching it to the primer used for one template (but not the other primer). Although initially you might think that you have thereby created two sets of uniquely labeled sequencing products, the problem is that you do not see the results molecule by molecule but rather band by band. Thus, you cannot identify a molecule as starting with labeled primer and ending with ddGTP, for example. Instead, of course, every single band will have some product from the labeled primer and will include that fluorophore, providing no extra information at all. Two super-imposed sequences cannot be separated into the two components. That would also be true if one were theoretically at much lower concentration, because by-products (random pauses etc.) from the stronger sequence would introduce significant background signals into the weaker sequence.

3. (i) (a) The hybridization (and washing steps) was likely conducted at low stringency. This can be done by reducing the temperature and/or increasing salt concentration from standard conditions.

Some answers mentioned zoo blots. That is OK but the name itself does not contribute to answering the question.
Also, a typical zoo blot has DNA from many species (not just yak) to see if a probe sequence is generally conserved (& such experiments are very rare now given full sequences of many representatives of different phyla and species). Some answers talked about controlling the timing of altered hybridization conditions. I don’t believe it is common or useful to change conditions during a hybridization experiment.

(b) No, for many reasons. DNA is mostly irreversibly bound to membranes used for Southern blots. Even in a narrow slice of (fresh, not transferred) gel with cleaved genomic DNA there will be a huge number of contaminating bands and way too low a concentration of the desired product to be of any use. One could envisage PCR-amplifying a yak DNA fragment by guessing which sequences might be best conserved and designing suitable PCR primers, but if the Southern blot looked good the simplest idea is just to use the mouse DNA probe for the yak Northern blot under similar conditions of reduced stringency. It is true that this approach may be problematic. If the gene in question is poorly expressed the signal may be very low (whereas in a genomic DNA Southern equal representation of different sequences [other than repeats] is guaranteed). Also, if a somewhat related sequence is abundant (may even apply to rRNA) that could give a spurious signal under reduced stringency. A realistic problem is that it would be hard to design a control to prove that whatever you see is the correct transcript.

Hardly any answers stated the obvious deficit (to me) of insufficient and impure DNA in the 3.2kb region (so the question was worthwhile). Instead, a variety of lesser concerns not directly relevant to the idea of probe quality were raised. One concern was that part of the 3.2kb might correspond to intron. Even if only 500bp corresponds to exon you should be able to get a good signal.
If the whole region were intron it probably would not be well conserved and the 3.2kb band would not be identified in the first place. It is common to use genomic DNA fragments to probe Northern blots, although I concede that cDNA probes can be better. Another concern was that the relevant mRNA might be absent or at low concentration. True, but that means a Northern is doomed no matter what the probe. Third, some had concerns about DNA hybridizing to RNA and wanted to make RNA probes instead. DNA-RNA hybridization is perfectly fine. Having failed to identify the main problem, most solutions were necessarily inappropriate. Some included cloning steps, but incidentally rather than as a way to improve DNA yield and purity for a probe.

(ii) (a) Isolate DNA (probably from blood cells), cut DNA and run a gel. Transfer and probe duplicates with (a) the normal oligo sequence and (b) the specifically altered oligo. As controls for each hybridization include DNA (PCR product or from a plasmid, in very low amounts so the signal is similar to the genomic DNA blots) that is normal and DNA that has the specific mutation (although normal genomic DNA samples can suffice). Hybridize at moderate stringency and wash with increasing stringency (several trials may be necessary unless the conditions have already been established), so that under at least one condition correct matches give much stronger signals than mismatches (revealed by the controls). It is important to use both normal and altered oligos to distinguish heterozygous, homozygous and no mutations, to be sure that precisely the desired mutation is being tested and to provide some measure of duplication, affording a reliable result. The controls are similarly crucial to be sure how to interpret what you see. The sensitivity of a direct Southern with oligo probes is just sufficient.
It might, however, be better and more convenient to PCR amplify the region around the suspected mutation, spot products onto a membrane and hybridize to those dot-blots.

Most answers talked about high stringency, which is good enough. However, I think the answers assumed it is trivial to distinguish between perfect & imperfect matches as a black and white issue. It is actually quite a fine distinction and really requires controls to show you that the chosen conditions do (or do not) achieve that distinction. A few answers did not talk about mis-matches at all, but instead about different stabilities due to G-C vs A-T content and looking for detailed melting profiles. That is missing the main point, but also those distinctions are extremely small. The purpose of diagnostic tests is to get clean results that are definitive. Hence, small quantitative differences are never attractive options.

(b) The final sequence would have two overlapping peaks of similar magnitude at a single position (for a mutant heterozygote). If sequence quality is high this should stand out easily from the random background peaks (but seeing normal samples in parallel is extremely helpful in being convinced of an abnormality). Both alleles are of course expected to be amplified similarly by PCR.

Many responses simply refused to describe the requested data. It is important to get used to the idea that data and interpretation are two distinct things that must be distinguished and kept separate. The purpose of these questions is to make you think about the real experimental experience. You see data and then try to make deductions strictly from the data.
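The data described in (b) can be mimicked with a short Python sketch: two alleles are read together, position by position, and any position with two different bases corresponds to a pair of overlapping peaks. This is only an illustration of what the trace would show, not an analysis tool; the sequences and function names are hypothetical, and background peaks and peak heights are ignored.

```python
def superimposed_calls(allele_a, allele_b):
    """Set of bases seen at each position when two templates are sequenced together."""
    return [frozenset(pair) for pair in zip(allele_a, allele_b)]


def dual_peak_positions(allele_a, allele_b):
    """1-based positions where the combined trace shows two different bases."""
    calls = superimposed_calls(allele_a, allele_b)
    return [pos for pos, bases in enumerate(calls, start=1) if len(bases) == 2]


# Hypothetical sequences: the mutant allele differs by one substitution
normal = "ACGTTGCAAGGCT"
mutant = "ACGTTGCTAGGCT"

# Heterozygote: one position with two overlapping peaks
print(dual_peak_positions(normal, mutant))

# Homozygous normal: no dual peaks anywhere
print(dual_peak_positions(normal, normal))
```

Note that the output is a description of the data (which positions show two peaks); deciding what genotype that implies is the separate interpretation step.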
(c) When two super-imposed sequences diverge at one point by a nucleotide deletion you will see a clean sequence up to that point and then a variety of single or dual peaks that could actually be interpreted as consistent with a precise change (when you know the normal sequence; note that this is different from deducing unknown sequences from two super-imposed sets of sequencing peaks).

(d) The sequence output above would be reasonably clear but certainly would not be good evidence that there is only a single base-pair deletion (other aberrations, including larger deletions and insertions, would give results that look broadly similar) and is not in itself crystal clear. An improvement could be made by sequencing the DNA from the other end. That would show that on either side of the lost bp the sequence of all amplified templates is the same. An alternative approach would be to clone the PCR products and sequence enough copies of each type of DNA molecule present (in pure form) to be sure that both forms are represented.

Many answers opted for a solution by oligo hybridization (presumably because you had already described such experiments in (i)). That is OK but it is not the best approach in terms of speed or clarity of results. Some suggested doing this as a FISH experiment. I think that is impractical because FISH requires long probes to generate a sufficient signal, and discriminating hybridization by two oligos is difficult to achieve under the other constraints of FISH (including amplification of a binding signal due to a single molecule).

(e) There is an established technique called ARMS-PCR (or ASA, allele-specific amplification). The key point is that DNA polymerase with no exonuclease activity requires perfect base-pairing of a primer at its 3’ end. This is a very stringent requirement leading to black and white results (much better than comparing hybridization of oligos with a perfect match versus a single mis-match). So, two PCR reactions are attempted.
Each has a perfectly good PCR primer downstream of the site in question. The second PCR primer anneals precisely to the sequence that is being interrogated so that its 3’ end hybridizes to the nucleotide that may be different. This second primer is made to include either normal sequence or the hybridization partner of the precise change being tested. Hence, one primer amplifies only normal DNA and the other amplifies only mutant DNA. While differential hybridization of primers could theoretically be used, it provides far less discrimination of mis-matches and, unfortunately, once you generate some PCR products you will now have perfectly matching templates for future rounds, so PCR would be much worse than simply testing oligo hybridization if mis-matches were positioned anywhere other than the 3’ end.

I expect several students knew of the correct method or came across it in texts or discussions, but if anyone actually figured the correct answer out without such prompts I would be very impressed (& you missed out on a patent opportunity by a few years).

4. (i) One concern would be whether any bands are hidden or hard to see. The easiest place to hide in plain sight is for a band to co-migrate with another. Gels have limited resolving power, so, for example, two bands of approximately 3.0kb might well co-migrate even if they differ in length by up to about 100bp. Such co-incidences are not so likely for this small molecule, but they are inevitable when examining much larger cloned DNAs. You can tell if there is co-migration (or at least strongly suspect it) by paying attention to band intensities. Ethidium bromide (and other dyes that may be used) binds evenly along DNA, so the band intensity reflects the number of bp present, or mass of DNA. If the three bands all come from the same molecule they should be equimolar and hence band intensities should be proportional to size.
Co-migration is revealed by bands of twice or more the expected intensity (easy to judge by eye). The second place where bands disappear is if they are very small. Such bands stain poorly and diffuse more readily, so they are weaker and broader. That problem is worst for low percentage agarose gels and for slowly run gels but could easily affect fragments of 300bp or smaller in typical gels. The best way to address this possibility is to look at several other digests (and sometimes a variety of double digests). You can sum the bands for each type of digest and look for consistency (no missed fragments) or the largest total (only that digest produced all bands large enough to notice).

Some answers suggested that different gels might better resolve co-migrating bands. That is true to some degree, but for differences less than 50bp or so agarose gels will have trouble (worse if the fragments are large) and polyacrylamide is more work with only a small chance of benefit. It was suggested that looking at uncut DNA or DNA cut just once (by chance) would be especially helpful. Full-length linear DNA can be sized quite well but big fragments are inevitably measured less accurately than smaller ones (& how do you know there is only one cut?). Circular molecules migrate anomalously (nicked & supercoiled migrate very differently from each other and from linear DNA of the same size), are not well separated from each other & hence are very poor choices for size determination. Some students thought very large fragments might not enter the gel. Protein associations might trap DNA but generally pure DNA of any size enters a gel even if it is not fractionated well. Also, a plasmid is necessarily relatively small, so no huge DNA fragments will be present.

(ii) (a) The fact that the second band is 2.8kb should alert you to the idea that this could represent the sum of 1.0 and 1.8kb bands. That band could be present if it is not cut in the middle by EcoRI.
That could happen if the DNA were not fully digested (perhaps the DNA prep. was not as pure as it should be or the batch of enzyme was getting old; this should not happen often, but does occur frequently enough to be realistic). In that scenario one would expect other EcoRI sites to be incompletely cut and to see the 1.0kb region still associated with its neighbor on the other side (and perhaps even with two neighboring regions where two sites were not cut). That is consistent with the reported results. It is possible that multiple regions of the genome have considerable sequence identity to the 1.0kb fragment and are hybridizing to the probe. If the sequence similarity is very high this could happen even at high stringency, but it is likely that 1.0kb would be the strongest band (so intensities might favor or dis-allow this possibility).

A surprisingly large number of students took this and the rest of the question to relate to digestion of plasmid DNA (the question is clear that genomic DNA is digested). Accordingly, most results were rather hard to explain or the answers were, at least, different. The most salient advice is simply to read the questions very carefully (and to attend recitation so there is less chance of such misunderstandings).

(b) The arguments here are similar except there are only two bands, as if the EcoRI site separating 1.0 and 1.8kb regions were the only site not being cut (some of the time). That could be because it is absent in the DNA (perhaps a single nucleotide change, a SNP, within the EcoRI site: a restriction fragment length polymorphism [RFLP] in the Southern). In fact, for the observed pattern the donor mouse would have to be heterozygous for that specific SNP. Among individual humans and fruit flies (so likely quite similar for mice) there is roughly one SNP per kb, so finding one in a specific 6bp target (the EcoRI site in question) would certainly not be expected but is a reasonably frequent occurrence.
As above, a second region of sequence similarity to the probe is possible (although the 2.8kb size coincidence and bands of equal intensity would certainly make the first explanation more likely).

(iii) (a) Since you see two HindIII bands you assume that the 1.0kb probe spans these two fragments and hence that they are adjacent (here we are ignoring the possibilities in (i), as instructed). BamHI does not appear to affect the 1.2kb fragment, implying that no BamHI sites lie within that fragment. The 4.7kb fragment is reduced to 0.8kb, implying a BamHI site 0.8kb from the end of the 4.7kb fragment that is adjacent to the 1.2kb fragment (if it were at the other end of the 4.7kb fragment the 1.0kb probe would not hybridize to the 0.8kb fragment).

The above argument is the simplest but is not definitive without some further tests. Specifically, it could be that the HindIII 1.2kb fragment is being cut by BamHI to give an 0.8kb fragment and the 4.7kb fragment is cut by BamHI to produce a 1.2kb fragment (the same size as the original HindIII fragment just by co-incidence). It is fine for you to ignore this second possibility (the two could be resolved by further digests, including BamHI only).

(b) Going with the first explanation, you know that the 1.0kb EcoRI fragment overlaps to some degree with the HindIII 1.2kb fragment and the Bam-HindIII 0.8kb fragment. If the 1.0kb extended significantly beyond the BamHI site (away from the 0.8kb fragment) you would see a third band illuminated by the probe (the possibility of small unseen fragments and overlaps insufficient for a strong signal means you cannot be certain that an EcoRI site is within the 0.8kb fragment, but that is most likely). The most important undisclosed information is the intensity of the 1.2kb band relative to the 0.8kb or 4.7kb bands (which should be similar to each other). The relative intensity will reveal what proportion of the 1.0kb probe lies either side of the central HindIII site.
That should allow moderately good positioning, which, of course, could be refined by BamHI-EcoRI and HindIII-EcoRI digests if that is important to ascertain.

In fact, the extent to which signal intensity gives more information depends on the way the probe is prepared. If made, for example, by random priming you will in fact have many short labeled single-stranded fragments (typically around 300nt) from the 1.0kb region. If two-thirds of the 1.0kb fragment overlaps the 0.8kb BH fragment, roughly two-thirds (actually more) of probe molecules can hybridize to that fragment (compared to one third for the 1.2kb fragment). Hence, the relative intensities would be roughly 2:1. If, however, the probe is made by PCR and contains only 1.0kb fragments, each fragment can hybridize to 0.8kb or 1.2kb fragments and in doing so will bring with it the same amount of radioactivity. Provided each overlap is reasonably long (about 300bp or more) the efficiency of hybridization (rate of collisions leading to stable association) would also be very similar, so the final band intensities would be much the same. In other words, this type of probe will only give you information if one of the overlaps between probe and target is very small (less than 300bp).
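The contrast between the two kinds of probe can be put into a small Python sketch. It is a deliberately crude model under stated assumptions: for a random-primed probe the signal is taken to scale linearly with the length of overlap (ignoring the "actually more" correction mentioned above), and for a full-length PCR probe the two bands are taken to be equal whenever both overlaps reach the roughly 300bp threshold given in the text. The function names and the example overlap lengths are illustrative.

```python
def random_primed_ratio(overlap_a_bp, overlap_b_bp):
    """Expected intensity ratio a:b if signal scales with overlap length."""
    return overlap_a_bp / overlap_b_bp


def full_length_probe_ratio(overlap_a_bp, overlap_b_bp, min_stable_bp=300):
    """Expected intensity ratio a:b for a probe made only of full-length fragments.

    Each hybridized probe molecule carries the same label, so the bands are
    about equal when both overlaps are long enough for stable hybridization.
    Returns None when an overlap is shorter than min_stable_bp, because the
    intensities then differ by an amount this sketch does not model.
    """
    if min(overlap_a_bp, overlap_b_bp) >= min_stable_bp:
        return 1.0
    return None


# Example: two-thirds of the 1.0kb probe lies over the 0.8kb fragment,
# one-third over the 1.2kb fragment
print(random_primed_ratio(667, 333))      # close to 2, i.e. roughly 2:1
print(full_length_probe_ratio(667, 333))  # 1.0, bands much the same
print(full_length_probe_ratio(900, 100))  # None, one overlap is very short
```

Only the random-primed case lets band intensity be used to position the probe; with a full-length probe the ratio is uninformative unless one overlap is very short.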