NAME: KEY

FINAL EXAMINATION
MOLECULAR GENETICS AND BIOTECHNOLOGY BIOC6063
May 18, 2005 (1:00-3:00, Room 425C)

Notes: You have two hours to complete the exam. There are 10 questions, so aim for spending no more than 10-12 minutes on each question. Do not feel obligated to fill the space provided for answers. Please return your completed course evaluation (unsigned).

1. A student gets an expression clone from a lab coworker, does expression in E. coli and recovers an excellent yield of the protein. The coworker then provides a clone of the same gene in the same expression vector that has been subjected to in vitro mutagenesis to change a specified encoded residue. The mutation has been confirmed by DNA sequencing. By the same expression protocol, the student is unable to recover any of the mutagenized protein. At the suggestion of the supervising professor, a long series of unsuccessful experiments is conducted changing expression conditions and methods of recovery of the sought after protein. Finally it is discovered that the cause of the problem is that the insert is backwards in the second clone. What simple initial characterization by the student would have prevented him from being victimized by this blunder?

One should establish a restriction enzyme digestion pattern for each important DNA and use it to verify the identity of subsequently produced samples. For a plasmid, the enzymes used should be chosen so that the orientation of the insert is obvious from the pattern. Most commonly, this exercise will save you from being victimized by mislabeled tubes, miscommunications, and naturally arising deletions. This simple preliminary characterization would have told you up front that the "mutant" clone was compromised. As several students pointed out, a single sequence read across a vector/insert junction would also conveniently confirm orientation.
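
To make the orientation argument concrete, here is a minimal sketch of how a diagnostic digest distinguishes the two orientations. It assumes a circular plasmid cut by one enzyme with a single site in the vector and a single, off-center site in the insert. All lengths, positions, and function names are hypothetical illustrations, not values from the actual clone, and cut sites are treated as simple points.

```python
def circular_fragments(cut_sites, plasmid_length):
    """Fragment sizes (largest first) from cutting a circular plasmid."""
    sites = sorted(set(cut_sites))
    if not sites:
        return [plasmid_length]  # uncut circle
    frags = [b - a for a, b in zip(sites, sites[1:])]
    # The last fragment wraps around the plasmid origin.
    frags.append(plasmid_length - sites[-1] + sites[0])
    return sorted(frags, reverse=True)


def digest_pattern(insert_len, vector_len, insert_site, vector_site_offset,
                   reverse=False):
    """Predicted pattern for an insert cloned in either orientation.

    Coordinates: the insert occupies 0..insert_len, the vector follows.
    insert_site is measured from the 5' end of the insert;
    vector_site_offset is measured from the insert/vector junction.
    """
    site_in_insert = insert_len - insert_site if reverse else insert_site
    site_in_vector = insert_len + vector_site_offset
    return circular_fragments([site_in_insert, site_in_vector],
                              insert_len + vector_len)


# Hypothetical numbers: 1500 bp insert, 5000 bp vector.
forward = digest_pattern(1500, 5000, insert_site=300, vector_site_offset=200)
backward = digest_pattern(1500, 5000, insert_site=300, vector_site_offset=200,
                          reverse=True)
print("forward :", forward)
print("backward:", backward)
```

The closer the insert site lies to one end of the insert, the larger the difference between the two predicted patterns. A site at the exact center of the insert would give identical patterns and tell you nothing about orientation.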
The restriction pattern, however, detects a broader range of things that could have gone wrong in one simple experiment.

This question is modeled after an actual event. The lab mate did the mutagenesis in some other plasmid, and then transferred the insert to the expression clone. He actually thoroughly characterized the DNA after the mutagenesis, and correctly characterized the orientation of clones produced during the transfer. Then someone either mislabeled a tube or misread a hand-written label, and you get handed the wrong material. The moral of the story is that no matter how wonderful the material is according to someone else's notebook, you always have the obligation to verify the nature of materials with which you do your experiments.

I'd also say that long before changing expression conditions, I'd redetermine the DNA sequence through the N terminus, translation controls, and back through the promoter to be sure all of these were unaltered.

As an aside, after establishing the foundation stock (either the DNA stock or glycerol stock that is supposed to last the next 4 years), I'd get DNA from it and reconfirm the mutation by sequencing. Absolutely the worst thing that can happen to you in this experiment is to inadvertently get the wild type clone, and then spend years studying this supposed "mutant" that really doesn't have any mutation in it.

2. When PCR primers fail to produce the expected product, a common (and often successful) strategy is to try again with a lower annealing temperature on the theory that the predicted Tm of the primers was a little bit too high. Are there any circumstances where it might help to try again with a higher annealing temperature rather than a lower one? [Hint: That the predicted annealing temperature is too low is not a good answer.]

Sometimes the lack of the expected product means that the PCR primers are tied up in some nonproductive interaction that might be destabilized by use of a higher temperature.
The most common of these is production of a primer dimer. Other nonproductive interactions include hairpin formation or self-annealing between primers in a non-priming configuration.

Usually if the problem is just low stringency of priming, the correct product will appear accompanied by other spurious products. However, in cases where the desired product amplifies poorly because of GC content or large size, false priming can make small spurious products that use up the primers before the desired product has a chance to appear.

Of course, if you go above the point where the primers can prime on the intended sites, then the reaction will still fail. But the predicted Tm may be conservative, so it's worth trying to raise the temperature as well as lower it to see what happens. Often people will raise and lower the Mg concentration instead of the temperature because that experiment can be done all at once.

Of course, the best practice is to try to exclude these kinds of problems in the primer design phase. However, sometimes you are constrained by the nature of the sequence to use a suboptimal primer.

These kinds of problems are most common when there is a large 5' extension on the primers, because of the increased number of possible interactions these sequences can get into. In these cases, raising the annealing temperature as high as it can go (to the extension temperature) may produce your product even if this temperature produces inefficient priming at the correct sites. This is because the inefficiency will only affect the first round. At later rounds, the entire length of the primer will be priming.

3. You conduct in vitro mutagenesis to change a protein residue encoded by an expression clone. Your supervisor asks you to provide some of the DNA to a summer student to try their hand at DNA sequencing.
The summer student uses thermal cycle sequencing, fluorescent dye terminators, and the institutional sequencing facility includes his reactions in one of their capillary sequencing runs. The student reports back that your mutagenesis procedure has induced 4 single base deletions in the insert besides the intended mutation. Of course, you will obtain sequence data yourself for this clone, rather than relying on the work product of a summer student. However, in the spirit of assisting in the training of the summer student, are there any questions you should ask about their result?

You should insist on seeing the chromatogram(s). With the chromatogram in sight, it is relatively simple to diagnose a variety of artifacts that cause bases to disappear. Of these, the most common are trying to read too far from the primer (loss of resolution), and uneven peak spacing from secondary structure (compression).

4. For the expression of mammalian proteins in E. coli, a fusion construct is usually used to enable affinity purification. The affinity domain can be either fused at the C terminus or the N terminus. Is there any advantage of putting the affinity domain on one end versus the other?

Most circumstances favor N terminal fusions:

When there are problems related to failure to fold leading to proteolysis, fusion to N terminal globular fusion partners is thought to be a more effective stabilizing force than fusion to C terminal fusion partners. (However, be aware that "instability" is the standard rationalization of people who don't actually know what's wrong with their construct.)

If the intent is to cause secretion of the protein, then the fusion partner with secretion signals must be on the N terminus.

N terminal fusions avoid putting alien sequences next to the translation signals, which can inadvertently lead to mRNA secondary structure that inhibits translation.
N terminal fusion keeps infrequently used codons in the alien portion of the construct away from the start of translation, where they are thought to have a greater inhibitory effect on translation efficiency.

There are some sequences that are thought to trigger turnover of the protein when on the N terminus. N terminal fusions keep the alien sequence away from that position.

If you need to retain the N terminal methionine on your purified protein, the N terminal fusion partner protects it from post translational removal. You will have to position the cleavage site so that the N terminus uncovered after in vitro cleavage is suitable for your purpose.

Vectors for globular N terminal fusion partners have the valuable property that you can validate most important aspects of the expression system by expression from the vector without any alien insert.

C terminal fusions can work fine if you don't happen to have any of the above circumstances.

If you plan to do experiments with the fusion partner still attached, and find out that the structure or function of the protein will not tolerate an N terminal fusion, then a C terminal fusion might be the solution. You can imagine variations on this structural incompatibility theme that a C terminal fusion might solve. For example, if the N terminal structure tends to bury the protease cleavage site so that cleavage to remove the fusion partner is difficult, you might try a C terminal fusion.

Some fusion partners have to be on the C terminus to properly function. For example, in some M13 phage display constructs the fusion partner needs its C terminus free to properly assemble into the phage.

5. You are asked to do an initial bioinformatics investigation of a newly discovered human gene that is very distantly related to another human gene your lab has studied for decades. Specifically, you have been asked to identify a mouse gene that could be used as an animal model of the human gene.
You observe in one protein family database that the gene is part of a named but uncharacterized family. The family lists yet another human gene and sometimes two genes for other mammals including mice. A different protein family database also lists an uncharacterized family containing the target gene, but not a second human gene and only one mouse gene. What is the likely source of this discrepancy? How will this situation influence your decision about choosing a mouse gene as the animal model for the human gene?

What is described is a set of (at least) 3 paralogous genes. They are 1) the gene your lab has studied, 2) the gene you were asked to investigate, and 3) the other homologue that showed up in database #1. Since paralogues can be expected to have some degree of functional distinction, you will want to choose the orthologous mouse gene as your prospective mouse model rather than the paralogous one.

The lazy thing to do would be to assume that curators for database #2 had subdivided the single family from database #1 into 2 families, such that the single mouse gene that was grouped with your target human gene was its orthologue. The worst-case scenario would be that database #2 had only one mouse gene and one human gene in its family because it was incomplete. Then you'd have a 50% chance of choosing a mouse gene with a different function to model your human gene. Of course, you could just search database #2 for the missing paralogues to confirm that there was in fact another family that included them.

Even given that result, family databases use fairly low power and arbitrary methods to define families. To avoid getting drawn into a paralogous comparison, you should reevaluate the relationships among these sequences yourself. The most thorough thing to do would be to make a tree containing all of these genes, and do a bootstrap analysis. Many students simply said to consider the mouse gene that was "closest" to be the prospective mouse model.
The algorithms that make trees are designed to be more accurate about measuring what is closest by descent than just looking at some similarity scores. The bootstrap analysis determines the statistical confidence that the perceived greater closeness of one of the mouse genes cannot be accounted for by the random nature of the divergence process.

Some students pointed out other issues that are worthy of mention. It is possible that database curator #1 created two entries out of one gene because there is alternative splicing. You should indeed track the protein sequence back to the nucleotide sequence to know if these are really different genes or not. As another issue, the protein databases actually organize families of domains, not of entire protein sequences. You will want to track down the nature of any other domains within each of these genes on the chance that the domain structure between paralogues is different.

6. Write a brief abstract (a few sentences) describing results from a fictitious research project. Use the following terms in the proper context in the abstract: recessive, complementation, phenotype, selection, allele, diploid, conditional lethal.

To isolate yeast mutants affected in DNA replication, we used mutagenesis and a selection scheme based on temperature-sensitive, i.e. conditional lethal, 3H-thymidine incorporation. Mutagenesis was conducted with MATa and MATα haploid cells, and 50 different mutants were found by phenotype analysis of diploids to fall into 5 complementation groups. All were recessive mutations. We have assigned preliminary genotype designations of rep1-rep5. One complementation group (rep2) was shown to represent 20 different mutant alleles.

7.
Briefly describe the genetic selections used in the following applications:

(a) the yeast two-hybrid system

Co-transformants containing two plasmids (one containing the activation-domain fusion gene and the other containing the DNA-binding domain fusion gene) are selected based on nutritional markers on each plasmid (say LEU and TRP). Selection for interaction is based on activation of GAL4 promoters or UAS sequences connected to reporter genes (usually HIS and LACZ).

(b) gene replacement or knockout in mammalian embryonic stem cells

Homology boxes on either side of the changed sequence allow for homologous recombination. Positive selection (i.e., G418 resistance due to nearby NEO gene inside the homology boxes) combined with negative selection (i.e., for resistance to ganciclovir due to loss of TK genes outside the homology boxes) is used to identify stem cells containing the replacement at the proper locus.

8. Consider the following situation. You mutagenize Drosophila larvae and then examine the wings of adults that hatch out from the pupal case. You notice that a group of sensory bristles that line the periphery of the wing are absent. Based upon what was discussed in class:

a. Define cell autonomous and non-autonomous phenotypes. Based upon these definitions, state in which cells you would expect to find the mutations that give rise to both autonomous and non-autonomous phenotypes.

Cell Autonomous: Phenotype resides with genotype. Cells carrying the mutations exhibit the phenotype. Mutations would reside in bristle cells themselves.

Non-autonomous: Phenotype is independent of genotype. Cells carrying the mutation are not the ones exhibiting the phenotype. Mutations would reside in cells other than the bristle cells.

b. Why is it important to distinguish between such phenotypes?

As phenotypes don’t necessarily correlate with genotypes, it is important to determine where the mutation resides.
This will allow you to assess potential mechanisms of action for specific mutations and help in determining how genes function in an in vivo setting.

c. Finally, suggest potential mechanisms whereby autonomous and non-autonomous mutations might give rise to the observed phenotype.

Cell autonomous mechanism: Mutation resides in the genes involved in formation of sensory bristles. Could range from master regulatory genes required for sensory cell maintenance to genes involved in bristle structure (i.e. structural genes).

Non-autonomous mechanism: Mutation resides in cells neighboring the bristle. Mutations could affect secretion of a signal required for bristle cell formation. Also could play a role in lateral inhibition via competition mechanisms.

9. Briefly describe how DNA microarrays are used for expression profiling. What types of follow-up methods are needed to confirm results?

DNA fragments (PCR products or oligonucleotides) representing a spectrum of genes (the number may be limited or inclusive for an organism) are displayed on glass slides or filters. The microarray is hybridized with cDNAs made from control RNA samples and labeled with one type of fluorophore, and with cDNAs made from test RNA samples and labeled with a different type of fluorophore. Levels of fluorescent intensity are compared to determine relative levels of expression of given genes in the control versus test RNAs. This method has many applications, including analysis of tissue-specific expression, disease versus non-disease states, mutant versus wild-type patterns of expression, etc. Results of microarray analyses should be followed up with Northern or Western blots to confirm changes in expression of some of the gene products.

10. Researchers have found that knockout of the gene for a particular mouse protein results in embryonic lethality. They are convinced that the protein is particularly important in the kidney as part of the insulin secretory pathway.
How could they apply Cre-lox methods to test this hypothesis?

They would construct one mouse line (using stem cell positive/negative selection) containing the gene for the Cre recombinase under control of a kidney-specific promoter (ideally one that can be turned on late in development). They would construct another mouse line containing the gene of interest flanked by loxP sites (i.e., a floxed gene). By mating they would make the second mouse line homozygous, then mate this homozygous line with the line carrying the Cre recombinase gene. This should provide specific knockout of the gene in kidney cells later in development to study effects on insulin secretion.
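
The breeding arithmetic can be checked with a small Punnett-square sketch. It assumes two unlinked loci (the target gene and a single-copy Cre transgene) and simple Mendelian segregation. The genotype labels and function names are hypothetical, chosen only for illustration. Note the assumption about the Cre-carrying parent: if it is wild type at the target locus, the first cross gives only flox/+ carriers, and those carriers must be crossed back to the flox/flox line to obtain flox/flox animals that also carry Cre.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

LOCI = ("target", "cre")  # assumed unlinked, so alleles assort independently


def gametes(parent):
    """All equally likely gametes: one allele drawn from each locus."""
    return list(product(*(parent[locus] for locus in LOCI)))


def cross(mother, father):
    """Offspring genotype frequencies for a two-locus cross."""
    counts = Counter()
    for egg, sperm in product(gametes(mother), gametes(father)):
        child = tuple(tuple(sorted(pair)) for pair in zip(egg, sperm))
        counts[child] += 1
    total = sum(counts.values())
    return {child: Fraction(n, total) for child, n in counts.items()}


def conditional_knockout_fraction(offspring):
    """Fraction of pups that are flox/flox AND carry at least one Cre copy."""
    return sum(
        (freq for (target, cre), freq in offspring.items()
         if target == ("flox", "flox") and "Cre" in cre),
        Fraction(0),
    )


floxed_line = {"target": ("flox", "flox"), "cre": ("-", "-")}
cre_line = {"target": ("+", "+"), "cre": ("Cre", "-")}      # assumed wild type at target
f1_carrier = {"target": ("flox", "+"), "cre": ("Cre", "-")}  # flox/+ pup carrying Cre

print("first cross :", conditional_knockout_fraction(cross(floxed_line, cre_line)))
print("second cross:", conditional_knockout_fraction(cross(floxed_line, f1_carrier)))
```

Under these assumptions, one quarter of the pups from the second cross are the tissue-specific knockouts, and their flox/flox littermates without Cre serve as convenient controls.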