Mutation Is Random
Andrew P. Martin
University of Colorado
© Copyright 2014

Contents
Practical Overview
Pedagogical Overview
Learning Goals
Instructor Preparation Prior to Class
Description of In-Class Activities
Background Reading for the Students
Quiz On Background Reading
In-Class Activities: Student Handouts
Example Summative Assessment Questions

The occurrence of spontaneous mutations along the length of a gene. The small boxes upstream (to the left) are regulatory elements (which turn gene expression on and off). The rectangular grey boxes are exons. The solid black line is non-coding DNA. The distribution of mutations along the length of the gene is random.

Practical Overview

Student preparation and homework prior to class
1) Students prepare for class by completing some reading about mutation.
2) Students complete and submit an online quiz based on the reading. The quiz includes three questions that form the basis for the first part of the in-class lesson.

Instructor preparation prior to class
1) The instructor reviews the students' performance on the quiz and chooses one or two questions to highlight.
2) The instructor constructs clicker questions for the last three questions of the quiz (the ones focused on the idea that mutation happens to individuals randomly).
3) The instructor makes photocopies of the student handouts: the graph for drawing the number of mutations as a function of antibiotic concentration; the dystrophin gene for marking mutations, with the corresponding graph for drawing the expected probability of mutation; the graph axes for plotting a prediction of why genes vary in mutation rate; and the table for calculating the expected number of new alleles per generation for polar bears, humans, and gut bacteria.
4) The instructor reviews the PowerPoint slides.

In-class Implementation
There are four parts to the in-class lesson: mutation is random in a population, mutation is random in a gene and in the genome, why some genes mutate more than others, and how many new alleles are born each generation.

Materials
1) Clicker station
2) Ability to construct "real time" clicker questions
3) Handouts for students (see In-Class Activities)

Pedagogical Overview

In general, students do not understand random spontaneous mutation. Part of the explanation may be that they do not understand mutation, and part may be that they have a tenuous grasp of the meaning of random. If you ask them, for instance, what "mutation is random" means, they will generally answer incorrectly (Figure 1). Students think that "random" means that events are rare or unpredictable, or that the underlying mechanisms are uncertain. In fact, "mutation is random" simply means that the next mutation can happen to any one of many individuals in a population, at any site in the genome, and that mutation is independent of its effect.

Figure 1. Question on a pre-assessment for students beginning the evolutionary biology course. The correct answer is D.

Part of the problem with students' understanding of mutation is that most believe mutations happen for a reason and happen because they are needed. This is evident from queries of students (Figure 2). Remarkably, fewer than 10% of students chose the correct answer. Furthermore, the distribution of answers did not change AFTER students successfully completed the introductory general biology course (the normalized learning gain was less than 2%).

Figure 2. Left. Pre-assessment question from the freshman-level general biology course. Correct answer is A. Right. Distribution of answers. From K. Davies and B. Melbourne.
This activity is designed to address the persistent misconception that mutations happen because they are needed and that they happen in the right place at the right time. Students grapple with what "mutation is random" means at the level of the individual in a population, at the level of a single nucleotide in a gene, and at the level of a whole chromosome or genome. In addition, the activity spends some time on what a probability, like 10^-9, means.

In this lesson, students use several learning tools: they construct graphs that either summarize data or show predictions, and they solve quantitative problems. Most importantly, though, there are many opportunities for visualizing student thinking, for students to justify their thinking, and for students to adjust how they think about mutation relative to their peers and relative to summaries of data. In other words, students are given multiple opportunities to assess how well their mental models reflect reality and to revise their models when there is a mismatch between model and reality.
Learning Goals

Content Learning Goals
1) Explain the statement "mutation is random" at the level of the nucleotide, the gene, the chromosome, the genome, and the individual
2) Describe a uniform probability distribution
3) Estimate the number of new mutations that appear each generation in a population given information on the probability of mutation (mutation rate) and the number of individuals
4) Explain why some genes are expected to mutate more frequently than other genes

Process Learning Goals
1) Demonstrate science as a way of knowing
   Formulate hypotheses and derive or use models that make testable predictions
   Design experiments that test predictions of hypotheses or models
   Record observations about the world with minimal error and bias
   Correctly interpret information presented in graphs, tables, illustrations, and text-based descriptions and use the information to construct evidence-based claims
   Assess the uncertainty of claims
   Use quantitative and qualitative methods for analyzing data as a means to choose among competing claims, models, or hypotheses using objective and rationally defensible criteria
2) Communicate with purpose, clarity and brevity
   Articulate evidence-based claims
   Construct easy-to-interpret graphs, tables, illustrations, and text-based descriptions of information
   Use models to represent the world
   Assess the extent to which your claims are influenced by cultural context (e.g. norms, society, emotions, etc.)
3) Cooperate with individuals of varying background, abilities, and perspective to achieve a common goal
   Listen with purpose and intention
   Achieve consensus
   Negotiate with mutual interests in mind and without confrontation
   Leverage differences in expertise

Instructor Preparation Prior to Class

Reviewing the quiz
Students should have completed the online quiz prior to class. The instructor should review the distribution of answers and note any questions that fewer than 60% of students answered correctly.
In addition, the instructor may want to highlight a few questions and show the connection between each question and the text from which it was developed. It is worthwhile to show the histogram of scores for a question or two and use the opportunity to emphasize the importance of doing the quiz and being able to justify your answer.

Summarizing the results from the "which individual mutates" questions
Students were asked to choose the one individual in a line-up of individuals that mutates. The question was implemented online (hopefully) and the results were summarized using a histogram. The instructor should construct a slide for each question that includes the original question and its resulting histogram. The last question should be the one about bacteria.

Preparing Handout for Students
The instructor should photocopy the handouts of the two in-class activities (Annotation and Predicting Evolution). Alternatively, if students construct products digitally, the instructor needs to post digital copies of the handouts.

Review slides
Finally, the instructor should review the prepared slides for the lesson (see accompanying PowerPoint, MutationIsRandom.ppt).

Description of In-Class Activities

Mutation is Random in a Population (6-9 minutes)
The instructor should show the first "which individual mutates?" scenario with the histogram of student responses (Figure 3). Here, there is no confounding information and it is likely students will have picked someone randomly. Ideally, the distribution of responses reflects a uniform distribution (each individual has the same number of votes).

[Histogram]

Figure 3. The query that asks students to choose which individual is most likely to experience a mutation. Students are asked this question as homework and the instructor should summarize student answers and insert the histogram below the figure.
Prior to showing the second query (about mutation to an antibiotic resistance gene in bacteria), the instructor asks students to draw their predictions of the number of mutations that happen depending on the concentration of antibiotics. Students are asked to draw a histogram for 100 mutations (Figure 4).

Figure 4. A graph that allows students to construct a general model showing the influence of the amount of antibiotics in the environment on the number of new mutations in a population.

Once students have drawn their graphs, the instructor can harvest some of them and construct a clicker question that has students choose the graph that best matches their own. Alternatively, the instructor can use the prepared clicker question that includes two common misconceptions (C & D) (Figure 5). The PowerPoint includes both a template slide for constructing a real-time clicker question and the prepared slide.

Figure 5. Clicker question to harvest student thinking and provide a context for students to justify their thinking.

After discussing student graphs, the instructor shows the results from the second query (about bacteria) (Figure 6) and asks students to justify their answers. Ideally, one or more students claim "Because mutation is random, I didn't really know which individual to choose so I just picked one."

[Histogram]

Figure 6. Query that students completed as homework. The instructor summarizes student responses and inserts the histogram below the figure.

Following construction of the graph and the summary clicker question, the instructor shows the results from the homework and asks students to justify their answer or, alternatively, to describe what the histogram should look like if mutation is random (and students chose bacteria that mutated randomly).

Part 2. Mutation is random in a gene (10-12 minutes)
The same exercise that was done for individuals in a population is repeated for sites within a gene.
Rather than including an environmental gradient, students consider where a mutation might happen given that different motifs exist in different places in a gene. Ideally, students should put tick marks randomly on the gene. The instructor should demonstrate what putting tick marks on a gene looks like. Students are asked to construct a graph of the probability of mutation by location in the gene (Figure 7).

Figure 7. A query that asks students to draw lines (tick marks) on a gene to indicate where 15 mutations are most likely to happen.

The instructor should harvest student pictures and construct a clicker question if there are different graphs, or use the example below that shows typical student responses (Figure 8). Importantly, this graph includes the probability of mutation on the y-axis. Once students have answered, the histogram should be shown and students asked to justify their answer, even if ALL students choose the uniform distribution.

Figure 8. Three probability distributions for mutations in a gene.

The students repeat the same process with the whole chromosome (Figure 9). The goal of repetition is to consolidate the perspective that spontaneous mutation happens randomly (and with equal probability) across an entire chromosome and, by extension, the whole genome. As before, student work should be harvested, ideally by constructing another clicker question based on student work and using the results of the clicker question as a context for having students justify their answers.

Figure 9. Students are asked to put mutations along the chromosome relative to the location of the dystrophin gene. Ideally, the spatial distribution should be random.

Part 3. Why some genes have more mutations than others (15 minutes)
One implication of "mutation is random" is that the probability of mutation is identical across all sites and, by extension, that there is a single mutation rate per site throughout the genome.
With this in mind, students are confronted with an observation: different genes have different underlying mutation rates, and some genes, like dystrophin, have a remarkably high mutation rate. Students can be given a moment to talk with their peers, and then the instructor should use random call-out to get several possible explanations. The instructor should write down student explanations (i.e. make their thinking visible). A good strategy is to construct a clicker question using three or four different responses, including "none of the explanations listed", and then query the whole class. The tendency is for students to revert to the idea that there are differences in rate among genes because some genes are more important than others.

Once students have a general idea, they should try to draw a graph that summarizes their perspective. For example, if their idea is that genes vary in importance, their graph should have importance on the x-axis and mutation rate on the y-axis. If their idea is that mutation rate is due to location in the genome, they should draw a corresponding graph that shows this.

The correct answer is simple: genes vary in length. The "mutation is random" model predicts that each nucleotide position has an identical probability of mutation. Given this, the mutation rate per gene is simply proportional to its length: the longer the gene, the more likely it is to gain a mutation, because there are more opportunities (more nucleotide positions) for mutation to happen. The best graph therefore has gene size (in number of base pairs) on the x-axis and mutation rate per gene on the y-axis, and shows a straight line with a positive slope. If you notice differences among individuals in their graphs, it is a good idea to summarize their graphs in a clicker question and ask them to choose their graph. Figure 10 shows the template for constructing a set of rival graphs.
Students should be prepared to defend or justify their answer.

Figure 10. Axes for constructing a clicker question that makes students' thinking visible. One of the graphs (B) is already fleshed out with the correct answer.

The answer is that the dystrophin gene is about 100 times longer than the average gene; hence, its mutation rate is about 100 times higher than that of the average gene.

At this point it is worth inserting a real-world connection. The dystrophin gene has a number of different disease alleles that result in muscular dystrophy (Figure 11). Because there are more targets of mutation, there are more ways the gene can be disrupted and cause muscular problems in individuals. Depending on the interest of the instructor, it is worth inserting a short video describing the phenotypic effects of dystrophin mutations. I provide one example in the PowerPoint (see https://www.youtube.com/watch?v=AF4D4TyE9NM). If you use this video, you can show just part of it rather than the whole thing, and you can turn off the sound because there is no narration.

Figure 11. The location of described mutations in the dystrophin gene in humans. All of these mutations have a phenotypic effect (i.e. mutations without an effect on fitness are not shown).

Part 4. How many new alleles are born each generation? (12 minutes)
Students are confronted with the probability of mutation per gene per generation (10^-5 to 10^-6). They are asked to estimate the number of new alleles that appear in the population each generation for three different species: polar bears, humans, and gut bacteria. The three scenarios differ dramatically in population size. The challenge is simple: multiply the probability of mutation by the population size. The last challenge is tricky because it requires students to estimate the total (global) population size of E. coli from the mass of E. coli in a single human and the mass of a single E. coli cell.
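For instructors who want to check the Part 4 arithmetic, the calculation can be sketched in a few lines of Python. This is only a sketch under the lesson's simple rule (mutation rate per gene multiplied by population size); it ignores ploidy, and the function names are hypothetical, chosen for illustration. The population sizes and masses are the ones given on the student handout.

```python
# Expected new alleles per gene per generation = mutation rate per gene x population size.
# Assumption: ploidy is ignored, following the simple rule stated in the lesson.

RATE_HIGH = 1e-5  # upper per-gene mutation rate from the handout
RATE_LOW = 1e-6   # lower per-gene mutation rate from the handout


def new_alleles_per_generation(population_size, rate_per_gene):
    """Expected number of new alleles for one gene in one generation."""
    return population_size * rate_per_gene


def ecoli_global_population(grams_per_human, grams_per_cell, n_humans):
    """Estimate total E. coli cells as (cells per human) x (number of humans)."""
    cells_per_human = grams_per_human / grams_per_cell
    return cells_per_human * n_humans


humans = 7_000_000_000
populations = {
    "polar bears": 20_000,
    "humans": humans,
    "E. coli": ecoli_global_population(10, 1e-12, humans),
}

for species, size in populations.items():
    low = new_alleles_per_generation(size, RATE_LOW)
    high = new_alleles_per_generation(size, RATE_HIGH)
    print(f"{species}: {low:.3g} to {high:.3g} new alleles per generation")
```

With the handout's numbers this gives roughly 0.02 to 0.2 new alleles per generation for polar bears, 7,000 to 70,000 for humans, and about 7×10^16 to 7×10^17 for E. coli (from an estimated 7×10^22 cells worldwide).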
There are two clicker questions, one for polar bears and one for humans, and each clicker question is followed by the work required to arrive at the correct answer (see Figure 12 for an example). If there is not sufficient time, these questions can be assigned as homework and the answers collected using an online assessment tool. For the last question on E. coli, the instructor should harvest four answers generated by different individuals or small groups and construct a clicker question with the four answers plus "None of the above". This clicker question provides a context for students to justify (or explain) their answer.

Figure 12. PowerPoint slide that shows how to calculate the expected number of new mutant alleles in each new generation.

Background Reading for the Students

Types of Mutations
A mutation is a change in the sequence of DNA in the genome of an organism. It can be a single base change (for example, a G changing to an A), or it can involve a longer stretch of DNA, including the gain and loss of genes and other large-scale changes. A glossary of different types of changes is provided in Table 1.

Table 1. Examples of common types of mutations. The list is not exhaustive (there are other types of mutations). For the single base change and the insertion or deletion, the mutation shown happens in a protein-coding gene with the DNA sequence organized into codons, each of which specifies an amino acid. For the frameshift mutation, notice that a single base deletion results in a change in the encoded protein and the formation of a stop codon (indicated by the ***). For the inversion and the duplication, the mutation involves genes aligned along a chromosome, with different genes shaded differently. In the former, the gene order changes, and in the latter there is a gain of one or more of the genes.
Type | Example
Single base change | [illustration]
Insertion or deletion | [illustration]
Insertion resulting in frameshift | [illustration]
Inversion | [illustration]
Duplication | [illustration]

General Mechanisms
There are several mechanisms that cause mutation. One important source of mutation is error introduced when DNA is copied. Polymerases, the proteins responsible for synthesizing DNA during cell replication, are not perfect. Even though there are proofreading and DNA repair proteins whose sole function is to patrol DNA and fix lesions and breaks, the molecular machinery responsible for maintenance and replication does not get it right every time DNA is replicated. Error rates are on the order of 10^-3 to 10^-10 per base, depending on the organism and the type of polymerase.

Because mutations are introduced when DNA is replicated, it should not come as a surprise that the more times DNA is replicated, the more mutations occur. Individuals with a greater number of germ line cell replications will contribute more genetic variants to the next generation than those with fewer replications. So, for instance, in mammals, males introduce more mutations into their offspring than females because male germ line cells replicate more times prior to the formation of gametes than female germ cells (Figure 10). Across a number of different animals, males accumulate, on average, from 2 to 5 times more mutations than females, and the difference in the mutation contribution of the two sexes increases with age. The difference can be predicted from the difference in the number of germ line cell divisions (Figure 10). Sex-specific differences in mutation rates highlight the importance of DNA replication errors as a mutational mechanism.

Similarly, in a specific period of time, say a decade, organisms with many generations will have more mutations than those with fewer generations because replication of germ line cells is tied to reproduction.
So, for example, mutations accumulate faster in mice than in elephants because the generation time of a mouse (the amount of time from birth until a mouse produces an offspring) is about 10 weeks, whereas the generation time of an elephant is considerably longer: on the order of 25 years.

[Plot: number of germ line cell divisions (y-axis, logarithmic scale from 1 to 1000) against age (x-axis, 0 to 60) for males and females.]

Figure 10. Number of germ line cell replications for female (above left) and male (above right) gametes (Crow, 2000). Note that all eggs are derived from 24 replication events, 22 of which occurred prior to birth. For males, replication of the germ line cells begins early in life and proceeds slowly until puberty (about 15 years), and then the stem cell lineage replicates once every 16 days for the remainder of life. The result is a difference between the two sexes in the number of germ line replications that begins at puberty and increases with age (immediate left).

Another major source of mutations is inaccurate repair of DNA damage from metabolic processes within the cell. Free radicals are abundant by-products of cellular metabolism. Free radicals are not revolutionary political figures espousing freedom but highly reactive forms of oxygen (including superoxide, peroxide and hydroxyl) that have an unpaired electron and can damage DNA in a variety of ways. The observation that variation in nucleotide substitution rates among individuals can be explained, in part, by metabolic differences (or more generally by body size) suggests that endogenous DNA damage is a significant mutational mechanism generating genetic variation (Figure 11). An interesting take-home message is that the faster you live (measured by the amount of oxygen consumed), the faster you mutate. Because evolution depends on variation, it is probably also true that the faster you live, the faster you evolve.

Figure 11. The rate of nucleotide substitution (% substitutions per site per million years) decreases with increasing body size: mutations accumulate faster in smaller than in larger animals (Gillooly et al., 2005).

Another important source of mutation is recombination. Crossing over during meiosis results in gene sequences in offspring that differ from both parents. Technically this is not mutation because there is no specific change in the sequence of the offspring relative to the parent sequences. Yet the sequence in the offspring is unique: there is a section from mom followed by a section from dad. There are, however, cases in which recombination can lead to duplication or deletion mutations. In most cases, this happens as a consequence of unequal recombination resulting from misalignment of chromosomes during meiosis. The result is gametes with insertions or deletions relative to the parent (Figure 12).

Figure 12. The consequences of misalignment and recombination during meiosis, resulting in a gamete that has an extra copy of a gene (a gene duplication) and a gamete that is missing a gene (a deletion).

Mutation is Random
Mutation happens, more or less, randomly across the genome such that every site has an identical probability of change. This of course is a gross generalization because different types of sequences have different mutation rates. Simple sequences, like a string of repeated AC dinucleotide pairs, have very high probabilities of mutation. These exceptions notwithstanding, a reasonable assumption is that each location in the genome has the same probability of mutation. A site does not have a higher probability of change because it is important and integral to the function of the protein, nor because it is unimportant and a change there has no consequence for function. Mutation happens without respect to what is needed or to the functional consequences. Spontaneous mutation is independent of its effect.
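The idea that every site has the same probability of mutation can be explored with a short simulation. The sketch below is illustrative only: the region names and lengths are invented for the example (they are not measurements of any real gene), and `simulate_mutations` is a hypothetical helper. It drops mutations uniformly at random along a sequence and reports what share lands in each region.

```python
import random


def simulate_mutations(regions, n_mutations, seed=0):
    """Place n_mutations uniformly at random along a sequence and count hits per region.

    regions: list of (name, length_in_bp) pairs laid end to end (made-up layout).
    """
    rng = random.Random(seed)
    total = sum(length for _, length in regions)
    counts = {name: 0 for name, _ in regions}
    for _ in range(n_mutations):
        pos = rng.randrange(total)  # every site is equally likely to be hit
        start = 0
        for name, length in regions:
            if pos < start + length:
                counts[name] += 1
                break
            start += length
    return counts


# Hypothetical gene layout, for illustration only.
regions = [("regulatory", 500), ("exon 1", 1_000), ("intron", 6_000), ("exon 2", 2_500)]
n = 100_000
counts = simulate_mutations(regions, n_mutations=n)
total_bp = sum(length for _, length in regions)

for name, length in regions:
    print(f"{name:>10}: length share {length / total_bp:.3f}, "
          f"mutation share {counts[name] / n:.3f}")
```

In this sketch, each region's share of the mutations should come out close to its share of the total length, whether the region is an exon, an intron, or a regulatory element. The only thing that matters is how many sites the region contains.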
In general, "mutation is random" means that the process across the genome is best described by a uniform probability distribution (Figure 13A). If you were to look at a gene after it had accumulated a bunch of mutations, before selection, the distribution of mutations would be random across the gene (Figure 13B).

Figure 13. A picture of a gene, with two regulatory elements (small blue squares), three exons (grey rectangles) and various non-coding sequences (thick black lines). A. A uniform probability distribution for mutation across a gene. All sites along the gene have the same probability of mutation. B. An example showing the distribution of mutations along the length of the gene assuming mutation is random.

Number of New Mutations Each Generation
Mutation is important because it generates novel variants on which selection can act. In the absence of mutation, there is no heritable variation and evolution is impossible. But how much variation is generated? Are a large number of new alleles added to a population every generation?

The probability of mutation per nucleotide in the genome is approximately 10^-8, or 0.00000001. If we assume that the average length of a gene is on the order of 1,000 nucleotides (or base pairs), then the probability of a new allele being born by mutation is 1,000 multiplied by 10^-8: that's 10^-5, or 1 in 100,000 copies of the gene. In each diploid individual there are two copies of a gene. So every generation 1 in 50,000 individuals has a new allele for a particular gene born by spontaneous mutation. In a population of a million individuals, there are 20 new alleles every generation; twenty different individuals in the population have new alleles. These numbers are similar to lottery numbers: it's very unlikely for an individual to have a mutation in a particular gene and extremely unlikely that two individuals gain the same mutation at the same site in the gene. What about across the whole genome?
How many new mutations happen in an individual's genome? The expected number of mutations is simply the total number of base pairs (for humans that's 3.2 billion base pairs) multiplied by the probability of mutation per nucleotide (10^-8, or 1 in a hundred million). So, each new individual born has about 30 new mutations not present in either parent. This is an underestimate because it does not account for mutations other than single base changes. So, it is likely that for all types of mutations, each newborn has about 60 to 100 new mutations across 3.2 billion base pairs that differ from both parents. In terms of the percent of the genome of a newborn that is different from its parents, that's about 0.000002%. This is a very small fraction of the genome and implies that the change in a genome by mutation is a very slow process.

Like so many things in life, variation in a population on which selection can act is a numbers game. In small populations, for example, populations with only 100 individuals, it is unlikely that a particular gene, for example a gene involved in controlling the flow of potassium and sodium across the membrane of a cell, will have a mutation. The chance of a mutation in the gene each generation is about 0.002 (1 in 50,000 multiplied by 100 individuals). Another way to think about the problem is that 500 generations elapse, on average, between successive mutations.

In large populations, however, mutation, even though it is a rare event, can generate large numbers of new alleles. Consider, for example, a typical population of bacteria in your gut. There are somewhere in the neighborhood of 1 trillion bacteria in a gram of poop. How many new alleles arise for a given gene each generation in a population of 1 trillion cells? Assuming the rate is 10^-5 per gene per generation, if there are 1 trillion cells, then there are 10^7 new alleles born by spontaneous mutation each generation: that's 10 million new alleles each generation.
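The chain of estimates in this section can be retraced step by step. The sketch below simply restates the numbers from the text as Python arithmetic; the variable names and the helper `expected_new_alleles` are hypothetical, and the results are approximate because the inputs are order-of-magnitude values.

```python
PER_SITE_RATE = 1e-8       # probability of mutation per nucleotide per generation (from the text)
GENE_LENGTH_BP = 1_000     # assumed average gene length (from the text)
GENOME_BP = 3.2e9          # size of the human genome in base pairs (from the text)
COPIES_PER_INDIVIDUAL = 2  # a diploid individual carries two copies of each gene


def expected_new_alleles(population_size, rate):
    """Expected number of new alleles per generation for one gene."""
    return population_size * rate


per_gene_rate = PER_SITE_RATE * GENE_LENGTH_BP               # about 1e-5 per gene copy
per_individual_rate = per_gene_rate * COPIES_PER_INDIVIDUAL  # about 2e-5, or 1 in 50,000

per_million = expected_new_alleles(1_000_000, per_individual_rate)  # about 20
small_pop = expected_new_alleles(100, per_individual_rate)          # about 0.002
generations_between = 1 / small_pop                                 # about 500
# For gut bacteria the text applies the per-gene rate directly to the number of cells.
gut_bacteria = expected_new_alleles(1e12, per_gene_rate)            # about 1e7
per_genome = GENOME_BP * PER_SITE_RATE                              # about 32 single-base changes

print(f"per-gene rate: {per_gene_rate:.1e}")
print(f"new alleles in a population of one million: {per_million:.0f}")
print(f"chance per generation in a population of 100: {small_pop:.3f}")
print(f"generations between mutations in that population: {generations_between:.0f}")
print(f"new alleles among 1 trillion gut bacteria: {gut_bacteria:.1e}")
print(f"new single-base mutations per human genome: {per_genome:.0f}")
```

Writing the steps out this way makes the key point visible: the per-gene rate stays the same throughout, and only the population size changes the expected number of new alleles.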
In contrast to the small population, 10 million new alleles is an enormous amount of variation. What is even more remarkable is that, in such a large population, there will be a mutation at every single base pair in the genome. It's likely that every possible mutation that can happen actually happens in extremely large populations.

Vocabulary
DNA
Genome
Mutation
Polymerase
DNA proofreading
DNA replication
Germ line cells
Free radical
Point mutation
Gene duplication
Diploid
Unequal recombination
Probability
Uniform probability distribution
Random

Quiz On Background Reading

Q1: Two important mechanisms underlying point mutations include...
A) ...polymerase error during replication and incorrect repair of DNA damage
B) ...metabolism and DNA replication
C) ...differences between the two sexes and hyper-reactive molecules
D) ...DNA replication and DNA damage

Q2: Rank mouse, human and elephant according to the accumulation of mutations per generation as a consequence of DNA damage mutation pathways (with 1 = fewest mutations and 3 = most mutations).
A) 1 = mouse, 2 = human, 3 = elephant
B) 1 = elephant, 2 = human, 3 = mouse
C) 1 = human, 2 = mouse, 3 = elephant
D) 1 = elephant, 2 = mouse, 3 = human
E) All three have the same expected accumulation of mutations

Q3: Rank mouse, human and elephant according to the accumulation of mutations per generation as a consequence of errors in DNA replication (with 1 = fewest mutations and 3 = most mutations).
A) 1 = mouse, 2 = human, 3 = elephant
B) 1 = elephant, 2 = human, 3 = mouse
C) 1 = human, 2 = mouse, 3 = elephant
D) 1 = elephant, 2 = mouse, 3 = human
E) All three have the same expected accumulation of mutations

Q4: How do gene duplications happen?
A) DNA damage induced mutation
B) Polymerase infidelity during replication
C) Unequal recombination
D) All of the above can result in gene duplication

Q5: In humans, at what age do males and females have the same number of germ line cell divisions?
A) At birth
B) Approximately at the age where individuals become reproductively mature (between about age 13 and 18)
C) When individuals are at the peak of their reproductive potential (about age 25)
D) After individuals are finished with reproduction (about age 40)

Q6: Assuming the probability of mutation is 10^-8 per site, how many new mutations happen if the genome is 5 billion base pairs?
A) 5
B) 50
C) 500
D) 5000
E) 50000

Q7: Choose the graph that best represents a uniform probability distribution. Assume probability is on the y-axis.

Q8: If the probability of mutation per site is 10^-8, which of the two stretches of DNA is likely to have more spontaneous mutations: a protein-coding gene that is 2000 base pairs long or a non-coding stretch of "junk" DNA that is 2000 base pairs long?
A) The protein-coding gene because the mutations can have effects that help the individual
B) The non-coding "junk" DNA because the mutations have no effect on the individual
C) Both stretches of DNA are expected to have the same number of mutations because they are the same length
D) Impossible to know given the information because mutation is random

Q9: If a single base pair deletion happens in a section of a gene that encodes protein, which one of the following is an effect of the mutation?
A) Mismatch between the DNA sequence and the encoded amino acids
B) The appearance of a stop codon in the gene
C) A change in the encoded amino acids because the mutation causes a frameshift
D) The new allele will be deleterious
E) All of the above

Q10: Which of the following influences the probability of mutation at a particular site?
A) The functional consequences of the mutation
B) Whether the mutation is in a protein-coding gene
C) Whether the mutation is in a portion of the chromosome that controls whether another gene is turned on or off (i.e. whether the mutation happens in a regulatory gene)
D) All of the above
E) None of the above

Q11: Imagine a single individual will gain a mutation. Which individual is most likely to gain the mutation?

Q12: Imagine there is a population of bacteria that live in an environment with a concentration gradient of antibiotics. Choose the bacterium that is most likely to have a mutation in a gene that causes antibiotic resistance.

In-Class Activities: Student Handouts

Mutation Is Random In A Population
Imagine there is a population of bacteria living in an environment with a gradient of antibiotic, from localities where antibiotics are absent to localities where antibiotics are abundant. Using the graph provided, construct a histogram summarizing the number of individuals with a spontaneous mutation in an antibiotic resistance gene.

Mutation Is Random in a Gene and the Genome
Imagine you are interested in a particular gene. In this case, we'll focus on the dystrophin gene. The dystrophin gene plays an important role in the structure and function of muscles. Imagine there have been fifteen spontaneous mutations in the dystrophin gene across hundreds of individuals in a population. Each mutation happened independently (i.e. 15 individuals each gained a unique mutation in the gene). Your job is to summarize where in the gene the 15 mutations occurred. Mark on the gene where each of the mutations happened (put a tick mark on the gene below for each one of the 15 mutations).

Figure 2. The dystrophin gene. Each colored box with a letter or numbered label is a particular motif. The sections labeled ABD1, CR, and CT are extremely important for the function of the gene. The motifs labeled R are repetitive. The white boxes are spliced out of the gene during RNA processing in some cells.

Use the information from your mapping of mutations along the gene to construct a graph showing the probability of mutation along the length of the gene.
Finally, construct a graph of the probability of mutation along the length of the X chromosome, on which the human dystrophin gene is located.

Figure 4. The X chromosome. The dystrophin gene is indicated by the red arrow.

Be prepared to justify why you drew the graph you did.

Why Some Genes Have More Mutations Than Others

Use the information about the probability of mutation along the length of a gene or chromosome to answer the following question.

The mutation rate of the dystrophin gene is about 100 times greater than the mutation rate for the average gene. This high mutation rate mostly explains why the most common genetic disease in humans (and probably in other organisms with muscle) is due to mutations in this gene. Muscular dystrophy is caused by mutations in the dystrophin gene, and its frequency in human males is about 1 in 5,000. Why is the mutation rate for this gene so high?

Based on your reasoning, draw a graph that summarizes your thinking (Hint: draw a graph that provides a general explanation for variation in the mutation rate among genes in the genome). Make sure you label the axes.

How many new alleles are born each generation?

Below is a list of different species and their estimated global population sizes. If the mutation rate per gene is $10^{-5}$ to $10^{-6}$ per generation, how many new alleles are born each generation in each species for a particular gene (like hemoglobin)? Show your work.

| Species | Estimated population size | New alleles per gene in the population each generation |
| --- | --- | --- |
| Polar bears (Ursus maritimus) | 20,000 | |
| Humans (Homo sapiens) | 7,000,000,000 | |
| E. coli (gut bacteria) in the total human population | In a single human there are about 10 grams of E. coli. Each E. coli weighs about $1 \times 10^{-12}$ g. | |

Example Summative Assessments

Q1:

Q2:

Q3: If there is a population of 1 trillion (1,000,000,000,000) cells (this is typical for a population of bacteria in a human gut) and the mutation rate per nucleotide in the genome is $10^{-8}$ per generation, what is the expected frequency of a new mutant allele defined by a change at a single base pair? (Assume that if a mutation happens at the same site in two different cells, the result is the same allele even though the alleles originated independently.)
A) One thousand divided by a trillion
B) Ten thousand divided by a trillion
C) One hundred thousand divided by a trillion
D) One million divided by a trillion
E) Cannot calculate the expected frequency of a new allele

Q4: The figure above shows the structure of hemoglobin (the protein involved with oxygen delivery in the blood vascular system). Hemoglobin consists of two different proteins (alpha and beta), and each hemoglobin molecule contains two alpha and two beta proteins. Surveys of hemoglobin in natural populations have revealed that the amino acids in the interaction zone between the alpha and beta proteins are never variable, and different species typically have the same amino acids in the alpha and beta proteins where the two proteins interact. The lack of variation in the interaction zone part of the encoded protein reflects...
A) ...a low probability of spontaneous mutation in the interaction regions of the alpha and beta globin genes
B) ...the fact that mutation rates are highly variable across the length of a gene depending on the type of mutation that is allowed to occur
C) ...the fact that mutations in the gene regions encoding interaction zone amino acids resulted in low survivorship of individuals and were therefore eliminated from the population, so they are not detected
D) ...the very low mutation rate for the two hemoglobin genes because hemoglobin is so essential for normal, everyday function
E) ...the fact that mutations happen elsewhere in the gene where function is improved depending on the environment

3-D structure of hemoglobin showing the location of 5 amino acids that allow for adaptation to low concentrations of oxygen characteristic of high elevation environments (Storz et al. 2007. PLOS Genetics).

Q5: The figure above shows a 3-D picture of hemoglobin and five amino acid positions in which specific changes allow organisms to thrive in high elevation environments that have low concentrations of oxygen relative to lower elevations. Imagine that a population of mice has colonized a high elevation environment but does not have any of the changes indicated. If a mutation happened, it would most likely happen...
A) ...at one of the positions that allow mice to adapt to the low oxygen concentration at high elevation
B) ...only at a site that increases the ability of hemoglobin to bind oxygen
C) ...at some other site in the gene not associated with adaptation to low concentration of oxygen
D) There is not sufficient information to confidently choose among the alternatives above.
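
Instructor note: several of the questions above (quiz Q6, the "How many new alleles are born each generation?" handout, and summative Q3) rest on the same arithmetic: an expected count is a per-opportunity mutation rate multiplied by the number of opportunities (sites, gene copies, or cells). The sketch below is one possible way to check the numbers, not part of the original module. The function and variable names are hypothetical, and it assumes one gene copy is counted per individual, as the handout's wording suggests; for diploid species such as polar bears and humans, counting both copies would double the result.

```python
from fractions import Fraction


def expected_count(rate, opportunities):
    """Expected number of events = per-opportunity rate x number of opportunities."""
    return Fraction(rate) * opportunities


# Quiz Q6: 10^-8 mutations per site across a genome of 5 billion base pairs.
per_site_rate = Fraction(1, 10**8)
mutations_per_genome = expected_count(per_site_rate, 5 * 10**9)

# Summative Q3: the same per-site rate applied to one site in 10^12 cells.
q3_cells = 10**12
q3_mutant_cells = expected_count(per_site_rate, q3_cells)
q3_frequency = q3_mutant_cells / q3_cells

# Handout table: population sizes as given.
HUMANS = 7 * 10**9
# E. coli: about 10 g per human, each cell weighing about 10^-12 g.
ecoli_per_human = Fraction(10) / Fraction(1, 10**12)
populations = {
    "polar bears": 20_000,
    "humans": HUMANS,
    "E. coli in all humans": ecoli_per_human * HUMANS,
}

# Handout range for the per-gene mutation rate per generation.
rate_low = Fraction(1, 10**6)
rate_high = Fraction(1, 10**5)

# Assumption: one gene copy per individual (multiply by 2 for diploids).
new_alleles = {
    species: (expected_count(rate_low, n), expected_count(rate_high, n))
    for species, n in populations.items()
}

if __name__ == "__main__":
    print(f"Q6 expected new mutations per genome: {float(mutations_per_genome):g}")
    print(f"Q3 mutant cells: {float(q3_mutant_cells):g} out of {q3_cells:g}")
    for species, (low, high) in new_alleles.items():
        print(f"{species}: {float(low):g} to {float(high):g} new alleles per generation")
```

Exact fractions are used instead of floating point so that small rates multiplied by very large population sizes do not pick up rounding error. Values below 1, as for polar bears, can be read as the fraction of generations in which a new allele is expected to appear for that gene.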