UNIT-2 GENETICS OF PROKARYOTES AND EUKARYOTIC ORGANELLES AND GENE STRUCTURE & EXPRESSION

Structure
2.0 Introduction
2.1 Objectives
2.2 Genetics of Prokaryotes and Eukaryotic Organelles:
2.2.1 Phage Phenotypes
2.2.2 Mapping the Bacteriophage Genome
2.2.3 Recombination in Phage
2.2.4 Genetic Transformation, Conjugation and Transduction in Bacteria
2.2.5 Genetics of Mitochondria and Chloroplasts
2.2.6 Cytoplasmic Male Sterility
2.3 Gene Structure and Expression:
2.3.1 Genetic Fine Structure: Fine Structure of Gene
2.3.2 Cis-Trans Test
2.3.3 The Structure Analysis of Eukaryotes Introns and their Significance
2.3.4 RNA Splicing
2.3.5 Regulation of Gene Expression in Prokaryotes and Eukaryotes
2.4 Let Us Sum Up
2.5 Check Your Progress
2.6 Check Your Progress: The Key
2.7 Assignment
2.8 References

2.0 INTRODUCTION
A great variety of organisms has existed in this world for a very long time. They continuously interact with the environment and perform all the activities necessary for life. Reproduction is a characteristic activity of all living entities, whether prokaryotes or eukaryotes. In this chapter we will try to understand the genetics of lower unicellular and higher multicellular organisms.

2.1 OBJECTIVE
This unit aims to explain the gene-related behaviour of viruses (phages) and bacteria, together with that of eukaryotes. After working through this unit, one can learn the following:
o the life cycle, general and new phenotypes, recombination and genome mapping techniques for bacteriophages;
o the commonly adopted methods of genetic exchange in bacteria;
o the extrachromosomal genetics of eukaryotic organelles such as mitochondria and chloroplasts, including the role of the cytoplasm in some features of heredity;
o the physical nature of the gene and its behaviour as a single unit (cis-trans test);
o the non-coding regions (introns) of genes and their significance;
o the processing of RNA by the splicing mechanism; and
o how different units or factors control the working of genes.
2.2 GENETICS OF PROKARYOTES AND EUKARYOTIC ORGANELLES

2.2.1 Phage Phenotypes
Bacteriophage - A bacteriophage (from 'bacteria' and Greek phagein, 'to eat') is any one of a number of viruses that infect bacteria. The term is commonly used in its shortened form, phage. Typically, bacteriophages consist of an outer protein hull enclosing genetic material. The genetic material can be ssRNA (single-stranded RNA), dsRNA, ssDNA, or dsDNA, between 5 and 500 kilobase pairs long, with either a circular or a linear arrangement. Bacteriophages are much smaller than the bacteria they destroy, usually between 20 and 200 nm in size. T2 and its close relative T4 are viruses that infect the bacterium E. coli. The infection ends with destruction (lysis) of the bacterial cell, so these viruses are examples of bacteriophages ("bacteria eaters").

Figure 2.1: Phenotype of a Bacteriophage

General Phenotype - Generally each virus particle (virion) consists of:
o a protein head (~0.1 µm) inside of which is a single, linear molecule of double-stranded DNA containing 166,000 base pairs (Figure 2.1);
o a protein tail from which extend thin protein fibers.

Life Cycle -
o The virus attaches to the E. coli cell. This requires a precise molecular interaction between the fibers and the cell wall of the host.
o The DNA molecule is injected into the cell.
o Within 1 minute, the viral DNA begins to be transcribed and translated into some of the viral proteins, and synthesis of host proteins is stopped.
o At 5 minutes, viral enzymes needed for synthesis of new viral DNA molecules are produced.
o At 8 minutes, some 40 different structural proteins for the viral head and tail are synthesized.
o At 13 minutes, assembly of new viral particles begins.
o At 25 minutes, the viral lysozyme destroys the bacterial cell wall and the viruses burst out, ready to infect new hosts.
o If the bacterial cells are growing in liquid culture, the culture turns clear.
o If the bacterial cells are growing in a "lawn" on the surface of an agar plate, then holes, called plaques (Figure 2.2), appear in the lawn.

New Phenotypes - Occasionally, new phenotypes appear, such as a change in the appearance of the plaques or even a loss of the ability to infect the host. Examples:

h
o Some strains of E. coli, e.g. one designated B/2, gain the ability to resist infection by normal ("wild-type") T2. The mutation has caused a change in the structure of their cell wall so that the tail fibers of T2 can no longer bind to it. However, T2 can strike back. Occasional T2 mutants appear that overcome this resistance. The mutated gene, designated h (for "host range"), encodes a change in the tail fibers so they can once again bind to the cell wall of strain B/2. The normal or "wild-type" gene is designated h+.
o When plated on a lawn containing both E. coli B and E. coli B/2, the mutant (h) viruses can lyse both strains of E. coli, producing clear plaques, while the wild-type (h+) viruses can lyse only E. coli B, producing mottled or turbid plaques.

r
o Occasional T2 mutants appear that break out of their host cell earlier than normal.
o The mutation occurs in a gene designated r (for "rapid lysis"). It reveals itself by the extra-large plaques that it forms.
o The wild-type gene, producing a normal time of lysis, is designated r+. It forms normal-size plaques.

As with so many organisms, the occurrence of mutations provides the tools to learn about such things as the function of the gene and its location in the DNA molecule (mapping).

2.2.2 Mapping the Bacteriophage Genome
The bacteriophage genome can be mapped by the following methods.

Figure 2.2: Plaques are clear patches of lysed cells on a lawn of bacteria. (E.C.S. Chan/Visuals Unlimited)

Techniques for the Study of the Bacteriophage Genome - Viruses reproduce only within host cells, so bacteriophages must be cultured in bacterial cells. To do so, phages and bacteria are mixed together and plated on solid medium in a Petri plate. A high concentration of bacteria is used so that the colonies grow into one another and produce a continuous layer of bacteria, or "lawn", on the agar. An individual phage infects a single bacterial cell and goes through its lytic cycle. Many new phages are released from the lysed cell and infect additional cells; the cycle is then repeated. Because the bacteria grow on solid medium, the diffusion of the phages is restricted and only nearby cells are infected. After several rounds of phage reproduction, a clear patch of lysed cells (a plaque) appears on the plate (Figure 2.2).

Figure 2.3: Hershey and Rotman developed a technique for mapping viral genes. (Photo from G.S. Stent, Molecular Biology of Bacterial Viruses. Copyright © 1963 by W.H. Freeman and Company.)

Each plaque represents a single phage that multiplied and lysed many cells. Plating a known volume of a dilute solution of phages on a bacterial lawn and counting the number of plaques that appear can be used to determine the original concentration of phage in the solution.

(1) Mapping by Recombination Frequencies - The strain B of E.
coli can be infected by both h+ and h strains of T2 bacteriophage. In fact, a single bacterial cell can be infected simultaneously by both. Let us infect a liquid culture of E. coli B with two different mutant T2 viruses, h r+ and h+ r (Figure 2.4). When the progeny of this mixed infection are plated on a mixed lawn of E. coli B and B/2, four different kinds of plaques appear.

Figure 2.4: Recombination in Phage

Genotype    Phenotype        Number of Plaques
h r+        clear, small     460
h+ r        turbid, large    460
h+ r+       turbid, small    40
h r         clear, large     40
Total                        1000

The most abundant (460 each) are those representing the parental types; that is, the phenotypes are those expected from the two infecting strains. However, small numbers (40 each) of two new phenotypes appear. These can be explained by genetic recombination having occasionally occurred between the DNA of each parental type within the bacterial cell. Just as in higher organisms, one assumes that the frequency of recombinants is proportional to the distance between the gene loci. In this case, 80 out of 1000 plaques were recombinant, so the distance between the h and r loci is assigned a value of 8 map units or centimorgans (cM).

Now repeat, coinfecting E. coli B with two other strains of T2: h m+ and h+ m.

Genotype    Number of Plaques
h m+        470
h+ m        470
h+ m+       30
h m         30
Total       1000

Again, 4 kinds of plaques are produced: parental (470 each) and recombinant (30 each). The smaller number of recombinants indicates that these two gene loci (h and m) are closer together (6 cM) than h and r (8 cM). But the order of the three loci could be either m-6-h-8-r or h-6-m-2-r. To find out which is the correct order, perform a third mating using m r+ and m+ r.

Genotype    Number of Plaques
m r+        440
m+ r        440
m+ r+       60
m r         60
Total       1000

Here 120 out of 1000 plaques are recombinant, giving 12 cM between m and r. This is far closer to the 14 cM predicted by the first order than to the 2 cM predicted by the second, which makes it clear that the order is m-h-r, not h-m-r. But why only 12 cM between the outside loci (m and r) instead of the 14 cM produced by adding the map distances found in the first two matings?
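The arithmetic of the three two-point crosses above can be sketched in Python. This is only an illustration: the function names `map_distance_cm` and `h_is_middle` and the dictionary layout are assumptions made here, not part of the original method. The plaque counts are the ones given in the text.

```python
def map_distance_cm(counts, recombinant_genotypes):
    """Percentage of recombinant plaques, read as map units (cM)."""
    total = sum(counts.values())
    recombinants = sum(counts[g] for g in recombinant_genotypes)
    return 100.0 * recombinants / total


def h_is_middle(d_hm, d_hr, d_mr):
    """Compare the observed m-r distance with the two candidate orders.

    Order m-h-r predicts d_hm + d_hr; order h-m-r predicts |d_hr - d_hm|.
    """
    error_if_h_middle = abs(d_mr - (d_hm + d_hr))
    error_if_m_middle = abs(d_mr - abs(d_hr - d_hm))
    return error_if_h_middle < error_if_m_middle


# Plaque counts from the three matings described in the text.
cross_h_r = {"h r+": 460, "h+ r": 460, "h+ r+": 40, "h r": 40}
cross_h_m = {"h m+": 470, "h+ m": 470, "h+ m+": 30, "h m": 30}
cross_m_r = {"m r+": 440, "m+ r": 440, "m+ r+": 60, "m r": 60}

d_hr = map_distance_cm(cross_h_r, ["h+ r+", "h r"])
d_hm = map_distance_cm(cross_h_m, ["h+ m+", "h m"])
d_mr = map_distance_cm(cross_m_r, ["m+ r+", "m r"])

print(d_hr, d_hm, d_mr)
print("h lies between m and r:", h_is_middle(d_hm, d_hr, d_mr))
```

The order test simply asks which candidate map predicts a value nearer to the observed m-r distance; it does not explain the shortfall from 14 cM.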
(2) Mapping by a Three-Point Cross - The answer comes from performing a mating between T2 viruses differing at all three loci: h m r and h+ m+ r+. (Note: this time one parent has all the mutant alleles and the other all the wild-type alleles; don't be confused!)

Recombination by Three-Point Cross

Group      Genotype     Number of Plaques
Group 1    h m r        435
Group 2    h+ m+ r+     435
Group 3    h+ m r+      25
Group 4    h m+ r       25
Group 5    h m r+       35
Group 6    h+ m+ r      35
Group 7    h m+ r+      5
Group 8    h+ m r       5
Total                   1000

The result: 8 different types of plaques are formed:
o parentals, that is, nonrecombinants, in Groups 1 and 2;
o recombinants: all the others.

Analyzing these data shows how the two-point cross between m and r understated the true distance between them. Let's first look at single pairs of recombinants as we did before (thus ignoring the third locus).

If we look at all the recombinants between h and r but ignore m (as in the first experiment), we find that they are contained in Groups 5, 6, 7, and 8, giving the total of 80 that we found originally.

If we look at recombinants between h and m but ignore r (as in the second experiment), we find that they are contained in Groups 3, 4, 7, and 8, giving the same total of 60 that we found before.

But if we focus only on m and r (as we did in the third experiment), we find that the recombinants are contained in Groups 3, 4, 5, and 6, giving the same total of 120 as before, while the non-recombinants are not only in Groups 1 and 2 but also in Groups 7 and 8. The reason: a double crossover occurred in these cases, restoring the parental configuration of the m and r alleles. Because these double crossovers were hidden in the third experiment, the map distance (12 cM) was understated. To get the true map distance, we add their number to each of the other recombinant groups (Groups 3, 4, 5, and 6), so 25 + 5 + 25 + 5 + 35 + 5 + 35 + 5 = 140, and the true map distance between m and r is the 14 cM that we found by adding the map distances between h and r (8 cM) and h and m (6 cM).
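The three-point analysis can also be checked with a short Python sketch. It assumes the eight plaque classes and counts from the table above, stored as tuples of alleles at (h, m, r). The helper names (`pairwise_recombinants`, `double_crossovers`, `middle_locus`) are illustrative choices made here. Treating the two rarest classes as the double crossovers is the usual convention and is an assumption of this sketch.

```python
LOCI = ("h", "m", "r")
PARENTS = [("h", "m", "r"), ("h+", "m+", "r+")]

# Plaque counts for Groups 1 to 8, keyed by alleles at (h, m, r).
counts = {
    ("h", "m", "r"): 435,     # Group 1 (parental)
    ("h+", "m+", "r+"): 435,  # Group 2 (parental)
    ("h+", "m", "r+"): 25,    # Group 3
    ("h", "m+", "r"): 25,     # Group 4
    ("h", "m", "r+"): 35,     # Group 5
    ("h+", "m+", "r"): 35,    # Group 6
    ("h", "m+", "r+"): 5,     # Group 7
    ("h+", "m", "r"): 5,      # Group 8
}


def is_mutant(allele):
    return not allele.endswith("+")


def pairwise_recombinants(counts, i, j):
    """Plaques whose alleles at loci i and j come from different parents."""
    return sum(n for g, n in counts.items()
               if is_mutant(g[i]) != is_mutant(g[j]))


def double_crossovers(counts):
    """Assumes the two rarest classes are the double crossovers."""
    return sum(sorted(counts.values())[:2])


def middle_locus(counts, parents, loci):
    """The locus at which a rarest class differs from the closest parent."""
    rarest = min(counts, key=counts.get)
    for parent in parents:
        diffs = [loci[k] for k in range(len(loci)) if rarest[k] != parent[k]]
        if len(diffs) == 1:
            return diffs[0]
    return None


total = sum(counts.values())
rec_hr = pairwise_recombinants(counts, 0, 2)
rec_hm = pairwise_recombinants(counts, 0, 1)
rec_mr = pairwise_recombinants(counts, 1, 2)
dco = double_crossovers(counts)

observed_mr_cm = 100.0 * rec_mr / total
# Each double crossover hides two exchanges between m and r.
corrected_mr_cm = 100.0 * (rec_mr + 2 * dco) / total

print(rec_hr, rec_hm, rec_mr, dco)
print(observed_mr_cm, corrected_mr_cm, middle_locus(counts, PARENTS, LOCI))
```

Adding twice the double-crossover count is the same sum as the text's 25 + 5 + 25 + 5 + 35 + 5 + 35 + 5.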
The three-point cross is also useful because it gives the gene order simply by inspection: Find the rarest genotypes (here Groups 7 and 8), and the gene NOT in the parental configuration (here h) is always the middle one. 2.2.3 Genetic Recombination in Phage Site-specific genetic recombination is very common method in phage for exchanging the genetic material. Unlike general recombination it is guided by a recombination enzyme that recognizes specific nucleotide sequences present on one or both of the recombining DNA molecules. Base-pairing between the recombining DNA molecules need not be involved, and even when it is, the heteroduplex joint that is formed is only a few base pairs long. By separating and joining double-stranded DNA molecules at specific sites, this type of recombination enables various types of mobile DNA sequences to move about within and between chromosomes. Site-specific recombination was first discovered as the means by which a bacterial virus, bacteriophage lambda, moves its genome into and out of the E. coli chromosome. In its integrated state the virus is hidden in the bacterial chromosome and replicated as part of the host's DNA (Figure 2.8). When the virus enters a cell, a virus-encoded enzyme called lambda integrase is synthesized. This enzyme catalyzes a recombination process that begins when several molecules of the integrase protein bind tightly to a specific DNA sequence on the circular bacteriophage chromosome. The resulting DNA-protein complex can now bind to a related but different specific DNA sequence on the bacterial chromosome, bringing the bacterial and bacteriophage chromosomes close together. The integrase then catalyzes the required DNA cutting and resealing reactions, using a short region of sequence homology to form a tiny heteroduplex joint at the point of union (Figure 2.5). The integrase resembles a DNA topoisomerase in that it forms a reversible covalent linkage to DNA wherever it breaks a DNA chain. 
The same type of site-specific recombination mechanism can also be carried out in reverse by the lambda bacteriophage, enabling it to exit from its integration site in the E. coli chromosome in order to multiply rapidly within the bacterial cell. This excision reaction is catalyzed by a complex of the integrase enzyme (Figure 2.6) with a second bacteriophage protein, which is produced by the virus only when its host cell is stressed. If the sites recognized by such a recombination enzyme are flipped, the DNA between them will be inverted rather than excised (Figure 2.7). Many other enzymes that catalyze site-specific recombination resemble lambda integrase in requiring a short region of identical DNA sequence on the two regions of DNA helix to be joined. [7] Because of this requirement, each enzyme in this class is fastidious with respect to the DNA sequences that it recombines, and it can be expected to catalyze one particular DNA joining event that is useful to the virus, plasmid, transposable element, or cell that contains it. These enzymes can be exploited as tools in transgenic animals to study the influence of specific genes on cell behavior. Figure 2.5: The formation of a cross-strand exchange. There are many possible pathways that can lead from a single-strand exchange to a cross-strand exchange, but only one is shown. Figure 2.6: The insertion of bacteriophage lambda DNA into the bacterial chromosome. In this example of site-specific recombination, the lambda integrase enzyme binds to a specific "attachment site" DNA sequence on each chromosome, where it makes cuts that bracket a short homologous DNA sequence; the integrase thereby switches the partner strands and reseals them so as to form a heteroduplex joint 7 base pairs long. 
Each of the four strand-breaking and strand-joining reactions required resembles that made by a DNA topoisomerase, inasmuch as the energy of a cleaved phosphodiester bond is stored in a transient covalent linkage between the DNA and the enzyme.

Figure 2.7: Switching gene expression by DNA inversion in bacteria. Alternating transcription of two flagellin genes in a Salmonella bacterium is caused by a simple site-specific recombination event that inverts a small DNA segment containing a promoter that in one orientation (A) activates transcription of the H2 flagellin gene as well as a repressor protein that blocks the expression of the H1 flagellin gene. When the promoter is inverted, it no longer turns on H2 or the repressor, and the H1 gene, which is thereby released from repression, is expressed instead (B). The recombination mechanism is activated only rarely (about once every 10^5 cell divisions). Therefore, the production of one or other flagellin tends to be faithfully inherited in each clone of cells.

Site-specific recombination enzymes that break and rejoin two DNA double helices at specific sequences on each DNA molecule often do so in a reversible way: as for lambda bacteriophage, the same enzyme system that joins two DNA molecules can take them apart again, precisely restoring the sequences of the two original DNA molecules. This type of recombination is therefore called conservative site-specific recombination to distinguish it from the mechanistically.

Figure 2.8: The life cycle of bacteriophage lambda. The lambda genome contains about 50,000 nucleotide pairs and encodes about 50 proteins. Its double-stranded DNA can exist in either linear or circular forms. As shown, the bacteriophage can multiply by either a lytic or a lysogenic pathway in the E. coli bacterium. When the bacteriophage is growing in the lysogenic state, damage to the cell causes the integrated viral DNA (provirus) to exit from the host chromosome and shift to lytic growth.
The entrance and exit of the DNA from the chromosome are site-specific genetic recombination events catalyzed by the lambda integrase protein.

2.2.4 Genetic Transformation, Conjugation and Transduction in Bacteria
Bacteria can transfer DNA to other bacteria in three different ways. In every case the source cells of the DNA are called the DONORS and the cells that receive the DNA are called the RECIPIENTS. In each case the donor DNA is incorporated into the recipient cell's DNA by recombinational exchange (Figure 2.9). If the exchange involves an allele of the recipient's gene, the recipient's genome and phenotype will have changed. The three forms of bacterial DNA exchange are (1) TRANSFORMATION, (2) CONJUGATION and (3) TRANSDUCTION.

Figure 2.9: General scheme of bacterial exchange of DNA. DNA from a donor cell is transferred to a recipient cell where it undergoes recombinational exchange, replacing one or more of the recipient's genes with those from the donor.

Figure 2.10: Representative FERTILITY PLASMID. A fertility plasmid carries the genes for conjugation as well as a number of other genes. In this figure the fertility plasmid also carries antibiotic resistance genes.

Plasmids - Before DNA exchange can be discussed it is necessary to understand what PLASMIDS are. Plasmids are best thought of as MINI-CHROMOSOMES. Plasmids are composed of DNA which usually exists as a CIRCULAR MOLECULE, only much SMALLER than the genomic DNA (Figure 2.10). Plasmids vary in size, but most are between 1,000 and 25,000 base pairs, vs. 4,000,000 bp in the genome. Plasmids REPLICATE AUTONOMOUSLY from the genomic chromosome. Often there are MANY PLASMID COPIES present in one cell (Figure 2.11). Further, a cell may contain SEVERAL DIFFERENT PLASMIDS or it may contain NO PLASMIDS at all.

Figure 2.11: Plasmids in a bacterial host cell. A cell may contain no plasmids, one plasmid or many copies of a plasmid. A single host may contain a number of different plasmids (green, blue & pink).

Plasmids generally carry genes that are NOT ESSENTIAL for a cell's survival except under special circumstances. For example, many plasmids carry genes for ANTIBIOTIC RESISTANCE (Figure 2.13). When these plasmids are present in a cell, it is unaffected by the appropriate antibiotic, but if the plasmid and its antibiotic resistance gene are lost, the host cell becomes sensitive to that antibiotic. Some plasmids carry resistance genes to several antibiotics, making the bacteria that carry them very dangerous pathogens. In other cases plasmids, called VIRULENCE PLASMIDS, carry VIRULENCE GENES that enhance a host's ability to cause a disease. That is, a bacterium carrying a plasmid containing the virulence gene is able to CAUSE A DISEASE (Figure 2.12), but when the plasmid is missing that same bacterium is unable to produce that disease. One such plasmid-based disease of recent concern is the severe food-borne disease produced by the E. coli strain O157:H7. Other plasmids carry genes for protecting a cell against DELETERIOUS substances like mercury or copper, or they may carry genes that make it possible for a cell to metabolize an UNUSUAL SUBSTRATE, such as gasoline, as a nutrient or energy source.

Figure 2.12: Preparation of phage dilution for plaque formation.

Figure 2.13: Selection of Antibiotic-Resistant Mutants. If an antibiotic-sensitive bacterium is grown in a culture, occasionally a random mutation occurs that renders a bacterium resistant to a given antibiotic. To detect the presence of such a mutation one plates the culture on a medium containing a lethal dose of the antibiotic in question. Any cells that grow on the left plate must be resistant (red colonies) to the antibiotic. All the cells (green & red) will grow on a medium lacking the antibiotic.

The question naturally arises as to the PURPOSE of these plasmids in the evolutionary scheme. The current explanation is that plasmids constitute an EXTRA POOL OF GENE ALLELES and thus enlarge the effective gene pool of the population. Remember that the genome of prokaryotes carries only enough information for between 1,000 and 5,000 genes. But, as we've already learned, the more variety, the better a species' chances of survival are in a fickle universe. The phenomenon of ANTIBIOTIC RESISTANCE is a case in point. Antibiotics, being natural products of certain organisms, are nevertheless unlikely to be encountered very often in quantities that endanger sensitive strains, so there is no need to carry resistance genes against the hundreds of antibiotics that lurk in the nooks 'n crannies of the environment. Indeed, to do so would likely tie up all your genes just for this one purpose; clearly not a survival plus. This would be like someone afraid of being robbed carrying around an AK-47, a rocket launcher and a small cannon; they might be safe from thieves, but all that bulk and weight is going to seriously interfere with their everyday lives, like getting dates.

However, random mutation has produced antibiotic resistance genes that clearly can prove useful under the RIGHT CIRCUMSTANCES, but how do they remain available without tying up huge quantities of LIMITED RESOURCES? The answer is PLASMIDS, of course (bet you saw that coming, didn't you?). A RARE PLASMID, randomly carrying a RARE ANTIBIOTIC RESISTANCE GENE to, for example, penicillin, happens to be in a bacterium in a patient suffering from an infection (e.g. clap) which is treated by a shot in the you-bloody-well-know-where. All the resistant bacterium's mates, lacking the resistance plasmid, are quickly killed, but the lucky bacterium with its penicillin-resistance plasmid survives and reproduces while swimming in a sea of penicillin. Naturally, all the subsequent daughter cells carry the resistance plasmid, because if they didn't they'd die very quickly.
This is a classical example of SURVIVAL OF THE FITTEST and of evolution in action. In the modern world we produce huge quantities of antibiotics, so the selective pressure favouring bacteria that contain plasmids carrying antibiotic resistance genes is intense, particularly in places like hospitals. As a consequence of this evolutionary process, current antibiotics are losing their effectiveness. To compound the problem, most of the plasmids carrying the antibiotic resistance genes have the ability to move from one bacterium to another by conjugation. In effect, a single cell carrying an antibiotic-resistance plasmid can "INFECT" many other cells with this plasmid, thereby spreading the resistance plasmid rapidly THROUGHOUT a bacterial population (sort of like us getting a flu shot). The survival logic of this ability is obvious, at least as far as the bacteria are concerned.

Plasmids have one other very significant role to play in this story. They serve as the VEHICLES for carrying genes between cells in the genetic engineering revolution.

Figure 2.14: Isolation of CELL-FREE or NAKED DNA. The cells are broken and the DNA released. The cell-free DNA is subsequently isolated and collected.

Transformation - The discovery of transformation was previously described. Since its initial discovery transformation has been shown to occur throughout the bacterial world, and it has become the most commonly used artificial way of moving genes from one bacterium to another. The basic procedure involves breaking open the donor cells and removing DNA from them so as to obtain a CELL-FREE, usually purified, form of DNA (NAKED DNA) (Figure 2.14).

Figure 2.15: Mixing of Donor DNA with Recipient Competent Cells. The naked donor DNA is incubated with the competent recipient cells to which it binds.

Transformation is used to move DNA between bacteria, plants and animals. In each case the methods used to get the DNA into the recipient cells are slightly different.
In bacteria COMPETENCY (Figure 2.15) is an empirical matter; that is, it cannot be predicted what conditions will produce competency in a given strain of bacteria. However, the following treatment often induces competency in Gram-negative (G-) bacteria:
o Young cells are incubated with a CALCIUM CHLORIDE SOLUTION for approximately 30 min on ice. In some cases magnesium is also present.
o The cells are concentrated and suspended as a thick suspension in the calcium solution.
o The cells may be mixed with reagents like glycerol and stored at -80 °C for later use, or they may be used immediately.
o Cell-free DNA is then mixed with these competent cells (Figure 2.16) on ice for approximately 30 min, followed by a brief mild heating.
o The transformed cells are incubated in a rich medium for approximately 1 to 1.5 hr and then plated on medium containing materials that will detect the presence of the transformed genes.

Figure 2.16: Uptake and Recombination of Donor DNA. Donor DNA binds to competent recipient cells, following which it enters the recipient cells. Portions of the donor DNA align, at random, with genes on the recipient DNA and segments of the two DNAs are exchanged. The exchange inserts donor genes into the recipient cell's DNA.

A variety of other transformation techniques are used for eukaryotic cells. These include mixing certain salts with DNA. These salts bind the DNA and the salt-DNA complex is then taken into the eukaryotic cells, where the DNA is subsequently incorporated into the recipient cell's DNA. Plant cells are often covered with a thick cell wall that is difficult to penetrate. To get DNA into these cells, tiny metal beads coated with the donor DNA are "shot" into the cytoplasm of the recipient cells using a "gas gun". A strong jolt of electricity is also used to drive the DNA into recipient cells. Because of the similar chemical nature of DNA, DNA from any living form can, in theory, function in any other life form.
Animals or plants that have been transformed with DNA from other species are called TRANSGENIC organisms. For example, we have transgenic pigs and cows containing functional "human genes". Transgenic plants containing "bacterial genes" that make a protein toxic to certain insect pathogens are currently growing around the world. 2.2.5 Genetics of Mitochondria and Chloroplasts Mendel’s principles of segregation and independent assortment are based on the assumption that genes are located on chromosomes in the nucleus of the cell. For the majority of genetic characteristics, this assumption is valid, and Mendel’s principles allow us to predict the types of offspring that will be produced in a genetic cross. However, not all the genetic material of a cell is found in the nucleus; some characteristics are encoded by genes located in the cytoplasm. These characteristics exhibit cytoplasmic inheritance. A few organelles, notably chloroplasts and mitochondria, contain DNA. Each human mitochondrion contains about 15,000 nucleotides of DNA, encoding 37 genes. Compared with that of nuclear DNA, which contains some 3 billion nucleotides encoding perhaps 35,000 genes, the amount of mitochondrial DNA (mtDNA) is very small; nevertheless, mitochondrial and chloroplast genes encode some important characteristics. Cytoplasmic inheritance differs from the inheritance of characteristics encoded by nuclear genes in several important respects. A zygote inherits nuclear genes from both parents, but typically all of its cytoplasmic organelles, and thus all its cytoplasmic genes, come from only one of the gametes, usually the egg. Sperm generally contributes only a set of nuclear genes from the male parent. In a few organisms, cytoplasmic genes are inherited from the male parent, or from both parents; however, for most organisms, all the cytoplasm is inherited from the egg. 
In this case, cytoplasmically inherited traits are present in both males and females and are passed from mother to offspring, never from father to offspring. Reciprocal crosses, therefore, give different results when cytoplasmic genes encode a trait.

Cytoplasmically inherited characteristics frequently exhibit extensive phenotypic variation, because there is no mechanism analogous to mitosis or meiosis to ensure that cytoplasmic genes are evenly distributed in cell division. Thus, different cells and individuals will contain various proportions of cytoplasmic genes. Consider mitochondrial genes. There are thousands of mitochondria in each cell, and each mitochondrion contains from 2 to 10 copies of mtDNA. Suppose that half of the mitochondria in a cell contain a normal wild-type copy of mtDNA and the other half contain a mutated copy (Figure 2.17). In cell division, the mitochondria segregate into progeny cells at random. Just by chance, one cell may receive mostly mutated mtDNA and another cell may receive mostly wild-type mtDNA (see Figure 2.17). In this way, different progeny from the same mother, and even cells within an individual offspring, may vary in their phenotype, as in the cytoplasmic inheritance of plastids in Mirabilis jalapa. Traits encoded by chloroplast DNA (cpDNA) are similarly variable.

In 1909, cytoplasmic inheritance was recognized by Carl Correns as one of the first exceptions to Mendel's principles. Correns, one of the biologists who rediscovered Mendel's work, studied the inheritance of leaf variegation in the four-o'clock plant, Mirabilis jalapa. Correns found that the leaves and shoots of one variety of four-o'clock were variegated, displaying a mixture of green and white splotches. He also noted that some branches of the variegated strain had all-green leaves; other branches had all-white leaves.
Each branch produced flowers, so Correns was able to cross flowers from variegated, green, and white branches in all combinations (Figure 2.18). The seeds from green branches always gave rise to green progeny, no matter whether the pollen was from a green, white, or variegated branch. Similarly, flowers on white branches always produced white progeny. Flowers on the variegated branches gave rise to green, white, and variegated progeny, in no particular ratio.

Figure 2.17: Cytoplasmically inherited characteristics frequently exhibit extensive phenotypic variation because cells and individual offspring contain various proportions of cytoplasmic genes. Mitochondria that have wild-type mtDNA are shown in red; those having mutant mtDNA are shown in blue.

Correns's crosses demonstrated cytoplasmic inheritance of variegation in the four-o'clocks. The phenotypes of the offspring were determined entirely by the maternal parent, never by the paternal parent (the source of the pollen). Furthermore, the production of all three phenotypes by flowers on variegated branches is consistent with the occurrence of cytoplasmic inheritance. Variegation in these plants is caused by a defective gene in the cpDNA, which results in a failure to produce the green pigment chlorophyll. Cells from green branches contain normal chloroplasts only, cells from white branches contain abnormal chloroplasts only, and cells from variegated branches contain a mixture of normal and abnormal chloroplasts. In the flowers from variegated branches, the random segregation of chloroplasts in the course of oogenesis produces some egg cells with normal cpDNA, which develop into green progeny; other egg cells with only abnormal cpDNA develop into white progeny; and, finally, still other egg cells with a mixture of normal and abnormal cpDNA develop into variegated progeny.

In recent years, a number of human diseases (mostly rare) that exhibit cytoplasmic inheritance have been identified.
These disorders arise from mutations in mtDNA, most of which occur in genes coding for components of the electron-transport chain, which generates most of the ATP (adenosine triphosphate) in aerobic cellular respiration. One such disease is Leber hereditary optic neuropathy. Patients who have this disorder experience rapid loss of vision in both eyes, resulting from the death of cells in the optic nerve. Loss of vision typically occurs in early adulthood (usually between the ages of 20 and 24), but it can occur any time after adolescence. There is much clinical variability in the severity of the disease, even within the same family. Leber hereditary optic neuropathy exhibits maternal inheritance: the trait is always passed from mother to child.
Figure 2.18: Crosses for leaf type in four-o’clocks illustrate cytoplasmic inheritance.
2.2.6 Cytoplasmic Male Sterility
Background - The first documentation of male sterility came in 1763, when Joseph Gottlieb Kölreuter observed anther abortion within species and specific hybrids. Male sterility is more prevalent than female sterility, either because the male sporophyte and gametophyte are less protected from the environment than the ovule and embryo sac, or because it results from natural selection on mitochondrial genes, which are maternally inherited and are thus not concerned with pollen production. Male sterility is easy to detect because pollen grains are produced in large numbers and are easily studied. Male sterility is assayed through staining techniques (carmine, lactophenol or iodine), while female sterility is detected by the absence of seeds. A male-sterile plant can still set seed, so male sterility can propagate in nature and is important for crop breeding; female sterility has no such potential. Male sterility can arise spontaneously via mutations in nuclear and/or cytoplasmic genes.
Of the two types of male sterility, genetic and cytoplasmic, cytoplasmic male sterility (CMS) is caused by the extranuclear genome (mitochondria or chloroplast) and shows maternal inheritance. Manifestation of male sterility in these systems may be controlled either entirely by cytoplasmic factors or by the interaction between cytoplasmic and nuclear factors.
Explanation of Cytoplasmic male sterility - Cytoplasmic male sterility, as the name indicates, is under extranuclear genetic control. Such systems show non-Mendelian inheritance and are regulated by cytoplasmic factors. In this type, male sterility is inherited maternally. This is not a very common type of male sterile system in the plant kingdom. In general there are two types of cytoplasm, viz. N (normal) and the aberrant S (sterile) cytoplasm. These types exhibit reciprocal differences.
Cytoplasmic genetic male sterility - When nuclear genes for fertility restoration (Rf) are available for a CMS system in any crop, the system is called Cytoplasmic Genetic Male Sterility (CGMS). This type of male sterility system is common in many plant species across the plant kingdom. The sterility is manifested by the influence of both nuclear and cytoplasmic genes. There are commonly two types of cytoplasm, N (normal) and S (sterile). There are also restorer of fertility (Rf) genes, which are distinct from genetic male sterility genes. The Rf genes have no expression of their own unless the sterile cytoplasm is present. Rf genes are required to restore fertility in S cytoplasm, which causes sterility. Thus N cytoplasm with rfrf and S cytoplasm with Rf- both produce fertile plants, while S cytoplasm with rfrf produces only male-sterile plants. Another feature of these systems is that Rf mutations (i.e., mutations to rf, or no fertility restoration) are frequent, so N cytoplasm with Rfrf is best for stable fertility.
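The cytoplasm and restorer combinations listed above can be written out as a small lookup. The following is a minimal sketch only, assuming a single nuclear restorer locus with a dominant Rf allele; the function name `is_male_fertile` and the genotype strings are hypothetical labels chosen for illustration, not an established notation.

```python
# Minimal sketch of the CGMS rule described in the text.
# Assumption (not from the source): one restorer locus, Rf dominant over rf.

VALID_CYTOPLASMS = ("N", "S")               # N = normal, S = sterile
VALID_NUCLEAR = ("RfRf", "Rfrf", "rfrf")    # genotype at the restorer locus


def is_male_fertile(cytoplasm: str, nuclear: str) -> bool:
    """Return True if the plant is expected to produce functional pollen."""
    if cytoplasm not in VALID_CYTOPLASMS:
        raise ValueError(f"unknown cytoplasm: {cytoplasm!r}")
    if nuclear not in VALID_NUCLEAR:
        raise ValueError(f"unknown nuclear genotype: {nuclear!r}")
    if cytoplasm == "N":
        # Rf genes have no expression of their own in normal cytoplasm,
        # so every nuclear genotype is fertile.
        return True
    # Sterile cytoplasm: fertility is restored only if at least one Rf is present.
    return nuclear != "rfrf"


if __name__ == "__main__":
    for cyto in VALID_CYTOPLASMS:
        for nuc in VALID_NUCLEAR:
            status = "fertile" if is_male_fertile(cyto, nuc) else "male sterile"
            print(f"{cyto} cytoplasm, {nuc}: {status}")
```

Only the S cytoplasm with rfrf combination comes out male sterile, which is the line used as the female parent in hybrid seed production.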
Because the expression of sterility can be conveniently controlled by manipulating the gene-cytoplasm combinations in any selected genotype, cytoplasmic genetic male sterility systems are widely exploited in crop plants for hybrid breeding. Incorporating these systems removes the need for emasculation in cross-pollinated species, so that cross breeding yields only hybrid seed under natural conditions.
Cytoplasmic male sterility in hybrid breeding - Male sterile plants produce no functional pollen, but do produce viable eggs. Cytoplasmic male sterility is used in agriculture to facilitate the production of hybrid seed. Hybrid seed is produced from a cross between two genetically different lines; such seeds usually result in larger, more vigorous plants. The main practical problem in producing hybrid seed is to prevent self-pollination, which would produce seeds that are not hybrid. One breeding scheme is illustrated in Figure 2.19. Hybrid production requires a female plant in which no viable male gametes are borne. Emasculation removes the pollen source from a plant so that it functions as a female. Another simple way to establish a female line for hybrid seed production is to identify or create a line that is unable to produce viable pollen. This male sterile line is therefore unable to self-pollinate, and seed formation is dependent upon pollen from the male line. Cytoplasmic male sterility is used in hybrid seed production. In this case, the sterility is transmitted only through the female and all progeny will be sterile. This is not a problem for crops such as onions or carrots, where the commodity harvested from the F1 generation is produced during vegetative growth. These CMS lines must be maintained by repeated crossing to a sister line (known as the maintainer line) that is genetically identical except that it possesses normal cytoplasm and is therefore male fertile.
In cytoplasmic genetic male sterility, fertility is restored using restorer lines that carry nuclear restorer genes. The male sterile (MS) line is maintained by crossing with a maintainer line, which has the same nuclear genome as the MS line but carries normal fertile cytoplasm.
Figure 2.19: The use of cytoplasmic male sterility to facilitate the production of hybrid corn. In this scheme, the hybrid corn is generated from four pure parental lines: A, B, C, and D. Such hybrids are called double-cross hybrids. At each step, appropriate combinations of cytoplasmic genes and nuclear restorer genes ensure that the female parents will not self and that male parents will have fertile pollen. (After J. Janick et al., Plant Science. Copyright © 1974 by W. H. Freeman and Company.)
2.3 GENE STRUCTURE AND EXPRESSION
2.3.1 Genetic Fine Structure or Fine Structure of Gene
A gene (Figure 2.20) is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions. The physical development and phenotype of organisms can be thought of as a product of genes interacting with each other and with the environment. A concise definition of a gene, taking into account complex patterns of regulation and transcription, genic conservation and non-coding RNA genes, has been proposed by Gerstein et al.: "A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products".
Figure 2.20: This stylistic diagram shows a gene in relation to the double helix structure of DNA and to a chromosome (right). Introns are regions often found in eukaryote genes that are removed in the splicing process (after the DNA is transcribed into RNA): only the exons encode the protein. This diagram labels a region of only 40 or so bases as a gene. In reality most genes are hundreds of times larger.
Colloquially, the term gene is often used to refer to an inheritable trait that is usually accompanied by a phenotype (as in "tall genes" or "bad genes"); the proper scientific term for this is allele. In cells, genes consist of a long strand of DNA that contains a promoter, which controls the activity of a gene, and coding and non-coding sequence. Coding sequence determines what the gene produces, while non-coding sequence can regulate the conditions of gene expression. When a gene is active, the coding and non-coding sequence is copied in a process called transcription, producing an RNA copy of the gene's information. This RNA can then direct the synthesis of proteins via the genetic code. Some RNAs, however, are used directly, for example as part of the ribosome. These molecules resulting from gene expression, whether RNA or protein, are known as gene products. Genes often contain regions that do not encode products, but regulate gene expression. The genes of eukaryotic organisms can contain regions called introns that are removed from the messenger RNA in a process called splicing. The regions encoding gene products are called exons. In eukaryotes, a single gene can encode multiple proteins, which are produced through the creation of different arrangements of exons through alternative splicing. In prokaryotes (bacteria and archaea), introns are less common and genes often contain a single uninterrupted stretch of DNA, called a cistron, that codes for a product. Prokaryotic genes are often arranged in groups called operons, with promoter and operator sequences that regulate transcription of a single long RNA. This RNA contains multiple coding sequences. Each coding sequence is preceded by a Shine-Dalgarno sequence that ribosomes recognize. The total set of genes in an organism is known as its genome. Genome size is generally smaller in prokaryotes, both in number of base pairs and number of genes, than in even single-celled eukaryotes.
However, there is no clear relationship between genome sizes and complexity in eukaryotic organisms. One of the largest known genomes belongs to the single-celled amoeba Amoeba dubia, with over 670 billion base pairs, some 200 times larger than the human genome. The estimated number of genes in the human genome has been repeatedly revised downward since the completion of the Human Genome Project; current estimates place the human genome at just under 3 billion base pairs and about 20,000–25,000 genes. A recent Science article gives a number of 20,488 protein-coding genes, with perhaps 100 more yet to be discovered. The gene density of a genome is a measure of the number of genes per million base pairs (called a megabase, Mb); prokaryotic genomes have much higher gene densities than eukaryotes. The gene density of the human genome is roughly 12–15 genes per megabase pair.
History - The existence of genes was first suggested by Gregor Mendel (1822-1884), who, in the 1860s, studied inheritance in pea plants. Mendel's concept was given a name by Hugo de Vries in 1889, who, at that time probably unaware of Mendel's work, in his book Intracellular Pangenesis coined the term "pangen" for "the smallest particle [representing] one hereditary characteristic". Wilhelm Johannsen abbreviated this term to "gene" ("gen" in Danish and German) two decades later.
Physical Structure - The vast majority of living organisms encode their genes in long strands of DNA. DNA consists of a chain made from four types of nucleotide subunits: adenine, cytosine, guanine, and thymine (Figure 2.21). Each nucleotide subunit consists of three components: a phosphate group, a deoxyribose sugar ring, and a nucleobase. Nucleotides in DNA or RNA are therefore often called 'bases' and are commonly referred to simply by the name of their purine or pyrimidine base component: adenine, cytosine, guanine, or thymine.
Figure 2.21: The chemical structure of a four-base fragment of a DNA double helix.
Adenine and guanine are purines, and cytosine and thymine are pyrimidines. The most common form of DNA in a cell is a double helix, in which two individual DNA strands twist around each other in a right-handed spiral. In this structure, the base pairing rules specify that guanine pairs with cytosine and adenine pairs with thymine (each pair contains one purine and one pyrimidine). The base pairing between guanine and cytosine forms three hydrogen bonds, while the base pairing between adenine and thymine forms two hydrogen bonds. The two strands in a double helix must therefore be complementary, that is, their bases must align such that the adenines of one strand are paired with the thymines of the other strand, and so on. Due to the chemical composition of the pentose residues of the bases, DNA strands have directionality. One end of a DNA polymer contains an exposed hydroxyl group on the deoxyribose; this is known as the 3' end of the molecule. The other end contains an exposed phosphate group; this is the 5' end. The directionality of DNA is vitally important to many cellular processes, since double helices are necessarily directional (a strand running 5'-3' pairs with a complementary strand running 3'-5') and processes such as DNA replication occur in only one direction. All nucleic acid synthesis in a cell occurs in the 5'-3' direction, because new monomers are added via a dehydration reaction that uses the exposed 3' hydroxyl as a nucleophile. The expression of genes encoded in DNA begins by transcribing the gene into RNA, a second type of nucleic acid that is very similar to DNA, but whose monomers contain the sugar ribose rather than deoxyribose. RNA also contains the base uracil in place of thymine. RNA molecules are less stable than DNA and are typically single-stranded. Genes that encode proteins are composed of a series of three-nucleotide sequences called codons, which serve as the "words" in the genetic "language".
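The pairing rules, strand directionality, and codon structure just described can be illustrated with a few lines of code. This is a minimal sketch under simple assumptions: sequences are plain uppercase strings written 5' to 3', the input to `transcribe` is taken to be the coding (sense) strand, and all function names are hypothetical.

```python
# Watson-Crick pairing: one purine with one pyrimidine in every pair.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds per base pair: A-T has two, G-C has three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, also written 5'->3'.

    The partner runs antiparallel, so the sequence is reversed
    as well as complemented.
    """
    return "".join(DNA_PAIR[base] for base in reversed(strand))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(H_BONDS[base] for base in strand)


def transcribe(coding_strand: str) -> str:
    """RNA copy of the coding strand: uracil replaces thymine."""
    return coding_strand.replace("T", "U")


def codons(rna: str) -> list:
    """Split an RNA into consecutive three-nucleotide codons."""
    usable = len(rna) - len(rna) % 3  # ignore an incomplete trailing codon
    return [rna[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    dna = "ATGGCC"
    print(reverse_complement(dna))
    print(hydrogen_bonds(dna))
    print(codons(transcribe(dna)))
```

Reversing the strand inside `reverse_complement` is the design choice that encodes antiparallel pairing; complementing without reversing would give the partner strand read 3' to 5'.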
The genetic code specifies the correspondence during protein translation between codons and amino acids. The genetic code is nearly the same for all known organisms. RNA genes - In some cases, RNA is an intermediate product in the process of manufacturing proteins from genes. However, for other gene sequences, the RNA molecules are the actual functional products. For example, RNAs known as ribozymes are capable of enzymatic function, and miRNAs have a regulatory role. The DNA sequences from which such RNAs are transcribed are known as RNA genes. Some viruses store their entire genomes in the form of RNA, and contain no DNA at all. Because they use RNA to store genes, their cellular hosts may synthesize their proteins as soon as they are infected and without the delay in waiting for transcription. On the other hand, RNA retroviruses, such as HIV, require the reverse transcription of their genome from RNA into DNA before their proteins can be synthesized. In 2006, French researchers came across a puzzling example of RNA-mediated inheritance in mouse. Mice with a loss-of-function mutation in the gene Kit have white tails. Offspring of these mutants can have white tails despite having only normal Kit genes. The research team traced this effect back to mutated Kit RNA. While RNA is common as genetic storage material in viruses, in mammals in particular RNA inheritance has been observed very rarely. Functional structure of a gene - All genes have regulatory regions in addition to regions that explicitly code for a protein or RNA product. A regulatory region shared by almost all genes is known as the promoter (Figure 2.22), which provides a position that is recognized by the transcription machinery when a gene is about to be transcribed and expressed. 
Although promoter regions have a consensus sequence that is the most common sequence at this position, some genes have "strong" promoters that bind the transcription machinery well, and others have "weak" promoters that bind poorly. These weak promoters usually permit a lower rate of transcription than the strong promoters, because the transcription machinery binds to them and initiates transcription less frequently. Other possible regulatory regions include enhancers, which can compensate for a weak promoter. Most regulatory regions are "upstream", that is, before or toward the 5' end of the transcription initiation site. Eukaryotic promoter regions are much more complex and difficult to identify than prokaryotic promoters. Many prokaryotic genes are organized into operons, or groups of genes whose products have related functions and which are transcribed as a unit. By contrast, eukaryotic genes are transcribed only one at a time, but may include long stretches of DNA called introns which are transcribed but never translated into protein (they are spliced out before translation). Splicing can also occur in prokaryotic genes, but is less common than in eukaryotes.
Figure 2.22: Diagram of the "typical" eukaryotic protein-coding gene. Promoters and enhancers determine what portions of the DNA will be transcribed into the precursor mRNA (pre-mRNA). The pre-mRNA is then spliced into messenger RNA (mRNA), which is later translated into protein.
2.3.2 Cis-Trans Test: Complementation
This was studied in bacteriophage genetics, where mutation of a gene designated r, for "rapid lysis", was examined. It turned out that there are actually three different gene loci, rI, rII, and rIII, and a mutation in any one of them produces a rapid-lysis phenotype. In addition, many different mutations were found within each of these loci. Could wild-type virus be formed by recombination between mutations within the same gene? Seymour Benzer decided to find out.
In bacteriophage genetics, the recombination frequency between different genes is low (on the order of 10^-2). One would expect that recombination frequencies between mutations in a single gene would be far lower (10^-4 or less).
Figure 2.23: Strain B is infected by two different phage mutants (rx and ry) and the progeny are plated separately on a lawn of strain B (permissive) and a lawn of strain K (non-permissive). The strain B lawn supports all phage ("total"), whereas growth on the strain K lawn is limited ("restrictive").
Fortunately Benzer could exploit a phenomenon that enabled him to detect such rare events: rII mutants can infect a strain of E. coli designated K but cannot complete their life cycle in it. Wild-type T4 can complete its life cycle in both strain B and strain K. The procedure was to infect strain B (Figure 2.23) in liquid culture with the two mutants to be tested (designated here as rx and ry). After incubation, the progeny phage were plated on a lawn of:
o strain B, which supports the growth of all viruses, thus giving the total number of viruses liberated.
o strain K, on which only wild-type viruses can grow (Figure 2.24).
The recombination frequency between any pair of mutations is calculated as
Recombination Frequency = 2 × number of wild-type plaques (strain K plaques) ÷ total number of plaques (on strain B).
You have to double the number found on strain K because you only see one-half of the recombinants; the other half consists of double mutants. Using this technique, Benzer eventually found some 2000 different mutations in the rII gene. The recombination frequency between some pairs of these was as low as 0.02% (0.02 cM). The T4 genome has 160,000 base pairs of DNA extending over ~1,600 centimorgans (cM), so 1 cM = 100 base pairs, and 0.02 cM represents a pair of adjacent nucleotides. From these data, Benzer concluded that
o the smallest unit of mutation and
o the smallest unit of recombination
was a single base pair of DNA. In other words, these mutations represent a change in a single base pair; we call these point mutations.
Recombination between two molecules of DNA can occur at any pair of nucleotides. As we saw above, rapid lysis (r) mutants were found that mapped to three different regions of the T4 genome: rI, rII, and rIII. This meant that
o those in different regions were not alleles of the same gene;
o more than one gene product participated in the lysis function.
Even within one "locus", rII, there turned out to be two different stretches of DNA, both of which were needed intact for the lysis function. This was revealed by the complementation test that Benzer used. In this test, E. coli strain K (which rII mutants can infect but in which they cannot complete their life cycle), growing in liquid culture, was co-infected with two different rII mutants (here shown as "1" and "2"). Note that this procedure differs from the earlier one (recombination) in that the nonpermissive E. coli K is used for the initial infection (not strain B as before). Neither strain rII"1" nor strain rII"2" is able to grow in E. coli K. But if the lost function in rII"1" is NOT the same as the lost function in rII"2", then each should be able to produce the gene product missing in the other (complementation), and living phages will be produced. (Again, there is no need to count plaques; simply see if they are formed or not.)
Figure 2.24: Strain K (nonpermissive) infected by phage 1 and 2 and inoculated on a lawn of strain B (permissive).
Results of pairwise co-infections (+ = plaques formed, 0 = no plaques):
Mutant strains   1   2   3   4   5
1                0   0   +   0   +
2                    0   +   0   +
3                        0   +   0
4                            0   +
5                                0
From these results, you can deduce that these 5 rII mutants fall into two different complementation groups, which Benzer designated A (containing strains 1, 2, and 4) and B (containing strains 3 and 5). Later work showed that the function of rII depended on the polypeptide products encoded by two adjacent regions (A and B) of rII (perhaps acting as a heterodimer). In terms of function, then, both A and B qualify as independent genes.
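Both of Benzer's calculations can be sketched in code: the recombination frequency from plaque counts, and the sorting of mutants into complementation groups from pairwise co-infection results. This is an illustrative sketch, not Benzer's own procedure. The function names are hypothetical, the plaque counts passed to `recombination_frequency` are assumed to be already corrected to the same dilution, and the grouping assumes every mutant is defective in exactly one functional unit.

```python
def recombination_frequency(wild_type_on_k: float, total_on_b: float) -> float:
    """2 x (plaques on strain K) / (plaques on strain B).

    The factor of 2 accounts for the double-mutant recombinants,
    which cannot grow on strain K and are therefore not seen.
    """
    if total_on_b <= 0:
        raise ValueError("total plaque count must be positive")
    return 2 * wild_type_on_k / total_on_b


# Map scale quoted in the text: 160,000 bp spread over about 1,600 cM.
BP_PER_CM = 160_000 / 1_600

# Pairwise co-infection results for the five rII mutants in the text.
# True = plaques formed (complementation), False = no plaques.
COMPLEMENTS = {
    (1, 2): False, (1, 3): True, (1, 4): False, (1, 5): True,
    (2, 3): True, (2, 4): False, (2, 5): True,
    (3, 4): True, (3, 5): False,
    (4, 5): True,
}


def complementation_groups(mutants, complements):
    """Place mutants that fail to complement each other in the same group."""
    groups = []
    for mutant in mutants:
        for group in groups:
            # A mutant joins a group only if it fails to complement every member.
            if all(not complements[tuple(sorted((mutant, other)))] for other in group):
                group.append(mutant)
                break
        else:
            groups.append([mutant])
    return groups


if __name__ == "__main__":
    # Hypothetical counts: 1 wild-type plaque per 10,000 total phage.
    print(recombination_frequency(1, 10_000))
    print(BP_PER_CM)
    print(complementation_groups([1, 2, 3, 4, 5], COMPLEMENTS))
```

With the table from the text, the grouping recovers the two sets described there, strains 1, 2 and 4 in one group and strains 3 and 5 in the other.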
In co-infections by two mutant strains, if either A or B is mutated on the same DNA molecule ("cis"), there is no function, while if A is mutated in one DNA molecule and B in the other ("trans"), function is restored. Complementation, then, is the ability of two different mutations to restore wild-type function when they are in "trans" (on different DNA molecules) but not when they are in "cis" (on the same DNA molecule). Benzer coined the term cistron for these genetic units of function. But today, we simply modify earlier concepts of the "gene" to fit this operational definition.
2.3.3 The Structure Analysis of Eukaryotes Introns and their Significance
Introns, a term derived from "intragenic regions", are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence, composed of exons, is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well.
Introduction - Introns are common in eukaryotic pre-mRNA, but in prokaryotes they are only found in tRNA and rRNA. Introns, which are non-coding sections of a gene that are removed, are the opposite of exons, which remain in the mRNA sequence after processing. The number and length of introns vary widely among species, and among genes within the same species. Genes of higher organisms, such as mammals and flowering plants, have numerous introns, which can be much longer than the nearby exons. Some less advanced organisms, such as the fungus Saccharomyces cerevisiae and protists, have very few introns. In humans, the gene with the greatest number of introns is the gene for the protein titin, with 362 introns. Figure 2.23 shows a simple illustration of a pre-mRNA with introns (top); after the introns have been removed via splicing, the mature mRNA sequence is ready for translation (bottom).
Introns sometimes allow for alternative splicing of a gene, so that several different proteins which share some sequences in common can be translated from a single gene. The control of mRNA splicing is performed by a wide variety of signaling molecules. Introns may also contain "old code", or sections of a gene that were once translated into a protein, but have since been discarded. It was generally assumed that the sequence of any given intron is junk DNA with no function. More recently, however, this is being disputed. Introns contain several short sequences that are important for efficient splicing. The exact mechanism for these intronic splicing enhancers is not well understood, but it is thought that they serve as binding sites on the transcript for proteins which stabilize the spliceosome. It is also possible that RNA secondary structure formed by intronic sequences may have an effect on splicing. Discovery - The discovery of introns led to the Nobel Prize in Physiology or Medicine in 1993 for Phillip Allen Sharp and Richard J. Roberts. The term intron was introduced by American biochemist Walter Gilbert: "The notion of the cistron [...] must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger - which I suggest we call introns (for intragenic regions) - alternating with regions which will be expressed - exons." (Gilbert 1978) Classification of Introns - Some introns, such as Group I and Group II introns, are actually ribozymes that are capable of catalyzing their own splicing out of a primary RNA transcript. This self splicing activity was discovered by Thomas Cech, who shared the 1989 Nobel Prize in Chemistry with Sidney Altman for the discovery of the catalytic properties of RNA. 
Four classes of introns are known to exist:
o Group I introns
o Group II introns
o Group III introns
o Nuclear introns
Sometimes group III introns are also identified as group II introns, because of their similarity in structure and function. Nuclear or spliceosomal introns are spliced by the spliceosome and a series of snRNAs (small nuclear RNAs). There are certain splice signals (or consensus sequences) which abet the splicing (or identification) of these introns by the spliceosome. Group I, II and III introns are self-splicing introns and are relatively rare compared to spliceosomal introns. Group II and III introns are similar and have a conserved secondary structure. The lariat pathway is used in their splicing. They perform functions similar to the spliceosome and may be evolutionarily related to it. Group I introns are the only class of introns whose splicing requires a free guanine nucleoside. They possess a secondary structure different from that of group II and III introns. Many self-splicing introns code for maturases that help with the splicing process, generally only the splicing of the intron that encodes them.
Intron evolution - There are two competing theories that offer alternative scenarios for the origin and early evolution of spliceosomal introns (other classes of introns such as self-splicing and tRNA introns are not subject to much debate, but see for the former). These are popularly called the Introns-Early (IE) and the Introns-Late (IL) views. The IE model, championed by Walter Gilbert, proposes that introns are extremely old and were numerous in the earliest ancestors of prokaryotes and eukaryotes (the progenote). In this model introns were subsequently lost from prokaryotic organisms, allowing them to attain growth efficiency. A central prediction of this theory is that the early introns were mediators that facilitated the recombination of exons that represented the protein domains.
Such a model would directly lead to the evolution of new genes. Unfortunately, the model cannot account for the variations in the positions of shared introns between different species. The IL model proposes that introns were more recently inserted into original intron-less contiguous genes after the divergence of eukaryotes and prokaryotes. In this model, introns probably had their origin in parasitic transposable elements. This model is based on the observation that the spliceosomal introns are restricted to eukaryotes alone. However, there is considerable debate on the presence of introns in the early prokaryote-eukaryote ancestors and the subsequent intron loss and gain during eukaryotic evolution. It is also suggested that the evolution of introns, and more generally of the intron-exon structure, is largely independent of the coding-sequence evolution.
Identification - Nearly all eukaryotic nuclear introns begin with the nucleotide sequence GU and end with AG (the GU-AG rule). These, along with a larger consensus sequence, help direct the splicing machinery to the proper intronic donor and acceptor sites. This mainly occurs in eukaryotic primary mRNA transcripts.
2.3.4 RNA Splicing
The other major type of modification that takes place in eukaryotic pre-mRNA is the removal of introns by RNA splicing. This occurs in the nucleus following transcription but before the RNA moves to the cytoplasm.
Figure 2.25: Splicing of pre-mRNA requires consensus sequences. In the consensus sequence surrounding the branch point (YNYYRAY), Y is any pyrimidine, R is any purine, A is adenine, and N is any base. Conclusion: Critical consensus sequences are present at the 5' splice site, the branch point, and the 3' splice site.
Consensus sequences and the spliceosome - Splicing requires the presence of three sequences in the intron.
One end of the intron is referred to as the 5' splice site, and the other end is the 3' splice site (Figure 2.25); these splice sites possess short consensus sequences. Most introns in pre-mRNA begin with GU and end with AG, suggesting that these sequences play a crucial role in splicing. Changing a single nucleotide at either of these sites does indeed prevent splicing. A few introns in pre-mRNA begin with AU and end with AC. These introns are spliced by a process that is similar to that seen in GU...AG introns but utilizes a different set of splicing factors. This discussion will focus on splicing of the more common GU...AG introns. The third sequence important for splicing is at the so-called branch point, which is an adenine nucleotide that lies from 18 to 40 nucleotides upstream of the 3' splice site (Figure 2.25). The sequence surrounding the branch point does not have a strong consensus but usually takes the form YNYYRAY (Y is any pyrimidine, N is any base, R is any purine, and A is adenine). The deletion or mutation of the adenine nucleotide at the branch point prevents splicing.
Table 2.1: RNA–RNA interactions in pre-mRNA splicing
Interaction | Function
U1 with 5' splice site | U1 attaches to 5' end of intron; commits intron to splicing; no direct role in splicing
U2 with branch point | Positions 5' end of intron near branch point for lariat formation
U2 with U6 | Holds 5' end of intron near branch point
U6 with 5' splice site | Positions 5' end of intron near branch point
U5 with 3' end of first exon | Anchors first exon to spliceosome subsequent to cleavage; juxtaposes two ends of exon for splicing
U5 with 3' end of one exon and 5' end of the other | Juxtaposes two ends of exon for splicing
U4 with U6 | Delivers U6 to intron; no direct role in splicing
Splicing takes place within a large complex called the spliceosome, which consists of several RNA molecules and many proteins.
The RNA components are small nuclear RNAs (snRNAs); these snRNAs associate with proteins to form small nuclear ribonucleoprotein particles (snRNPs). Each snRNP contains a single snRNA molecule and multiple proteins. The spliceosome is composed of five snRNPs, named for the snRNAs that they contain (U1, U2, U4, U5, and U6), and some proteins not associated with an snRNA.
The process of splicing - To illustrate the process of RNA splicing, we’ll first consider the chemical reactions that take place. Then we’ll see how these splicing reactions constitute a set of coordinated processes within the context of the spliceosome. Before splicing takes place, an upstream exon (exon 1) and a downstream exon (exon 2) are separated by an intron (Figure 2.26). Pre-mRNA is spliced in two distinct steps. In the first step, the pre-mRNA is cut at the 5' splice site. This cut frees exon 1 from the intron, and the 5' end of the intron attaches to the branch point; that is, the intron folds back on itself, forming a structure called a lariat. The guanine nucleotide in the consensus sequence at the 5' splice site bonds with the adenine nucleotide at the branch point. This bonding is accomplished through transesterification, a chemical reaction in which the OH group on the 2'-carbon atom of the adenine nucleotide at the branch point attacks the 5' phosphodiester bond of the guanine nucleotide at the 5' splice site, cleaving it and forming a new 5'–2' phosphodiester bond between the guanine and adenine nucleotides. In the second step of RNA splicing, a cut is made at the 3' splice site and, simultaneously, the 3' end of exon 1 becomes covalently attached (spliced) to the 5' end of exon 2.
Figure 2.26: The splicing of nuclear introns requires a two-step process. First, cleavage takes place at the 5' splice site, and a lariat is formed by the attachment of the 5' end of the intron to the branch point. Second, cleavage takes place at the 3' splice site, and the two exons are spliced together.
This bond also forms through a transesterification reaction, in which the 3'-OH group attached to the end of exon 1 attacks the phosphodiester bond at the 3' splice site, cleaving it and forming a new phosphodiester bond between the 3' end of exon 1 and the 5' end of exon 2; the intron is released as a lariat. The intron becomes linear when the bond breaks at the branch point and is then rapidly degraded by nuclear enzymes. The mature mRNA, consisting of the exons spliced together, is exported to the cytoplasm, where it is translated.

Figure 2.27: RNA splicing takes place within the spliceosome.
Figure 2.28: Intron removal, processing, and transcription take place at the same site. RNA tracks can be seen in the nucleus of a eukaryotic cell. Fluorescent tags were attached to DNA (red) and RNA (green). Transcribed RNA does not disperse; rather, it accumulates near the site of synthesis and follows a defined track during processing.

Although splicing is illustrated in Figure 2.26 as a two-step process, the reactions are in fact coordinated within the spliceosome. A key feature of the spliceosome is a series of interactions between the mRNA and snRNAs and between different snRNAs (summarized in Table 2.1). These interactions depend on complementary base pairing between the different RNA molecules and bring the essential components of the pre-mRNA transcript and the spliceosome close together, which makes splicing possible. The spliceosome is assembled on the pre-mRNA transcript in a step-by-step fashion (Figure 2.27). First, snRNP U1 attaches to the 5' splice site, and then U2 attaches to the branch point. A complex consisting of U5 and U4–U6 (which form a single snRNP) joins the spliceosome. At this point, the intron loops over and the 5' splice site is brought close to the branch point. U1 and U4 dissociate from the spliceosome. The 5' splice site, 3' splice site, and branch point are in close proximity, held together by the spliceosome.
The two transesterification reactions take place, joining the two exons together and releasing the intron as a lariat.

Figure 2.28: Group I introns undergo self-splicing. (a) Secondary structure of a group I intron. (b) Self-splicing of a group I intron.
Figure 2.29: Group II introns undergo self-splicing by a different mechanism from that for group I introns. (a) Secondary structure of a group II intron. (b) Self-splicing of group II introns, which is similar to the splicing of nuclear introns.
Figure 2.30: Eukaryotic cells have alternative pathways for processing pre-mRNA. (a) With alternative splicing, pre-mRNA can be spliced in different ways to produce different mRNAs. (b) With multiple 3' cleavage sites, there are two or more potential sites for cleavage and polyadenylation; use of the different sites produces mRNAs of different lengths. Conclusion: Both alternative splicing and multiple 3' cleavage sites produce different mRNAs from a single pre-mRNA.

Nuclear Organization - RNA splicing takes place in the nucleus and must occur before the RNA can move into the cytoplasm. For many years, the nucleus was viewed as a biochemical soup, in which components such as the spliceosome diffused and reacted randomly. Now, the nucleus is believed to have a highly ordered internal structure, with transcription and RNA processing taking place at particular locations within it. By attaching fluorescent tags to pre-mRNA and using special imaging techniques, researchers have been able to observe the location of pre-mRNA as it is transcribed and processed. The results of these studies revealed that intron removal and other processing reactions take place at the same sites as those of transcription (Figure 2.28), suggesting that these processes may be physically coupled. This suggestion is supported by the observation that part of RNA polymerase II is also required for the splicing and 3' processing of pre-mRNA.
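The sequence signals for nuclear pre-mRNA splicing described earlier in this section (GU at the 5' end of the intron, AG at the 3' end, and a branch-point adenine within a YNYYRAY-like sequence 18 to 40 nucleotides upstream of the 3' splice site) can be sketched as a small Python check. This is only an illustrative model, not a splice-site predictor: the function names, the example sequence, and the way the distance to the 3' splice site is counted are assumptions made for the sketch.

```python
import re

# Branch-point consensus YNYYRAY (Y = C or U, R = A or G, N = any base).
# The lookahead lets overlapping candidate sites be found.
BRANCH_SITE = re.compile(r"(?=([CU][ACGU][CU][CU][AG]A[CU]))")


def find_branch_point(intron):
    """Return the 0-based index of the branch-point A in the intron, or None.

    Assumption: the distance is counted from the A to the end of the intron
    (the 3' splice site) and must lie between 18 and 40 nucleotides.
    """
    for match in BRANCH_SITE.finditer(intron):
        a_index = match.start() + 5           # the A is the 6th base of YNYYRAY
        distance = len(intron) - a_index
        if 18 <= distance <= 40:
            return a_index
    return None


def splice(pre_mrna, intron_start, intron_end):
    """Remove the intron pre_mrna[intron_start:intron_end] and join the exons.

    Raises ValueError if the intron lacks the GU...AG ends or a usable
    branch point, mirroring the statement that such mutations prevent splicing.
    """
    intron = pre_mrna[intron_start:intron_end]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron must begin with GU and end with AG")
    branch = find_branch_point(intron)
    if branch is None:
        raise ValueError("no branch-point A 18-40 nt upstream of the 3' splice site")
    exon1 = pre_mrna[:intron_start]   # step 1: cut at 5' splice site, lariat forms
    exon2 = pre_mrna[intron_end:]     # step 2: cut at 3' splice site, exons joined
    return exon1 + exon2, branch


# Hypothetical 45-nt intron with one branch site (UACUAAC) and made-up exons.
INTRON = "GUAAGU" + "C" * 10 + "UACUAAC" + "U" * 20 + "AG"
PRE_MRNA = "AUGGCC" + INTRON + "GAUUAA"

mature, branch_index = splice(PRE_MRNA, 6, 6 + len(INTRON))
print(mature, branch_index)
```

Replacing the branch-point A in the example intron with another base makes `find_branch_point` return `None`, so `splice` refuses to join the exons, which is the behaviour the text describes for branch-point mutations.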
Self-Splicing Introns - Some introns are self-splicing, meaning that they possess the ability to remove themselves from an RNA molecule. These self-splicing introns fall into two major categories. Group I introns are found in a variety of genes, including some rRNA genes in protists, some mitochondrial genes in fungi, and even some bacteriophage genes. Although the lengths of group I introns vary, all of them fold into a common secondary structure with nine looped stems (Figure 2.28), which are necessary for splicing. Transesterification reactions are required for the splicing of group I introns (Figure 2.29). Alternative Processing Pathways - Another finding that complicates the view of a gene as a sequence of nucleotides that specifies the amino acid sequence of a protein is the existence of alternative processing pathways, in which a single pre-mRNA is processed in different ways to produce alternative types of mRNA, resulting in the production of different proteins from the same DNA sequence. One type of alternative processing is alternative splicing, in which the same pre-mRNA can be spliced in more than one way to yield multiple mRNAs that are translated into proteins with different amino acid sequences (Figure 2.30a). Another type of alternative processing requires the use of multiple 3' cleavage sites (Figure 2.30b); two or more potential sites for cleavage and polyadenylation are present in the pre-mRNA. In our example, cleavage at the first site produces a relatively short mRNA, compared with the mRNAs produced through cleavage at other sites. Both alternative splicing and multiple 3' cleavage sites can exist in the same pre-mRNA transcript; an example is seen in the mammalian calcitonin gene, which contains six exons and five introns (Figure 2.31a). The entire gene is transcribed into pre-mRNA (Figure 2.31b). There are two possible 3' cleavage sites. 
In cells of the thyroid gland, 3' cleavage and polyadenylation take place after the fourth exon, and the first three introns are then removed to produce a mature mRNA consisting of exons 1, 2, 3, and 4 (Figure 2.31c). This mRNA is translated into the hormone calcitonin. In brain cells, the identical pre-mRNA is transcribed from DNA, but it is processed differently. Cleavage and polyadenylation take place after the sixth exon, yielding an initial transcript that includes all six exons. During splicing, exon 4 (part of the calcitonin mRNA) is removed, along with all the introns, so only exons 1, 2, 3, 5, and 6 are present in the mature mRNA (Figure 2.31d). When translated, this mRNA produces a protein called calcitonin-gene-related peptide (CGRP), which has an amino acid sequence quite different from that of calcitonin.

Figure 2.31: Pre-mRNA encoded by the calcitonin gene undergoes alternative processing.

Alternative splicing may produce different combinations of exons in the mRNA, but the order of the exons is not usually changed. Different processing pathways contribute to gene regulation.

2.3.5 Regulation of Gene Expression in Prokaryotes and Eukaryotes
While the period from 1900 to the Second World War has been called the "golden age of genetics", we may be in a new golden (or platinum) age. Recombinant DNA technology allows us to manipulate the very DNA of living organisms and to make conscious changes in that DNA. Prokaryotic genetic systems are much easier to study and better understood than eukaryotic systems.

Figure 2.32: Partial gene map of the operons, such as trp and lac, on a bacterial chromosome. Image from Purves et al., Life: The Science of Biology, 4th Edition, by Sinauer Associates (http://www.sinauer.com/) and WH Freeman (http://www.whfreeman.com/), used with permission.

Gene Regulation in Prokaryotes
(1) In Bacteria - The single chromosome of the common intestinal bacterium E. coli is circular and contains some 4.7 million base pairs.
It is more than 1 mm long (about 1.6 mm at 0.34 nm per base pair), but only 2 nm wide (Figure 2.32). The chromosome replicates bidirectionally, producing a figure resembling the Greek letter theta.

The promoter is the part of the DNA to which RNA polymerase binds before opening the segment of the DNA to be transcribed. A segment of the DNA that codes for a specific polypeptide is known as a structural gene. Structural genes often occur together on a bacterial chromosome. Clustering the genes for related polypeptides, which may, for example, be enzymes of one biochemical pathway, allows quick, efficient transcription of their mRNAs. Leader and trailer sequences, which are not translated, often occur at the beginning and end of the region. E. coli can synthesize 1700 enzymes. Therefore, this small bacterium has the genes for 1700 different mRNAs.

Lactose, milk sugar, is split by the enzyme β-galactosidase. This enzyme is inducible, since it occurs in large quantities only when lactose, the substrate on which it operates, is present. Conversely, the enzymes that synthesize the amino acid tryptophan are produced continuously in growing cells unless tryptophan is present. If tryptophan is present, the production of tryptophan-synthesizing enzymes is repressed.

The Operon Model - The operon model (Figure 2.33) of prokaryotic gene regulation was proposed by François Jacob and Jacques Monod. Groups of genes coding for related proteins are arranged in units known as operons. An operon consists of an operator, promoter, regulator, and structural genes. The regulator gene codes for a repressor protein that binds to the operator, obstructing the promoter and thus blocking transcription of the structural genes. The regulator does not have to be adjacent to other genes in the operon. If the repressor protein is removed, transcription may occur.

Figure 2.33 : The lactose operon

Operons are either inducible or repressible according to the control mechanism.
Seventy-five different operons controlling 250 structural genes have been identified for E. coli. Both repression and induction are examples of negative control since the repressor proteins turn off transcription.

Bacteria do not make all the proteins that they are capable of making all of the time. Rather, they can adapt to their environment and make only those gene products that are essential for them to survive in a particular environment. For example, bacteria do not synthesize the enzymes needed to make tryptophan when there is an abundant supply of tryptophan in the environment. However, when tryptophan is absent from the environment the enzymes are made. Similarly, just because a bacterium has a gene for resistance to an antibiotic does not mean that that gene will be expressed. The resistance gene may only be expressed when the antibiotic is present in the environment.

Figure 2.34 : Transcription of lac genes in the presence and absence of glucose

Bacteria usually control gene expression by regulating the level of mRNA transcription. In bacteria, genes with related function are generally located adjacent to each other and they are regulated coordinately (i.e. when one is expressed, they all are expressed). Coordinate regulation of clustered genes is accomplished by regulating the production of a polycistronic mRNA (i.e. a large mRNA containing the information for several genes). Thus, bacteria are able to "sense" their environment and express the appropriate set of genes needed for that environment by regulating transcription of those genes.

(A). INDUCIBLE GENES - THE OPERON MODEL
1. Definition
An inducible gene is a gene that is expressed in the presence of a substance (an inducer) in the environment. This substance can control the expression of one or more genes (structural genes) involved in the metabolism of that substance. For example, lactose induces the expression of the lac genes that are involved in lactose metabolism.
A certain antibiotic may induce the expression of a gene that leads to resistance to that antibiotic. Induction is common in metabolic pathways that result in the catabolism of a substance, and the inducer is normally the substrate for the pathway.

2. Lactose Operon
a. Structural genes - The lactose operon (Figure 2.33) contains three structural genes that code for enzymes involved in lactose metabolism.
The lac z gene codes for β-galactosidase, an enzyme that breaks down lactose into glucose and galactose.
The lac y gene codes for a permease, which is involved in uptake of lactose.
The lac a gene codes for a galactoside transacetylase.
These genes are transcribed from a common promoter into a polycistronic mRNA, which is translated to yield the three enzymes.
b. Regulatory gene - The expression of the structural genes is not only influenced by the presence or absence of the inducer, it is also controlled by a specific regulatory gene. The regulatory gene may be next to or far from the genes that are being regulated. The regulatory gene codes for a specific protein product called a REPRESSOR.
c. Operator - The repressor acts by binding to a specific region of the DNA called the operator, which is adjacent to the structural genes being regulated. The structural genes together with the operator region and the promoter are called an OPERON. However, the binding of the repressor to the operator is prevented by the inducer, and the inducer can also remove repressor that has already bound to the operator. Thus, in the presence of the inducer the repressor is inactive and does not bind to the operator, resulting in transcription of the structural genes.

Figure 2.35 : Effect of glucose on expression of proteins encoded by the lac operon

In contrast, in the absence of inducer the repressor is active and binds to the operator, resulting in inhibition of transcription of the structural genes.
This kind of control is referred to as NEGATIVE CONTROL since the function of the regulatory gene product (repressor) is to turn off transcription of the structural genes.
d. Inducer - Transcription of the lac genes is influenced by the presence or absence of an inducer (lactose or other β-galactosides) (Figure 2.34). That is: + inducer = expression; - inducer = no expression.

3. Catabolite repression (Glucose Effect)
Many inducible operons are not only controlled by their respective inducers and regulatory genes, but they are also controlled by the level of glucose in the environment. The ability of glucose to control the expression of a number of different inducible operons is called CATABOLITE REPRESSION. Catabolite repression is generally seen in those operons which are involved in the degradation of compounds used as a source of energy. Since glucose is the preferred energy source in bacteria, the ability of glucose to regulate the expression of other operons ensures that bacteria will utilize glucose before any other carbon source as a source of energy.

Mechanism - There is an inverse relationship between glucose levels and cyclic AMP (cAMP) levels in bacteria. When glucose levels are high, cAMP levels are low, and when glucose levels are low, cAMP levels are high. This relationship exists because the transport of glucose into the cell inhibits the enzyme adenylate cyclase, which produces cAMP. In the bacterial cell, cAMP binds to a cAMP-binding protein called CAP or CRP. The cAMP-CAP complex, but not free CAP protein, binds to a site in the promoters of catabolite repression-sensitive operons. The binding of the complex results in a more efficient promoter and thus more initiations of transcription from that promoter, as illustrated in Figures 2.35 and 2.36. Since the role of the CAP-cAMP complex is to turn on transcription, this type of control is said to be POSITIVE CONTROL.
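The negative control by the lac repressor and the positive control by the CAP-cAMP complex described above combine like a small truth table. The sketch below encodes that logic in Python; the function name and the three output labels ("none", "low", "maximal") are assumptions chosen for illustration, with "low" standing for expression that is switched on but not enhanced by CAP-cAMP.

```python
def lac_operon_expression(lactose_present, glucose_present):
    """Qualitative expression level of the lac structural genes.

    Negative control: without inducer the repressor sits on the operator.
    Positive control: CAP-cAMP boosts the promoter only when glucose is low.
    """
    repressor_on_operator = not lactose_present   # inducer inactivates the repressor
    camp_high = not glucose_present               # glucose and cAMP vary inversely
    cap_camp_on_promoter = camp_high              # only the CAP-cAMP complex binds

    if repressor_on_operator:
        return "none"
    return "maximal" if cap_camp_on_promoter else "low"


# Print the four combinations of lactose and glucose.
for lactose in (False, True):
    for glucose in (False, True):
        level = lac_operon_expression(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {level}")
```

Only the combination of inducer present and glucose absent returns "maximal", which is the case in which both control systems favour transcription.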
The consequence of this type of control is that, to achieve maximal expression of a catabolite repression-sensitive operon, glucose must be absent from the environment and the inducer of the operon must be present. If both are present, the operon will not be maximally expressed until the glucose is metabolized. Obviously, no expression of the operon will occur unless the inducer is present.

(B). REPRESSIBLE GENES - THE OPERON MODEL
1. Definition
Repressible genes are those in which the presence of a substance (a co-repressor) in the environment turns off the expression of those genes (structural genes) involved in the metabolism of that substance. For example, tryptophan represses the expression of the trp genes. Repression is common in metabolic pathways that result in the biosynthesis of a substance, and the co-repressor is normally the end product of the pathway being regulated.

2. Tryptophan operon
a. Structural genes - The tryptophan operon (Figure 2.37) contains five structural genes that code for enzymes involved in the synthesis of tryptophan. These genes are transcribed from a common promoter into a polycistronic mRNA, which is translated to yield the five enzymes.

Figure 2.36 : Effect of glucose on expression of proteins encoded by the lac operon

b. Regulatory gene - The expression of the structural genes is not only influenced by the presence or absence of the co-repressor, it is also controlled by a specific regulatory gene. The regulatory gene may be next to or far from the genes that are being regulated. The regulatory gene codes for a specific protein product called a REPRESSOR (sometimes called an apo-repressor). When the repressor is synthesized it is inactive. However, it can be activated by complexing with the co-repressor (i.e. tryptophan).
c. Operator - The active repressor/co-repressor complex acts by binding to a specific region of the DNA called the operator, which is adjacent to the structural genes being regulated.
The structural genes together with the operator region and the promoter are called an OPERON. Thus, in the presence of the co-repressor the repressor is active and binds to the operator, resulting in repression of transcription of the structural genes. In contrast, in the absence of co-repressor the repressor is inactive and does not bind to the operator, resulting in transcription of the structural genes. This kind of control is referred to as NEGATIVE CONTROL since the function of the regulatory gene product (repressor) is to turn off transcription of the structural genes.

Figure 2.37 : The tryptophan operon

d. Co-repressor - Transcription of the tryptophan genes is influenced by the presence or absence of a co-repressor (tryptophan) (Figure 2.38). That is: + co-repressor = no expression; - co-repressor = expression.

Figure 2.38 : The effect of tryptophan on expression from the trp operon

3. Attenuation
In many repressible operons, transcription that initiates at the promoter can terminate prematurely in a leader region that precedes the first structural gene (i.e. the polymerase terminates transcription before it gets to the first gene in the operon). This premature termination of transcription is called ATTENUATION. Although attenuation is seen in a number of operons, the mechanism is best understood in those repressible operons involved in amino acid biosynthesis. In these instances attenuation is regulated by the availability of the cognate aminoacylated tRNA.

Figure 2.39 : Mechanism of attenuation

Mechanism (See Figure 2.39) - When transcription is initiated at the promoter, it actually starts before the first structural gene and a leader transcript is made. This leader region contains a start and a stop signal for protein synthesis. Since bacteria do not have a nuclear membrane, transcription and translation can occur simultaneously. Thus, a short peptide can be made while the RNA polymerase is transcribing the leader region.
This short test peptide contains two adjacent tryptophan residues in the middle of the peptide. Thus, if there is a sufficient amount of tryptophanyl-tRNA to translate the test peptide, the entire peptide will be made and the ribosome will reach the stop signal. If, on the other hand, there is not enough tryptophanyl-tRNA to translate the peptide, the ribosome will be arrested at the two tryptophan codons before it gets to the stop signal.

The sequence in the leader mRNA contains four regions, which have complementary sequences (Figure 2.40). Thus, several different secondary stem and loop structures can be formed. Region 1 can only form base pairs with region 2; region 2 can form base pairs with either region 1 or 3; region 3 can form base pairs with region 2 or 4; and region 4 can only form base pairs with region 3. Thus three possible stem/loop structures can be formed in the RNA:
region 1:region 2
region 2:region 3
region 3:region 4
One of the possible structures (region 3 base pairing with region 4) generates a signal for RNA polymerase to terminate transcription (i.e. to attenuate transcription). However, the formation of one stem and loop structure can preclude the formation of others. If region 2 forms base pairs with region 1, it is not available to base pair with region 3. Similarly, if region 3 forms base pairs with region 2, it is not available to base pair with region 4.

The ability of the ribosomes to translate the test peptide will affect the formation of the various stem and loop structures (Figure 2.41). If the ribosome reaches the stop signal for translation, it will be covering up region 2, and thus region 2 will not be available for forming base pairs with other regions. This allows the generation of the transcription termination signal because region 3 will be available to pair with region 4.
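The pairing rules for the four leader regions can be written down as a small model. The Python sketch below is a simplification under stated assumptions: stems are assumed to form in 5' to 3' order as the RNA emerges, a region covered by the ribosome cannot pair, and the function names are invented for illustration.

```python
# Stem-loops that the leader RNA can form, listed in 5' to 3' order.
ALLOWED_PAIRS = [(1, 2), (2, 3), (3, 4)]
TERMINATOR = (3, 4)   # region 3 paired with region 4 signals termination


def fold_leader(covered_by_ribosome):
    """Return the stem-loops that form, given the regions the ribosome covers.

    Assumption: each region pairs at most once, and earlier (5') pairings
    take precedence over later ones.
    """
    free = {1, 2, 3, 4} - set(covered_by_ribosome)
    stems = []
    for a, b in ALLOWED_PAIRS:
        if a in free and b in free:
            stems.append((a, b))
            free -= {a, b}
    return stems


def trp_attenuation(trp_trna_sufficient):
    """Return (stems, attenuated) for the two cases described in the text."""
    # Enough Trp-tRNA: ribosome reaches the stop signal and covers region 2.
    # Too little: ribosome stalls at the Trp codons in region 1.
    covered = {2} if trp_trna_sufficient else {1}
    stems = fold_leader(covered)
    return stems, TERMINATOR in stems


print(trp_attenuation(True))    # ([(3, 4)], True)
print(trp_attenuation(False))   # ([(2, 3)], False)
```

In this model a stalled ribosome leaves region 2 free to capture region 3, so the 3:4 terminator cannot form; a ribosome that reaches the stop signal removes region 2 from play and the terminator forms.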
Thus, when there is enough tryptophanyl-tRNA to translate the test peptide, attenuation will occur and the structural genes will not be transcribed. In contrast, when there is an insufficient amount of tryptophanyl-tRNA to translate the test peptide, no attenuation will occur. This is because the ribosome will stop at the two tryptophan codons in region 1, thereby allowing region 2 to base pair with region 3 and preventing the formation of the attenuation signal (i.e. region 3 base paired with region 4). Thus, the structural genes will be transcribed.

Figure 2.40 : Formation of stem-loops
Figure 2.41 : Mechanism of attenuation

(2) In Viruses
Viruses consist of a nucleic acid (DNA or RNA) enclosed in a protein coat (known as a capsid). The capsid may be a single protein repeated over and over, as in tobacco mosaic virus (TMV). It may also be several different proteins, as in the T-even bacteriophages. Once inside the cell, the nucleic acid follows one of two paths: lytic or lysogenic. Retroviruses, such as Human Immunodeficiency Virus (HIV), also include the enzyme reverse transcriptase with the viral RNA. Reverse transcriptase makes a single-stranded viral DNA copy of the single-stranded viral RNA. The single-stranded viral DNA is subsequently turned into a double-stranded DNA.

The lytic cycle occurs when the viral DNA immediately takes over the host cell (remember that viruses are obligate intracellular parasites) and begins making new viruses. Eventually the new viruses cause the rupture (or lysis) of the cell, releasing those new viruses to continue the infection cycle.

The lysogenic cycle occurs when the viral DNA is incorporated into the host DNA as a prophage. When the cell replicates, the prophage is passed along as if it were host DNA. Occasionally, about once every 10,000 cell divisions, the prophage emerges spontaneously from the host chromosome and enters the lytic cycle. Ultraviolet light and x-rays may also trigger emergence of the prophage.
Transduction is the transfer of host DNA from one cell to another by a virus (Figure 2.42). Some bacteriophages are temperate since they tend to go lysogenic rather than lytic. These types of viruses are able to transduce fragments of the host DNA.

Figure 2.42 : Induction of transduction by viruses in bacteria. Images from Purves et al., Life: The Science of Biology, 4th Edition, by Sinauer Associates (http://www.sinauer.com/) and WH Freeman (http://www.whfreeman.com/), used with permission.

Transposons are DNA fragments incorporated into the chromosomal DNA (Figure 2.43). Unlike episomes and prophages, transposons contain a gene producing an enzyme that catalyzes insertion of the transposon at a new site. They also have repeated sequences 20-40 nucleotides in length at each end. Insertion sequences are short (600-1500 base pairs long) simple transposons that do not carry genes beyond those essential for insertion of the transposon into E. coli. Complex transposons are much larger and carry additional genes. Genes incorporated in a complex transposon are known as jumping genes since they can move about on the chromosome (even from chromosome to chromosome). Often the complex transposons are flanked by simple transposons.

Gene Regulation in Eukaryotes
In the absence of precise information about the mechanisms that regulate gene expression in eukaryotes, many models were proposed. One of the more popular early models, known as the Britten-Davidson model or Gene Battery model, was given by R.J. Britten and E.H. Davidson in 1969. This model, even though widely accepted, is only a theoretical model and lacks sound practical proof. The model predicts the presence of four types of sequences.
Producer gene - It is comparable to a structural gene in prokaryotes. It produces pre-mRNA, which after processing becomes mRNA. Its expression is under the control of many receptor sites.
Receptor site (gene) - It is comparable to the operator in a bacterial operon.
At least one such receptor site is assumed to be present adjacent to each producer gene. A specific receptor site is activated when a specific activator RNA or an activator protein, a product of an integrator gene, complexes with it.
Integrator gene - The integrator gene is comparable to a regulator gene and is responsible for the synthesis of an activator RNA molecule, which need not be translated into protein before it activates the receptor site. At least one integrator gene is present adjacent to each sensor site.
Sensor site - A sensor site regulates the activity of an integrator gene, which can be transcribed only when the sensor site is activated. The sensor sites are also regulatory sequences that respond to external stimuli, e.g. hormones or temperature.

According to the Britten-Davidson model, specific sensor genes represent sequence-specific binding sites (similar to the CAP-cAMP binding site in E. coli) that respond to a specific signal. When sensor genes receive the appropriate signals, they activate the transcription of the adjacent integrator genes. The integrator gene products will then interact in a sequence-specific manner with receptor genes. Britten and Davidson proposed that the integrator gene products are activator RNAs that interact directly with the receptor genes to trigger the transcription of the contiguous producer genes.

It is also proposed that receptor sites and integrator genes may be repeated a number of times so as to control the activity of a large number of genes in the same cell. Repetition of a receptor site ensures that the same activator recognizes all copies of it, and in this way several enzymes of one metabolic pathway are synthesized simultaneously. Transcription of the same gene may be needed in different developmental stages. This is achieved by the multiplicity of receptor sites and integrator genes. Each producer gene may have several receptor sites, each responding to one activator.
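The chain of control in the Britten-Davidson model (signal, sensor site, integrator gene, activator, receptor site, producer gene) can be pictured as a lookup through three small tables. The Python sketch below is purely illustrative: every sensor, integrator, activator, and producer name is hypothetical, and the wiring was chosen only to show a repeated receptor site and a producer gene with two receptor sites.

```python
# Hypothetical wiring of a Britten-Davidson style network.
SENSOR_TO_INTEGRATORS = {
    "hormone_sensor": ["integrator_1", "integrator_2"],
    "heat_sensor": ["integrator_3"],
}
INTEGRATOR_TO_ACTIVATOR = {
    "integrator_1": "activator_A",
    "integrator_2": "activator_B",
    "integrator_3": "activator_C",
}
# Each producer gene lists the activators its receptor sites respond to.
PRODUCER_RECEPTORS = {
    "producer_1": {"activator_A"},
    "producer_2": {"activator_A", "activator_C"},   # two receptor sites
    "producer_3": {"activator_B"},
    "producer_4": {"activator_C"},
}


def genes_activated_by(sensor):
    """Return the producer genes transcribed when one sensor site is activated."""
    activators = {
        INTEGRATOR_TO_ACTIVATOR[integrator]
        for integrator in SENSOR_TO_INTEGRATORS[sensor]
    }
    return sorted(
        gene
        for gene, receptors in PRODUCER_RECEPTORS.items()
        if receptors & activators        # any receptor site recognises an activator
    )


print(genes_activated_by("hormone_sensor"))
print(genes_activated_by("heat_sensor"))
```

In this toy network activator_A switches on two producer genes through a repeated receptor site, while producer_2 responds to two different sensors because it carries two receptor sites.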
Thus, though a single activator can recognize several genes, different activators may activate the same gene at different times. A set of structural genes controlled by one sensor site is termed a battery. Sometimes, when major changes are needed, it is necessary to activate several sets of genes. If one sensor site is associated with several integrators, it may cause transcription of all the integrators simultaneously, thus causing transcription of several producer genes through their receptor sites. The repetition of integrator genes and receptor sites is consistent with reports that a substantial amount of repeated DNA occurs in eukaryotic cells.

The most attractive feature of the Britten and Davidson model is that it provides a plausible reason for the observed pattern of interspersion of moderately repetitive DNA sequences and single-copy DNA sequences. Direct evidence indicates that most structural genes are indeed single-copy DNA sequences. The adjacent moderately repetitive DNA sequences would contain the various kinds of regulator genes (sensor, integrator and receptor genes).

The latest estimates are that a human cell, a eukaryotic cell, contains 20,000–25,000 genes.
Some of these are expressed in all cells all the time. These so-called housekeeping genes are responsible for the routine metabolic functions (e.g. respiration) common to all cells.
Some are expressed as a cell enters a particular pathway of differentiation.
Some are expressed all the time in only those cells that have differentiated in a particular way. For example, a plasma cell expresses continuously the genes for the antibody it synthesizes.
Some are expressed only as conditions around and in the cell change. For example, the arrival of a hormone may turn on (or off) certain genes in that cell.

Figure 2.43 : Transposons and their relationship to other genes.
Image from Purves et al., Life: The Science of Biology, 4th Edition, by Sinauer Associates (http://www.sinauer.com/) and WH Freeman (http://www.whfreeman.com/), used with permission.

How is gene expression regulated?
There are several methods used by eukaryotes.
Altering the rate of transcription of the gene. This is the most important and widely-used strategy and the one we shall examine here.
However, eukaryotes supplement transcriptional regulation with several other methods:
o Altering the rate at which RNA transcripts are processed while still within the nucleus.
o Altering the stability of mRNA molecules; that is, the rate at which they are degraded.
o Altering the efficiency at which the ribosomes translate the mRNA into a polypeptide.

Protein-coding genes have
exons whose sequence encodes the polypeptide;
introns that will be removed from the mRNA before it is translated;
a transcription start site;
a promoter
o the basal or core promoter located within about 40 bp of the start site
o an "upstream" promoter, which may extend over as many as 200 bp farther upstream
enhancers
silencers
Adjacent genes (RNA-coding as well as protein-coding) are often separated by an insulator, which helps them avoid cross-talk between each other's promoters and enhancers (and/or silencers).

Transcription start site
This is where a molecule of RNA polymerase II (pol II, also known as RNAP II) binds. Pol II is a complex of 12 different proteins (shown in the figure in yellow with small colored circles superimposed on it). The start site is where transcription of the gene into RNA begins.

Figure 2.44 : Eukaryotic promoter with TFIID

The basal promoter
The basal promoter (Figure 2.44) contains a sequence of 7 bases (TATAAAA) called the TATA box.
It is bound by a large complex of some 50 different proteins, including
Transcription Factor IID (TFIID), which is a complex of
o TATA-binding protein (TBP), which recognizes and binds to the TATA box
o 14 other protein factors which bind to TBP and to each other, but not to the DNA.
Transcription Factor IIB (TFIIB), which binds both the DNA and pol II.

Figure 2.45 : Eukaryotic promoter with Enhancer Binding Protein

The basal or core promoter is found in all protein-coding genes. This is in sharp contrast to the upstream promoter, whose structure and associated binding factors differ from gene to gene. Although the figure is drawn as a straight line, the binding of transcription factors to each other probably draws the DNA of the promoter into a loop. Many different genes and many different types of cells share the same transcription factors, not only those that bind at the basal promoter but even some of those that bind upstream (Figure 2.45). What turns on a particular gene in a particular cell is probably the unique combination of promoter sites and the transcription factors that are chosen.

An Analogy
The rows of lock boxes in a bank provide a useful analogy. To open any particular box in the room requires two keys: your key, whose pattern of notches fits only the lock of the box assigned to you (= the upstream promoter), but which cannot unlock the box without a key carried by a bank employee that can activate the unlocking mechanism of any box (= the basal promoter) but cannot by itself open any box.

Note : Transcription factors represent only a small fraction of the proteins in a cell.
Hormones exert many of their effects by forming transcription factors - The complexes of hormones with their receptor represent one class of transcription factor. Hormone "response elements", to which the complex binds, are promoter sites.
Embryonic development requires the coordinated production and distribution of transcription factors.
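The two-key lock box analogy above amounts to a simple rule: a gene is transcribed only when the shared basal factors and that gene's own combination of upstream factors are both present in the cell. The Python sketch below illustrates this rule and nothing more; TFIID and TFIIB are taken from the text, whereas the upstream factor names, gene names, and cell types are hypothetical.

```python
# Basal factors named in the text; the full basal complex has many more proteins.
BASAL_FACTORS = frozenset({"TFIID", "TFIIB"})

# Hypothetical genes and the upstream factors each one requires.
UPSTREAM_REQUIREMENTS = {
    "gene_X": {"factor_1", "factor_2"},
    "gene_Y": {"factor_2", "factor_3"},
}


def is_transcribed(gene, factors_in_cell):
    """True if both 'keys' are present: basal factors and the gene's own set."""
    cell = set(factors_in_cell)
    has_bank_key = BASAL_FACTORS <= cell                 # works on any gene
    has_own_key = UPSTREAM_REQUIREMENTS[gene] <= cell    # fits this gene only
    return has_bank_key and has_own_key


# Two hypothetical cell types sharing factor_2 but differing in the others.
cell_type_a = BASAL_FACTORS | {"factor_1", "factor_2"}
cell_type_b = BASAL_FACTORS | {"factor_2", "factor_3"}

for gene in UPSTREAM_REQUIREMENTS:
    print(gene, is_transcribed(gene, cell_type_a), is_transcribed(gene, cell_type_b))
```

Both cell types contain factor_2, yet each expresses a different gene, which illustrates how a shared pool of transcription factors can still give gene-specific and cell-specific expression through combinations.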
Enhancers
Some transcription factors ("enhancer-binding proteins") bind to regions of DNA that are thousands of base pairs away from the gene they control (Figure 2.46). Binding increases the rate of transcription of the gene. Enhancers can be located upstream, downstream, or even within the gene they control.

How does the binding of a protein to an enhancer regulate the transcription of a gene thousands of base pairs away? One possibility is that enhancer-binding proteins, in addition to their DNA-binding site, have sites that bind to transcription factors ("TF") assembled at the promoter of the gene. This would draw the DNA into a loop (as shown in Figure 2.46).

Figure 2.46: Some of the transcription factors that produce the segmented body plan in Drosophila. E2 and Sp1 type of Binding Proteins.

Visual evidence
Michael R. Botchan (who kindly supplied these electron micrographs) and his colleagues have produced visual evidence of this model of enhancer action. They created an artificial DNA molecule with
• several promoter sites for Sp1 about 300 bases from one end. Sp1 is a zinc-finger transcription factor that binds to the sequence 5' GGGCGG 3' found in the promoters of many genes, especially "housekeeping" genes.
• several enhancer sites about 800 bases from the other end. These are bound by an enhancer-binding protein designated E2.
• 1860 base pairs of DNA between the two.
When these DNA molecules were added to a mixture of Sp1 and E2, the electron microscope showed that the DNA was drawn into loops with "tails" of approximately 300 and 800 base pairs. At the neck of each loop were two distinguishable globs of material, one representing Sp1 (red), the other E2 (blue) molecules. (The two micrographs are identical; the lower one has been labeled to show the interpretation.) Artificial DNA molecules lacking either the promoter sites or the enhancer sites, or with mutated versions of them, failed to form loops when mixed with the two proteins.
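Short recognition sequences such as the Sp1 site (5' GGGCGG 3') can be located in a DNA sequence by a simple scan of both strands. The sketch below is illustrative only: the function names and the test sequence are made up for this example, and real binding-site prediction uses position weight matrices and not exact matching.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_sites(seq, motif="GGGCGG"):
    """Return (position, strand) for every exact occurrence of the motif.

    Positions are 0-based and always refer to the top strand as written;
    strand "-" means the motif lies on the complementary strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:
            hits.append((i, "-"))
    return hits


# Made-up sequence with one site on each strand.
example = "ATGGGCGGTTACCGCCCAT"
print(find_sites(example))  # [(2, '+'), (11, '-')]
```

The same function can be used with other exact motifs mentioned in this unit, for example `find_sites(sequence, "TATAAAA")` for the TATA box.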
Silencers
Silencers are control regions of DNA that, like enhancers, may be located thousands of base pairs away from the gene they control. However, when transcription factors bind to them, expression of the gene they control is repressed.

Insulators
A problem: As you can see above, enhancers can turn on promoters of genes located thousands of base pairs away. What is to prevent an enhancer from inappropriately binding to and activating the promoter of some other gene in the same region of the chromosome?
One answer: an insulator. Insulators are stretches of DNA (as few as 42 base pairs may do the trick) located between the
o enhancer(s) and promoter or
o silencer(s) and promoter
of adjacent genes or clusters of adjacent genes. Their function is to prevent a gene from being influenced by the activation (or repression) of its neighbors.

Figure 2.47: Chromosome 14 showing δ and α gene segments with promoter and enhancer.

Example: The enhancer for the promoter of the gene for the delta chain of the gamma/delta T-cell receptor for antigen (TCR) is located close to the promoter for the alpha chain of the alpha/beta TCR (on chromosome 14 in humans) (Figure 2.47). A T cell must choose between one or the other. There is an insulator between the alpha gene promoter and the delta gene promoter that ensures that activation of one does not spread over to the other.

All insulators discovered so far in vertebrates work only when bound by a protein designated CTCF ("CCCTC binding factor"; named for a nucleotide sequence found in all insulators). CTCF has 11 zinc fingers.

Another example: In mammals (mice, humans, pigs), only the allele for insulin-like growth factor-2 (IGF2) inherited from one's father is active; that inherited from the mother is not, a phenomenon called imprinting. The mechanism: the mother's allele has an insulator between the IGF2 promoter and enhancer. So does the father's allele, but in his case, the insulator has been methylated.
CTCF can no longer bind to the insulator, and so the enhancer is now free to turn on the father's IGF2 promoter.

Many of the commercially important varieties of pigs have been bred to contain a gene that increases the ratio of skeletal muscle to fat. This gene has been sequenced and turns out to be an allele of IGF2 that contains a single point mutation in one of its introns. Pigs with this mutation produce higher levels of IGF2 mRNA in their skeletal muscles (but not in their liver). This tells us that:
• Mutations need not be in the protein-coding portion of a gene in order to affect the phenotype.
• Mutations in non-coding portions of a gene can affect how that gene is regulated (here, a change in muscle but not in liver).

2.4 LET US SUM UP
• To map phage genes, bacterial cells are infected with viruses that differ in two or more genes. Recombinant plaques are counted, and rates of recombination are used to determine the linear order of the genes on the chromosome and the distances between them.
• Self-splicing introns are of two types: group I introns and group II introns. These introns have complex secondary structures that enable them to catalyze their own excision from RNA molecules without the aid of enzymes or other proteins.
• Intron splicing of nuclear genes is a two-step process: (1) the 5' end of the intron is cleaved and attached to the branch point to form a lariat, and (2) the 3' end of the intron is cleaved and the two ends of the exons are spliced together. These reactions take place within the spliceosome.
• Alternative splicing enables exons to be spliced together in different combinations to yield mRNAs that encode different proteins. Alternative 3' cleavage sites allow pre-mRNA to be cleaved at different sites to produce mRNAs of different lengths.

2.5 CHECK YOUR PROGRESS
NOTE: (1) Write your answer in the space given below. (2) Compare your answer with the ones given at the end of the unit.
(1) Fill in the blanks:
(a) ….. and its close relative …….
are viruses that infect the bacterium E. coli.
(b) The strain B of E. coli can be infected by both ….. and …… strains of T2 bacteriophage.
(c) When the lambda virus enters a cell, a virus-encoded enzyme called ………………… is synthesized.
(d) Plasmids are best thought of as ………………………………...
(e) A zygote inherits nuclear genes from both parents, but typically all of its cytoplasmic organelles, and thus all its cytoplasmic genes, come from …………………. of the gametes, usually the …………….
(2) Write the answers to the following questions:
(a) What are the techniques for the study of the bacteriophage genome?
(b) Explain genetic transformation, conjugation and transduction in bacteria.

2.6 CHECK YOUR PROGRESS: THE KEY
(1) (a) T2, T4 (b) h+, h (c) lambda integrase (d) mini-chromosomes (e) only one, the egg
(2) (a) see section 2.2.2 (b) see section 2.2.4

2.7 ASSIGNMENT
Make a project explaining Gene Regulation in Filamentous Fungi.

2.8 REFERENCES
Our courteous thanks to the following two authors/publishers, whose works were used in preparing the various sections of this chapter:
B. Alberts et al. (2002). Molecular Biology of the Cell, 4th Ed. Garland.
Benjamin A. Pierce. Genetics: A Conceptual Approach.
Other helping resources are as follows:
Lewin (2000). Genes VII. Oxford University Press.
C. R. Calladine and H. R. Drew (1997). Understanding DNA: The Molecule and How It Works, 2nd Ed. Academic Press. (3rd Ed. due in 2004).
Jeremy Dale and Simon F. Park (2004). Molecular Genetics of Bacteria, 4th Ed. John Wiley & Sons, Ltd.
H. Lodish et al. (1995). Molecular Cell Biology, 4th Ed. W. H. Freeman. (5th Ed. due in 2003–2004).
L. Snyder and W. Champness (2003). Molecular Genetics of Bacteria, 2nd Ed. American Society for Microbiology.
S. Baumberg (Ed.) (1999). Prokaryotic Gene Expression.
M. T. Madigan, J. M. Martinko and J. Parker (2000). Biology of Microorganisms (better known as ‘Brock’), 9th Ed. Prentice Hall International.
J. W. Dale and M. von Schantz (2002). From Genes to Genomes.
John Wiley & Sons.
T. A. Brown (2001). Gene Cloning – An Introduction, 4th Ed. Blackwell Science.
S. B. Primrose, R. Twyman and R. W. Old (2001). Principles of Gene Manipulation, 6th Ed. Blackwell Science.
B. R. Glick (2003). Molecular Biotechnology: Principles and Applications of Recombinant DNA, 3rd Ed. American Society for Microbiology.
D. P. Snustad and M. J. Simmons (2000). Principles of Genetics, 2nd Ed. John Wiley.
W. S. Klug and M. R. Cummings (2000). Concepts of Genetics, 6th Ed. Prentice Hall.
L. H. Hartwell and others (2000). Genetics. McGraw-Hill.
P. J. Russell (2002). Genetics. Benjamin Cummings.
A. J. F. Griffiths, W. M. Gelbart, R. C. Lewontin and J. H. Miller (2002). Modern Genetic Analysis, 2nd Ed. W. H. Freeman.
R. W. Hendrix et al. (1983). Lambda II.
M. Wilson, R. McNab and B. Henderson (2002). Bacterial Disease Mechanisms. Cambridge University Press.
W. Hayes (1968). The Genetics of Bacteria and their Viruses, 2nd Ed. Blackwell Scientific Publications.
Websites which give more information on bacterial genome sequences and access to genomic databases are as follows:
http://www.sanger.ac.uk/
http://www.tigr.org/
http://www.ncbi.nlm.nih.gov/

UNIT-3 RECOMBINATION AND GENETIC MAPPING
Structure
3.0 Introduction
3.1 Objectives
3.2 Recombination:
3.2.1 Independent Assortment and Crossing Over
3.2.2 Molecular Mechanism of Recombination
3.2.3 Role of RecA and RecBCD enzymes
3.2.4 Site Specific Recombination
3.2.5 Chromosome Mapping Linkage Groups and Genetic Markers
3.2.6 Construction of Molecular Maps
3.2.7 Correlation of Genetic and Physical Maps
3.2.8 Somatic Cell Genetics – An Alternative Approach to Gene Mapping
3.3 Mutations:
3.3.1 Spontaneous and Induced Mutations
3.3.2 Physical and Chemical Mutagens
3.3.3 Molecular Basis of Gene Mutations
3.3.4 Transposable Elements in Mutagenesis
3.3.5 DNA Damage and Repair Mechanism
3.3.6 Inherited Human Diseases and Defects in DNA repair
3.3.7 Initiation of Cancer at Cellular Level
3.3.8
Proto-oncogenes and Oncogenes
3.4 Let Us Sum Up
3.5 Check Your Progress
3.6 Check Your Progress: The Key
3.7 Assignment
3.8 References

3.0 INTRODUCTION
In the two preceding sections we discussed the mechanisms by which DNA sequences in cells are maintained from generation to generation with very little change. Although such genetic stability is crucial for the survival of individuals, in the longer term the survival of organisms may depend on genetic variation, through which they can adapt to a changing environment. Thus an important property of the DNA in cells is its ability to undergo rearrangements that can vary the particular combination of genes present in any individual genome, as well as the timing and the level of expression of these genes. These DNA rearrangements are caused by genetic recombination. Two broad classes of genetic recombination are commonly recognized:
(a) General recombination and
(b) Site-specific recombination.

In general recombination, genetic exchange takes place between any pair of homologous DNA sequences, usually located on two copies of the same chromosome. One of the most important examples is the exchange of sections of homologous chromosomes (homologues) in the course of meiosis. This "crossing-over" occurs between tightly apposed chromosomes early in the development of eggs and sperm, and it allows different versions (alleles) of the same gene to be tested in new combinations with other genes, increasing the chance that at least some members of a mating population will survive in a changing environment. Although meiosis occurs only in eukaryotes, the advantage of this type of gene mixing is so great that mating and the reassortment of genes by general recombination are also widespread in bacteria. This process leads to offspring having different combinations of genes from their parents and can produce new chimeric alleles. Enzymes called recombinases catalyze natural recombination reactions.
RecA, the recombinase found in E. coli, is responsible for the repair of DNA double-strand breaks (DSBs). In yeast and other eukaryotic organisms two recombinases are required for repairing DSBs: the RAD51 protein is required for mitotic and meiotic recombination, and the DMC1 protein is specific to meiotic recombination.

Chromosomal crossover refers to recombination between the paired chromosomes inherited from each of one's parents, generally occurring during meiosis. During prophase I the four available chromatids are in tight formation with one another. While in this formation, homologous sites on two chromatids can mesh with one another and may exchange genetic information. Because recombination can occur with small probability at any location along a chromosome, the frequency of recombination between two locations depends on the distance between them. Therefore, for genes sufficiently distant on the same chromosome, the amount of crossover is high enough to destroy the correlation between alleles.

In gene conversion, a section of genetic material is copied from one chromosome to another, but the donating chromosome is left unchanged.

Recombination can also occur between DNA sequences that contain no sequence homology. This is referred to as nonhomologous recombination or nonhomologous end joining.

DNA homology is not required in site-specific recombination. Instead, exchange occurs at short, specific nucleotide sequences (on either one or both of the two participating DNA molecules) that are recognized by a variety of site-specific recombination enzymes. Site-specific recombination therefore alters the relative positions of nucleotide sequences in genomes. In some cases these changes are scheduled and organized, as when an integrated bacterial virus is induced to leave a chromosome of a bacterium under stress; in others they are haphazard, as when the DNA sequence of a transposable element is inserted at a randomly selected site in a chromosome.
As for DNA replication, most of what we know about the biochemistry of genetic recombination has come from studies of prokaryotic organisms, especially of E. coli and its viruses.

3.1 OBJECTIVE
This unit deals with recombination and mutation, the two processes responsible for the evolution of organisms. The following points are covered:
• Mendel's law of independent assortment, under which new (non-parental) combinations of traits appear among the progeny of the F2 generation.
• The general recombination process, which is guided by base-pairing interactions between complementary strands of two homologous DNA molecules and begins with a nick in one strand.
• RecA and RecBCD, important recombination enzymes that have been studied in E. coli.
• Site-specific recombination enzymes, which move special DNA sequences into and out of genomes.
• The different types of inducers, physical as well as chemical mutagens, that cause mutation.
• The variety of DNA repair mechanisms found in organisms to avoid mutation.
• Some severe human diseases that develop because of disorders of genes.
• Cellular imbalance leading to the development of cancer, in which proto-oncogenes and oncogenes may also be involved.

3.2 RECOMBINATION

Figure 3.1: The phenotypes of two independent traits show a 9:3:3:1 ratio in the F2 generation. In this example, coat color is indicated by B (brown, dominant) or b (white), while tail length is indicated by S (short, dominant) or s (long). When parents are homozygous for each trait (SSbb and ssBB), their children in the F1 generation are heterozygous at both loci and show only the dominant phenotypes. If the children mate with each other, all combinations of coat color and tail length occur in the F2 generation: 9 are brown/short (purple boxes), 3 are white/short (pink boxes), 3 are brown/long (blue boxes) and 1 is white/long (green box).
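The 9:3:3:1 ratio of Figure 3.1 can be reproduced by listing every gamete of the doubly heterozygous F1 parents and pairing them in all possible ways. The following is a minimal sketch; the function names are chosen here for illustration, and the two loci are assumed to be unlinked, with S and B fully dominant as in the figure.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """All gametes of a two-locus genotype such as 'SsBb': one allele
    from each locus, combined independently of the other locus."""
    locus1, locus2 = genotype[:2], genotype[2:]
    return [a + b for a, b in product(locus1, locus2)]


def phenotype(alleles):
    """Phenotype of an offspring given its four alleles.
    S (short tail) and B (brown coat) are dominant."""
    tail = "short" if "S" in alleles else "long"
    coat = "brown" if "B" in alleles else "white"
    return f"{coat}/{tail}"


f1 = "SsBb"  # F1 individuals are heterozygous at both loci
counts = Counter(
    phenotype(egg + sperm)
    for egg, sperm in product(gametes(f1), gametes(f1))
)
print(counts)
```

Each F1 parent produces four kinds of gametes (SB, Sb, sB, sb), so the Punnett square has 16 boxes, which fall into the four phenotype classes in the proportion 9:3:3:1.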
3.2.1 Independent Assortment and Crossing Over
Independent Assortment - The Law of Independent Assortment, also known as "Inheritance Law", states that the inheritance pattern of one trait will not affect the inheritance pattern of another. While Mendel's experiments with a single trait always resulted in a 3:1 ratio between dominant and recessive phenotypes, his experiments with two traits (dihybrid cross) showed 9:3:3:1 ratios (Figure 3.1). The 9:3:3:1 table nevertheless shows that each of the two genes is independently inherited with a 3:1 ratio. Mendel concluded that different traits are inherited independently of each other, so that there is no relation, for example, between a cat's color and tail length. This is actually true only for genes that are not linked to each other.

Independent assortment occurs during meiosis I in eukaryotic organisms, when the homologous pairs orient at metaphase I and separate at anaphase I, to produce a gamete with a mixture of the organism's maternal and paternal chromosomes. Along with chromosomal crossover, this process increases genetic diversity by producing novel genetic combinations.

Of the 46 chromosomes in a normal diploid human cell, half are maternally derived (from the mother's egg) and half are paternally derived (from the father's sperm). This is because sexual reproduction involves the fusion of two haploid gametes (the egg and sperm) to produce a new organism having the full complement of chromosomes. During gametogenesis, the production of new gametes by an adult, the normal complement of 46 chromosomes needs to be halved to 23 to ensure that the resulting haploid gamete can join with another gamete to produce a diploid organism. An error in chromosome number, such as one caused by a gamete that carries an extra or a missing chromosome, is termed aneuploidy.

In independent assortment the chromosomes that end up in a newly formed gamete are randomly sorted from all possible combinations of maternal and paternal chromosomes.
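The number of different maternal/paternal chromosome combinations that random sorting can produce follows from a simple count: each homologous pair contributes two choices, so n pairs give 2^n combinations. The sketch below checks the formula by explicit enumeration for a small case and then applies it to the 23 human chromosome pairs. The function name is chosen for illustration, and crossing over is ignored.

```python
from itertools import product


def gamete_combinations(n_pairs):
    """Number of maternal/paternal chromosome combinations in a gamete,
    ignoring crossing over: two choices for each homologous pair."""
    return 2 ** n_pairs


# Explicit enumeration for 3 pairs: M = maternal copy, P = paternal copy.
small_case = list(product("MP", repeat=3))
print(len(small_case), gamete_combinations(3))  # 8 8

# The 23 chromosome pairs of a human cell.
print(gamete_combinations(23))  # 8388608
```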
Because gametes end up with a random mix instead of a pre-defined "set" from either parent, the chromosomes are said to assort independently. As such, the gamete can end up with any combination of paternal and maternal chromosomes, and each of the possible combinations occurs with equal frequency. For human gametes, with 23 pairs of chromosomes, the number of possible combinations is 2^23, or 8,388,608. The gametes will normally end up with 23 chromosomes, but the origin of any particular one will be randomly selected from the paternal or maternal chromosomes. This contributes to the genetic variability of progeny.

Crossing Over - Crossing over occurs between equivalent portions of two nonsister chromatids (Figure 3.2). Each chromatid contains a single molecule of DNA. So the problem of crossing over is really a problem of swapping portions of adjacent DNA molecules. It must be done with great precision so that neither chromatid gains or loses any genes. In fact, crossing over has to be sufficiently precise that not a single nucleotide is lost or added at the crossover point if it occurs within a gene. Otherwise a frameshift would result and the resulting gene would produce a defective product or, more likely, no product at all.

How do nonsister chromatids ensure that crossing over between them will occur without the loss or gain of a single nucleotide? One plausible mechanism, for which there is considerable laboratory evidence, postulates the following events (Figure 3.2).

Note that each recombinant DNA molecule includes a region where nucleotides from one of the original molecules are paired with nucleotides from the other. But no matter: the need for a smooth double helix guarantees that each exchange takes place without any gain or loss of nucleotides.
So long as the total number of nucleotides in each strand and the complementarity (A-T, C-G) are preserved, this "heteroduplex" region (which may extend for hundreds of base pairs) will only rarely have genetic consequences. And these may, in fact, be helpful, because the synthesis of a short stretch of DNA using the template provided by the other chromatid also provides a mechanism for repairing any damage that might have been present on the "invading" strand of DNA. If the cut in molecule 1 occurs in the region of a mutation, the damaged or incorrect nucleotides can be digested away. Refilling the resulting gap, using the undamaged molecule 2 as the template, repairs the damage to molecule 1.

Why should the cutting and ligation be limited to the strands shown? They are not. Half the time the cutting and ligating rejoins the original parental arms. In these cases, no crossover takes place. The only genetic change that might have occurred is a transfer of some genetic information in the heteroduplex region. So crossing over not only provides a mechanism for genetic recombination during meiosis but also provides a means of repairing damage to the genome.

Figure 3.2: Mechanism of Crossing Over

3.2.2 Molecular Mechanisms of Recombination
DNA replication with 100% fidelity is a useful feature for keeping offspring within the genetic background of the species. But to get there, or to evolve further, requires genetic changes, one of which results from recombination of (near) homologous parts of DNA. The nature of the structural changes in DNA necessary to bring about homologous genetic recombination was laid out by R. HOLLIDAY in 1964, and in subsequent years the crossover structures were visualized by electron microscopy (Figure 3.3). The actual conformation of a DNA crossover was speculated to be a four-way junction with separate DNA helices, or with stacked helices in either a parallel or an antiparallel orientation of the helices.
The models had to allow for branch migration; otherwise no exchange of genetic material would happen.

Figure 3.3: An ‘X’ form that has been prepared for the electron microscope in the presence of a high concentration of formamide. Under these conditions the DNA double helix is stressed, and those regions particularly rich in AT base pairs undergo a localized denaturation. This sequence-specific denaturation allows the homologous arms in the molecule to be identified. Furthermore, the covalent strand connections in the region of the crossover can be seen. In this and 80 other open molecules, the homologous arms are in a trans configuration (Photo: H. POTTER and D. DRESSLER, 1976).

Breakage-Fusion (Reunion)-Bridge Cycle, Control Elements and Unstable Genes - It has been known since the beginning of the twentieth century that unstable or variable gene loci occur in plants, although the drastically enhanced mutability and the increased back-mutation rate could not be explained at first. The decisive breakthrough was accomplished by B. McCLINTOCK with her studies on maize chromosomes published in 1947 and 1951. These were based on her earlier observations and analyses (1938) of breakage-fusion (reunion)-bridges, whose occurrence could be correlated with the restructuring of chromosomes.

Bridges are formed during anaphase whenever two chromosomes fuse at their ends, generating a fusion product with two centromeres. If the two centromeres are subsequently pulled to different poles, a chromosome break inevitably occurs. During the following S phase of interphase, a chromatid with a break at its terminus is replicated in just the same way as the other chromosomes, and the sister chromatids again fuse at their broken ends. Consequently, in the subsequent mitosis a chromosome consisting of just one chromatid but two centromeres can be found instead of a chromosome made up of two chromatids and one centromere.
The consequence is a second break during anaphase, with which the second round of the cycle starts.

B. McCLINTOCK recognized that the break cannot occur at just any site of the chromosome but is restricted to certain sections that she called Ds (dissociation). These were evidently DNA segments contributing to the formation of translocations, deletions and inversions and to the generation of ring-shaped chromosomes. The first break causes similar breaks in the mitotic cycles of the following cell generations. They happen at different times and sites during ontogenesis.

The segment Ds, a mutator gene, behaves like a multiple allele (or, even better, like a pseudoallele) that can be located at different gene loci. It may also vary in structure. This mutator can insert itself into other genes, thus rendering them inactive. It is a control element that changes its place within the chromosome, jumping or wandering around and causing mutations wherever it inserts (the mutators are also called jumping genes).

It soon became clear that a further set of elements has to exist: the Ac (activation) elements. A chromosome break at, or a translocation of, a Ds element has to be supported by an Ac element. An Ac element can also be regarded as a multiple allele. It may occur at widely different sites in all chromosomes. To analyze its effect further, B. McCLINTOCK concentrated on the study of genes that determine the colour of maize grains. One of the most important is the C locus, which in its dominant condition causes a dark red staining of the aleurone layer and the pericarp of the maize grain. If a Ds element jumps into the gene, colour synthesis is interrupted and colourless (yellow) grains result. An Ac activity within these grains causes a pattern of dark red areas on a light ground. This is explained by a re-establishment of the old state, since the Ac element removes the Ds from the C locus. This happens in several cells during the development of the maize grain.
These cells are the origin of the aleurone layer and the pericarp, and the back mutation can be perceived only in the clones that arise from the changed cells.

A number of gene loci are now known that can be influenced by the Ds-Ac system or other control elements. The detection of the spm (suppressor-mutator) system and the elucidation of its function showed that the control elements do not act only as switches (a yes/no decision) but also modulate the degree of gene expression.

The genetic analyses of B. McCLINTOCK were not understood for years. Only when insertion elements and transposons were found in bacterial DNA during the late sixties did an analogy between them and the control elements show up. These genetic data fitted neatly with molecular biological models (P. NEVERS and H. SAEDLER, 1977; H-P. DÖRING and P. STARLINGER, 1984). BARBARA McCLINTOCK was awarded the Nobel Prize in Physiology or Medicine for her pioneering achievements.

P. NEVERS, N. S. SHEPHERD and H. SAEDLER listed the 'unstable plant genes' described in the literature up to the beginning of 1986. Their list shows that such genes have been found in more than 30 species. Many of the respective mutants, with names like variegata, marmorata, maculata or variabilis, are on the market as ornamental plants because of their irregularly spotted flowers or leaves.

Holliday junction: central intermediate of genetic recombination -
During branch migration hydrogen bonds between paired bases have to be broken and others formed in their place. On average the energies for breaking and re-forming these bonds cancel each other out, but in real DNA not all base pairs are created equal. This calls for the action of enzymes to overcome the necessary activation energy. Enzymes are needed in any case to resolve the four-way junctions into separate helices. In E. coli, for example, there is an enzyme system (RuvABC) whose components hold the Holliday junction (RuvA), swivel the DNA strands to enable branch migration (RuvB) and finally cut the junction (RuvC). A DNA ligase then restores intact double helices.

Homologous genetic recombination is a highly dynamic process, whereas X-ray crystallography relies on static structures. So it took until the end of the twentieth century to obtain an atomic-detail view of the relevant structures. Structures are now available for a four-way Holliday junction formed by homologous DNA strands, a RuvA tetramer complexed to a static Holliday junction, the motor driving branch migration, and a Holliday-junction-resolving enzyme.

The Ruv system of E. coli is in itself a dynamic complex. During branch migration two tetramers of RuvA sit on both sides of a cruciform DNA, with multimeric RuvB clamping two of the DNA arms to wind them. This complex is not accessible to RuvC. In order for the resolvase to act, one of the RuvA tetramers has to dissociate so that one side of the DNA junction is amenable to strand separation. In vitro the tetramer-octamer equilibrium depends on the salt concentration of the buffer.
Conditions necessary for crystallisation of the complex resulted in tetrameric RuvA complexed to the DNA.

Figure 3.4: RecA protein-dsDNA complex imaged by atomic force microscopy (AFM)

Research on the molecular mechanisms of genetic recombination has the long-term objective of reconstituting in vitro systems that accurately reproduce the cellular processes. To this end the biochemical properties of proteins essential to homologous recombination are being characterized in prokaryotes, eukaryotes, and Archaea. In E. coli, the RecA, RecBCD, RecQ, RuvABC, and SSB proteins, and a specific DNA sequence called Chi, are essential to homologous recombination.
• The RecA protein possesses the unique ability to pair homologous DNA molecules (Figure 3.4) and to promote the subsequent exchange of DNA strands. RecA is the prototypic DNA strand exchange protein, which makes it the model for studying the biochemical mechanism of protein-mediated recognition and exchange of homologous DNA strands.
• The RecBCD enzyme is both a DNA helicase and a nuclease, with the remarkable properties that its nuclease activity, but not its helicase activity, is attenuated by interaction with the Chi sequence, and that it will actively load RecA protein onto ssDNA.
• RecQ protein is a helicase that can also effect recombination events.
• SSB protein is an ssDNA-binding protein that stimulates the activities of the RecA, RecBCD, and RecQ proteins by virtue of its ability to bind ssDNA.
An in vitro pairing reaction that requires the concerted action of each of these proteins has been reconstituted; the role of each protein in this reaction is under investigation.

The biochemistry of homologous recombination is also being studied in the yeast S. cerevisiae and the archaeon S. solfataricus. Rad51 and RadA proteins are the RecA protein homologues, respectively.
In yeast, at least three ancillary proteins are needed for Rad51 protein-mediated DNA strand exchange: these include the RP-A, Rad52, and Rad54 proteins. The mechanism of these reconstituted reactions is under study.

General Recombination Is Guided by Base-pairing Interactions Between Complementary Strands of Two Homologous DNA Molecules - General recombination involves DNA strand-exchange intermediates that require some effort to understand. Although the exact pathway followed is likely to be different in different organisms, detailed genetic analyses of viruses, bacteria, and fungi suggest that the major outcome of general recombination is always the same.

Figure 3.5: General recombination. The breaking and rejoining of two homologous DNA double helices creates two DNA molecules that have "crossed over."

Figure 3.6: A heteroduplex joint. This structure unites two DNA molecules where they have crossed over. Such a joint is often thousands of nucleotides long.

(1) Two homologous DNA molecules "cross over"; that is, their double helices break and the two broken ends join to their opposite partners to re-form two intact double helices, each composed of parts of the two initial DNA molecules (Figure 3.5).
(2) The site of exchange (that is, where a red double helix is joined to a green double helix in Figure 3.5) can occur anywhere in the homologous nucleotide sequences of the two participating DNA molecules.
(3) At the site of exchange, a strand of one DNA molecule becomes base-paired to a strand of the second DNA molecule to create a staggered joint (usually called a heteroduplex joint) between the two double helices (Figure 3.6). The heteroduplex region can be thousands of base pairs long; we shall explain later how it forms.
(4) No nucleotide sequences are altered at the site of exchange; the cleavage and rejoining events occur so precisely that not a single nucleotide is lost or gained.
Despite this precision, general recombination creates DNA molecules of novel sequence: the heteroduplex joint can contain a small number of mismatched base pairs, and, more important, the two DNAs that cross over are usually not exactly the same on either side of the joint.

Figure 3.7: One way to start a recombination event. The RecBCD protein is an enzyme required for general genetic recombination in E. coli. The protein enters the DNA from one end of the double helix and then uses energy derived from the hydrolysis of bound ATP molecules to propel itself in one direction along the DNA at a rate of about 300 nucleotides per second. A special recognition site (a DNA sequence of eight nucleotides scattered throughout the E. coli chromosome) is cut in the traveling loop of DNA created by the RecBCD protein, and thereafter a single-stranded whisker is displaced from the helix, as shown. This whisker is thought to initiate genetic recombination by pairing with a homologous helix, as in Figure 3.8.

The mechanism of general recombination ensures that two regions of DNA double helix undergo an exchange reaction only if they have extensive sequence homology. The formation of a heteroduplex joint requires that such homology be present because it involves a long region of complementary base-pairing between a strand from one of the two original double helices and a complementary strand from the other. But how does this heteroduplex joint arise, and how do the two homologous regions of DNA at the site of crossing-over recognize each other? As we shall see, recognition takes place by means of a direct base-pairing interaction. The formation of base pairs between complementary strands from the two DNA molecules then guides the general recombination process, allowing it to occur only between long regions of matching DNA sequence.
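The base-pairing requirement described above can be illustrated with a short Python sketch. It assumes strands are written 5' to 3' as strings over A, C, G and T; the function names and the example sequences are hypothetical and serve only as an illustration. A strand from one helix can form a heteroduplex with a strand of the other helix only to the extent that the two are complementary when aligned antiparallel.

```python
# Watson-Crick partner of each base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the strand that would base-pair with `strand`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def paired_fraction(strand_a, strand_b):
    """Fraction of positions that can base-pair when two equal-length
    single strands are aligned antiparallel (a toy heteroduplex)."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have the same length")
    partner = reverse_complement(strand_b)
    matches = sum(1 for x, y in zip(strand_a, partner) if x == y)
    return matches / len(strand_a)

# Hypothetical sequences: a strand from helix 1 and candidate partner strands from helix 2.
invading = "ATGCGTACCA"
perfect_partner = reverse_complement(invading)        # fully homologous
diverged_partner = reverse_complement("ATGCGAACCT")   # differs at two positions

print(paired_fraction(invading, perfect_partner))   # 1.0
print(paired_fraction(invading, diverged_partner))  # 0.8
```

In this toy model the two mismatched positions of the diverged partner would sit inside the heteroduplex joint as mispaired bases.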
General Recombination Can Be Initiated at a Nick in One Strand of a DNA Double Helix - Each of the two strands in a DNA molecule is helically wound around the other. As a result, extensive base-pair interactions can occur between two homologous DNA double helices only if a nick is first made in a strand of one of them, freeing that strand for the unwinding and rewinding events required to form a heteroduplex with another DNA molecule. For the same reason, any exchange of strands between two DNA double helices requires at least two nicks, one in a strand of each interacting double helix. Finally, to produce the heteroduplex joint illustrated in Figure 3.6, each of the four strands present must be cut to allow each to be joined to a different partner. In general recombination, these nicking and resealing events are coordinated so that they occur only when two DNA helices share an extensive region of matching DNA sequence.

Figure 3.8: The initial strand exchange in general recombination. A nick in a single DNA strand frees the strand, which then invades a homologous DNA double helix to form a short pairing region with one of the strands in the second helix. Only two DNA molecules that are complementary in nucleotide sequence can base-pair in this way and thereby initiate a general recombination event. All of the steps shown here can be catalyzed by known enzymes (see Figures 3.7 and 3.11).

There is evidence from a number of sources that a single nick in only one strand of a DNA molecule is sufficient to initiate general recombination. Chemical agents or types of irradiation that introduce single-strand nicks, for example, will trigger a genetic recombination event. Moreover, one of the special proteins required for general recombination in E. coli, the RecBCD protein, has been shown to make single-strand nicks in DNA molecules. The RecBCD protein is also a DNA helicase, hydrolyzing ATP and traveling along a DNA helix, transiently exposing its strands.
By combining its nuclease and helicase activities, the RecBCD protein will create a single-stranded "whisker" on the DNA double helix (Figure 3.7). Figure 3.8 shows how such a whisker could initiate a base-pairing interaction between two complementary stretches of DNA double helix.

Figure 3.9: DNA hybridization. DNA double helices re-form from their separated strands in a reaction that depends on the random collision of two complementary strands. Most such collisions are not productive, as shown at the left, but a few result in a short region where complementary base pairs have formed (helix nucleation). A rapid zippering then leads to the formation of a complete double helix. A DNA strand can use this trial-and-error process to find its complementary partner in the midst of millions of non-matching DNA strands. Trial-and-error recognition of a complementary partner DNA sequence appears to initiate all general recombination events.

DNA Hybridization Reactions Provide a Simple Model for the Base-pairing Step in General Recombination - In its simplest form, the type of base-pairing interaction central to general recombination can be mimicked in a test tube by allowing a DNA double helix to re-form from its separated single strands. This process, called DNA renaturation or hybridization, occurs when a rare random collision juxtaposes complementary nucleotide sequences on two matching DNA single strands, allowing the formation of a short stretch of double helix between them. This relatively slow helix nucleation step is followed by a very rapid "zippering" step as the region of double helix is extended to maximize the number of base-pairing interactions (Figure 3.9). Formation of a new double helix in this way requires that the annealing strands be in an open, unfolded conformation.
For this reason in vitro hybridization reactions are carried out at high temperature or in the presence of an organic solvent such as formamide; these conditions "melt out" the short hairpin helices formed where base-pairing interactions occur within a single strand that folds back on itself. Bacterial cells could not survive such harsh conditions and instead use a single-strand binding protein, the SSB protein, to open their helices. This protein is essential for DNA replication as well as for general recombination in E. coli; it binds tightly and cooperatively to the sugar-phosphate backbone of all single-stranded regions of DNA, holding them in an extended conformation with their bases exposed. In this extended conformation a DNA single strand can base-pair efficiently with either a nucleoside triphosphate molecule (in DNA replication) or a complementary section of another DNA single strand (in genetic recombination). When hybridization reactions are carried out in vitro under conditions that mimic those inside a cell, the SSB protein speeds up the rate of DNA helix nucleation and thereby the overall rate of strand annealing by a factor of more than 1000.

3.2.3 RecA & RecBCD

RecA is a 38 kilodalton Escherichia coli protein essential for the repair and maintenance of DNA. RecA has a structural and functional homolog in every species in which it has been seriously sought and serves as an archetype for this class of homologous DNA repair proteins. The homologous protein in Homo sapiens is called RAD51. RecA has multiple activities, all related to DNA repair. In the bacterial SOS response, it has a co-protease function in the autocatalytic cleavage of the LexA repressor and the λ repressor. Its most studied role is in facilitating DNA recombination for the repair of double-strand DNA breaks and the exchange of genetic information through sexual reproduction.

Figure 3.10: The structure of the RecA protein. A string of three RecA monomers is shown, with the position of each ATP in red. The white spheres show the putative position of the single-strand DNA in the filament, with three nucleotides (each shown as a sphere) bound per monomer. (From R.M. Story, I.T. Weber, and T.A. Steitz, Nature 355:318-325, 1992. © 1992 Macmillan Magazines Ltd.)

E. coli RecA protein also has a major role in homologous recombination. RecA protein binds strongly and in long clusters to ssDNA to form a nucleoprotein filament. Because it has more than one DNA-binding site, RecA can hold a single strand and a double strand together. This feature makes it possible to catalyze a DNA synapsis reaction between a DNA double helix and a homologous region of single-stranded DNA. The reaction initiates the exchange of strands between two recombining DNA double helices. After the synapsis event, a process called branch migration begins in the heteroduplex region. In branch migration an unpaired region of one of the single strands displaces a paired region of the other single strand, moving the branch point without changing the total number of base pairs. Spontaneous branch migration can occur, but because it generally proceeds equally in both directions it is unlikely to complete recombination efficiently. The RecA protein catalyzes unidirectional branch migration and by doing so makes it possible to complete recombination, producing a region of heteroduplex DNA that is thousands of base pairs long. RecA protein is also a DNA-dependent ATPase: it contains an additional site for binding and hydrolyzing ATP. RecA associates more tightly with DNA when it has ATP bound than when it has ADP bound.

Figure 3.11: DNA synapsis catalyzed by the RecA protein. In vitro experiments show that several types of complexes are formed between a DNA single strand covered with RecA protein (red) and a DNA double helix (green). First a non-base-paired complex is formed, which is converted to a three-stranded structure as soon as a region of homologous sequence is found. This complex is presumably unstable because it involves an unusual form of DNA, and it spins out a DNA heteroduplex (one strand green and the other strand red) plus a displaced single strand from the original helix (green); thus the structure shown in this diagram migrates to the left, reeling in the "input DNAs" while producing the "output DNAs." The net result is a DNA strand exchange identical to that diagrammed earlier in Figure 3.8. (Adapted from S.C. West, Annu. Rev. Biochem. 61:603-640, 1992. © Annual Reviews Inc.)

Escherichia coli strains deficient in RecA are useful for cloning procedures in molecular biology laboratories. E. coli strains are often genetically modified to contain a mutant recA locus to ensure the stability of exogenous plasmids, which are circular dsDNA molecules that bacteria replicate along with their genome during normal cell growth. Plasmid DNA is taken up by the bacteria under a variety of conditions. Bacteria containing exogenous plasmids are called "transformants". Transformants retain the plasmid through successive cell divisions, so that it can be recovered and used in other applications. Without functional RecA protein the exogenous plasmid DNA is left unaltered by the bacteria. Purification of this plasmid from bacterial cultures then results in high-fidelity amplification of the original plasmid sequence.

The RecA Protein Enables a DNA Single Strand to Pair with a Homologous Region of DNA Double Helix in E. coli - General recombination is more complex than the simple hybridization reactions just described. In the course of general recombination, a single DNA strand from one DNA double helix must invade another double helix (see Figure 3.8). In E. coli this requires the RecA protein, produced by the recA gene, which was identified in 1965 as being essential for recombination between chromosomes.
Long sought by biochemists, this elusive gene product was finally purified to homogeneity in 1976, a feat that allowed its detailed characterization (Figure 3.10). Like a single-strand binding (SSB) protein, the RecA protein binds tightly and in large cooperative clusters to single-stranded DNA to form a nucleoprotein filament. This filament has several distinctive properties. The RecA protein has more than one DNA-binding site, for example, and it can therefore hold a single strand and a double helix together. These sites allow the RecA protein to catalyze a multistep reaction (called synapsis) between a DNA double helix and a homologous region of single-stranded DNA. The crucial step in synapsis occurs when a region of homology is identified by an initial base-pairing between complementary nucleotide sequences. The nucleation step in this case appears to involve a three-stranded structure, in which the DNA single strand forms nonconventional base pairs in the major groove of the DNA double helix (Figure 3.11). This begins the pairing shown previously in Figure 3.8 and so initiates the exchange of strands between two recombining DNA double helices. Studies in vitro suggest that the E. coli SSB protein cooperates with the RecA protein to facilitate these reactions.

Once synapsis has occurred, a short heteroduplex region where the strands from two different DNA molecules have begun to pair is enlarged through protein-directed branch migration, which can also be catalyzed by the RecA protein. Branch migration can take place at any point where two single DNA strands with the same sequence are attempting to pair with the same complementary strand; an unpaired region of one of the single strands will displace a paired region of the other single strand, moving the branch point without changing the total number of DNA base pairs.
Spontaneous branch migration proceeds equally in both directions, and so it makes little progress and is unlikely to complete recombination efficiently (Figure 3.12A). Because the RecA protein catalyzes unidirectional branch migration, it readily produces a region of heteroduplex that is thousands of base pairs long (Figure 3.12B).

Figure 3.12: Two types of DNA branch migration observed in experiments in vitro. (A) Spontaneous branch migration is a back-and-forth, random-walk type of process, and it therefore makes little progress over long distances. (B) RecA-protein-directed branch migration proceeds at a uniform rate in one direction, and it may be driven by the polarized assembly of the RecA protein filament on a DNA single strand, which occurs in the direction indicated. In addition, special DNA helicases that catalyze protein-directed branch migration even more efficiently are involved in recombination.

The catalysis of branch migration depends on a further property of the RecA protein. In addition to having two DNA-binding sites, the RecA protein is a DNA-dependent ATPase, with an additional site for binding and hydrolyzing ATP. The protein associates much more tightly with DNA when it has ATP bound than when it has ADP bound. Moreover, new RecA molecules with ATP bound are preferentially added at one end of the RecA protein filament, and the ATP is then hydrolyzed to ADP. The RecA protein filaments that form on DNA may therefore share many of the dynamic assembly properties displayed by the cytoskeletal filaments formed from actin or tubulin; an ability of the protein to "treadmill" unidirectionally along a DNA strand, for example, could drive the branch migration reaction shown in Figure 3.12B.

RecBCD - RecBCD, also known as "Exonuclease V", is an E. coli protein that initiates recombinational repair of DNA double-strand breaks, which are a common result of ionizing radiation, replication errors, endonucleases, oxidative damage and a host of other factors. It is both a helicase that unwinds, or separates, the strands of DNA and a nuclease that makes single-stranded nicks in DNA. RecBCD (Figure 3.13) is composed of three different subunits, encoded by the recB, recC, and recD genes. Both the RecB and RecD subunits are helicases, i.e. energy-dependent molecular motors that unwind DNA or RNA.

Figure 3.13: RecBCD crystal structure.

RecBCD is unusual amongst helicases in that it recognizes a specific sequence in DNA, 5'-GCTGGTGG-3', that is known as "Chi". After it initiates unwinding, RecBCD makes nicks on the strand that contains the unwound 3' end. When RecBCD encounters a Chi site on this strand as it is unwinding DNA, it makes a final nick and pauses. It has been proposed that this pause is a consequence of a conformational rearrangement in the protein that changes its properties. When RecBCD resumes unwinding, it nicks the opposite strand instead (i.e. the strand containing the unwound 5' end). As a consequence, the 3' strand remains intact downstream of Chi. This is important because the strand exchange protein, RecA, which is responsible for the next step of recombinational repair, needs a single-stranded molecule with a 3' end. RecBCD is also a model enzyme for the use of single-molecule fluorescence as an experimental technique for better understanding protein-DNA interactions.

General Genetic Recombination Usually Involves a Cross-Strand Exchange - Exchanging a single strand between two double helices is presumed to be the slow and difficult step in a general recombination event (see Figure 3.8). After this initial exchange, extending the region of pairing and establishing further strand exchanges between the two closely apposed helices is thought to proceed rapidly.
During these events a limited amount of nucleotide excision and local DNA resynthesis often occurs, resembling some of the events in DNA repair. Because of the large number of possibilities, different organisms are likely to follow different pathways at this stage. In most cases, however, an important intermediate structure, the cross-strand exchange, will be formed by the two participating DNA helices. One of the simplest ways in which this structure can form is shown in Figure 3.14.

In the cross-strand exchange (also called a Holliday junction) the two homologous DNA helices that initially paired are held together by mutual exchange of two of the four strands present, one originating from each of the helices. No disruption of base-pairing is necessary to maintain this structure, which has two important properties: (1) the point of exchange between the two homologous DNA double helices (where the two strands cross in Figure 3.14) can migrate rapidly back and forth along the helices by a double branch migration; (2) the cross-strand exchange contains two pairs of strands: one pair of crossing strands and one pair of non-crossing strands. The structure can isomerize, however, by undergoing a series of rotational movements, so that the two original non-crossing strands become crossing strands and vice versa (Figure 3.15).

In order to regenerate two separate DNA helices and thus terminate the pairing process, the two crossing strands must be cut. If the crossing strands are cut before isomerization, the two original DNA helices separate from each other nearly unaltered, with only a very short piece of single-stranded DNA exchanged. If the crossing strands are cut after isomerization, however, one section of each original DNA helix is joined to a section of the other DNA helix; in other words, the two DNA helices have crossed over (see Figure 3.15).
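The two ways of resolving a cross-strand exchange can be summarized in a small Python sketch. It is a bookkeeping model only: each helix is reduced to a pair of hypothetical flanking markers (one on each side of the exchange point), and the function name and the marker labels are assumptions made for illustration.

```python
def resolve_cross_strand_exchange(helix_1, helix_2, isomerized):
    """Return the flanking-marker arrangement of the two helices after the
    crossing strands are cut.

    Each helix is a (left_marker, right_marker) tuple; the exchange point
    is assumed to lie between the two markers.
    """
    left_1, right_1 = helix_1
    left_2, right_2 = helix_2
    if isomerized:
        # Cutting after isomerization joins a section of each helix
        # to a section of the other: the helices have crossed over.
        return (left_1, right_2), (left_2, right_1)
    # Cutting before isomerization leaves the flanking markers as they were;
    # only a short single-stranded piece has been exchanged.
    return (left_1, right_1), (left_2, right_2)

print(resolve_cross_strand_exchange(("A", "B"), ("a", "b"), isomerized=False))
# (('A', 'B'), ('a', 'b'))
print(resolve_cross_strand_exchange(("A", "B"), ("a", "b"), isomerized=True))
# (('A', 'b'), ('a', 'B'))
```

The sketch deliberately leaves out the short exchanged region itself, which is present in both outcomes.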
The isomerization of the cross-strand exchange should occur spontaneously at some rate, but it may also be enzymatically driven or otherwise regulated by cells. Some kind of control probably operates during meiosis, when the two DNA double helices that pair are constrained in an elaborate structure called the synaptonemal complex.

Figure 3.14: The formation of a cross-strand exchange. There are many possible pathways that can lead from a single-strand exchange to a cross-strand exchange, but only one is shown.

Figure 3.15: The isomerization of a cross-strand exchange. Without isomerization, cutting the two crossing strands would terminate the exchange and crossing over would not occur. With isomerization (steps B and C), cutting the two crossing strands creates two DNA molecules that have crossed over (bottom). Isomerization is therefore thought to be required for the breaking and rejoining of two homologous DNA double helices that results from general genetic recombination. Step A was illustrated previously (see Figure 3.14).

Gene Conversion Results from Combining General Recombination and Limited DNA Synthesis - It is a fundamental law of genetics that each parent makes an equal genetic contribution to the offspring, one complete set of genes being inherited from the father and one from the mother. Thus, when a diploid cell undergoes meiosis to produce four haploid cells, exactly half of the genes in these cells should be maternal (genes that the diploid cell inherited from its mother) and the other half paternal (genes that the diploid cell inherited from its father). In a complex animal, such as a human, it is not possible to check this prediction directly. But in other organisms, such as fungi, where it is possible to recover and analyze all four of the daughter cells produced from a single cell by meiosis, one finds cases in which the standard genetic rules have apparently been violated. Occasionally, for example, meiosis yields three copies of the maternal version of a gene (allele) and only one copy of the paternal allele, indicating that one of the two copies of the paternal allele has been changed to a copy of the maternal allele. This phenomenon is known as gene conversion. It often occurs in association with general genetic recombination events, and it is thought to be important in the evolution of certain genes.

Figure 3.16: One general recombination pathway that can cause gene conversion. The process begins when a nick is formed in one of the strands in the red DNA helix. In step 1 DNA polymerase begins the synthesis of an extra copy of a strand in the red helix, displacing the original copy as a single strand. This single strand then pairs with the homologous region of the green helix. In step 2 the short region of unpaired green strand produced in step 1 is degraded, completing the transfer of nucleotide sequences. The result is normally seen in the next cell cycle, after DNA replication has separated the two nonmatching strands (step 3). As described in the text, the repair of mismatched base pairs in a heteroduplex joint also causes gene conversion.

Gene conversion is believed to be a straightforward consequence of the mechanisms of general recombination and DNA repair. During meiosis heteroduplex joints are formed at the sites of crossing-over between homologous maternal and paternal chromosomes. If the maternal and paternal DNA sequences are slightly different, the heteroduplex joint may include some mismatched base pairs. The resulting mismatch in the double helix may then be corrected by the DNA repair machinery, which can either erase nucleotides on the paternal strand and replace them with nucleotides that match the maternal strand or vice versa. The consequence of this mismatch repair will be a gene conversion.
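The link between mismatch repair and the 3:1 ratio described above can be traced with a minimal Python sketch. It is a toy model under stated assumptions: each of the four meiotic products is represented at a single locus by the alleles on its two strands, "M" and "P" are hypothetical labels for the maternal and paternal alleles, and exactly one paternal double helix is assumed to have received a maternal strand, giving a heteroduplex at that locus.

```python
from collections import Counter

def repair_mismatch(chromatid, template):
    """Mismatch repair on one double helix: if its two strands carry
    different alleles, rewrite both to the allele chosen as template."""
    strand_1, strand_2 = chromatid
    if strand_1 != strand_2:
        return (template, template)
    return chromatid

def allele_counts(tetrad):
    """Count alleles among the four meiotic products, assuming every
    double helix is fully matched (no heteroduplex left)."""
    return Counter(strand_1 for strand_1, _ in tetrad)

# Each double helix is modelled as (allele on strand 1, allele on strand 2).
# ("P", "M") is the assumed heteroduplex.
tetrad = [("M", "M"), ("M", "M"), ("P", "M"), ("P", "P")]

converted = [repair_mismatch(c, template="M") for c in tetrad]
restored = [repair_mismatch(c, template="P") for c in tetrad]

print(allele_counts(converted))  # Counter({'M': 3, 'P': 1})
print(allele_counts(restored))   # Counter({'M': 2, 'P': 2})
```

Repair toward the maternal strand gives the 3:1 tetrad that signals gene conversion; repair toward the paternal strand restores the expected 2:2 ratio, so the event goes unnoticed genetically.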
Gene conversion can also take place by a number of other mechanisms, but they all require some type of general recombination event that brings two copies of a closely related DNA sequence together. Because an extra copy of one of the two DNA sequences is generated, a limited amount of DNA synthesis must also be involved. Genetic studies show that usually only small sections of DNA undergo gene conversion, and in many cases only part of a gene is changed. Gene conversion can also occur in mitotic cells, but it does so more rarely. As in meiotic cells, some gene conversions in mitotic cells probably result from a mismatch repair process operating on heteroduplex DNA. Another likely mechanism in both meiotic and mitotic cells is illustrated in Figure 3.16.

Mismatch Proofreading Can Prevent Promiscuous Genetic Recombination Between Two Poorly Matched DNA Sequences - As previously discussed, general recombination is triggered whenever two DNA strands of complementary sequence pair to form a heteroduplex joint between two double helices (see Figure 3.14). Experiments carried out in vitro with purified RecA protein show that pairing can occur efficiently even when the sequences of the two DNA strands do not match well: when, for example, only four out of every five nucleotides on average can form base pairs.

How, then, do vertebrate cells avoid promiscuous general recombination between the many thousands of copies of closely related DNA sequences that are repeated in their genomes? Although the answer is not known, studies with bacteria and yeasts demonstrate that the same mismatch proofreading system that removes replication errors has the additional role of interrupting genetic recombination events between imperfectly matched DNA sequences.
It has long been known, for example, that homologous genes in two closely related bacteria, Escherichia coli and Salmonella typhimurium, generally will not recombine, even though their nucleotide sequences are 80% identical; when the mismatch proofreading system is inactivated by mutation, however, there is a 1000-fold increase in the frequency of such interspecies recombination events. It is thought, then, that the mismatch proofreading system normally recognizes the mispaired bases in an initial strand exchange and prevents the subsequent steps required to break and rejoin the two paired DNA helices. This mechanism protects the bacterial genome from the sequence changes that would otherwise be caused by recombination with foreign DNA molecules that occasionally enter the cell. In vertebrate cells, which contain many closely related DNA sequences, the same type of proofreading is thought to help prevent promiscuous recombination events that would otherwise scramble the genome (Figure 3.17).

Figure 3.17: Proofreading prevents general recombination from destabilizing genomes that contain repeated sequences. Studies with bacterial and yeast cells suggest that the mismatch proofreading system has the additional function shown here.

3.2.4 Site Specific Recombination

Site-specific Recombination Enzymes Move Special DNA Sequences into and out of Genomes - Site-specific genetic recombination, unlike general recombination, is guided by a recombination enzyme that recognizes specific nucleotide sequences present on one or both of the recombining DNA molecules. Base-pairing between the recombining DNA molecules need not be involved, and even when it is, the heteroduplex joint that is formed is only a few base pairs long. By separating and joining double-stranded DNA molecules at specific sites, this type of recombination enables various types of mobile DNA sequences to move about within and between chromosomes.
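As a toy illustration of joining and separating DNA molecules at a specific site, the Python sketch below inserts a circular element into a linear chromosome at a short sequence that both molecules carry, and then removes it again. Everything named here is an assumption made for illustration: the functions `integrate` and `excise`, the 7-nucleotide site `GATTACA`, and the example sequences are hypothetical, the circle is written as a string that starts at its site, and the two DNA strands and the enzyme itself are not modelled.

```python
def integrate(chromosome, circle, site):
    """Insert a circular element into a linear chromosome by recombining
    at a short site present exactly once on each molecule."""
    if chromosome.count(site) != 1:
        raise ValueError("chromosome must carry the site exactly once")
    if not circle.startswith(site) or circle.count(site) != 1:
        raise ValueError("circle must be written starting at its single site")
    rest = circle[len(site):]
    i = chromosome.index(site)
    # The inserted element ends up flanked by one copy of the site on each side.
    return chromosome[:i] + site + rest + site + chromosome[i + len(site):]

def excise(integrated, site):
    """Reverse the reaction: recombine the two flanking sites and release
    the element as a circle (returned as a string starting at its site)."""
    if integrated.count(site) != 2:
        raise ValueError("expected exactly two copies of the site")
    first = integrated.index(site)
    second = integrated.index(site, first + len(site))
    circle = integrated[first:second]
    chromosome = integrated[:first] + integrated[second:]
    return chromosome, circle

site = "GATTACA"                      # hypothetical recognition sequence
chromosome = "CCCTTT" + site + "GGGAAA"
circle = site + "TGCATGCA"

integrated = integrate(chromosome, circle, site)
print(integrated)                     # CCCTTTGATTACATGCATGCAGATTACAGGGAAA
print(excise(integrated, site) == (chromosome, circle))  # True
```

In this sketch excision restores both starting sequences exactly, which is the sense in which such a reaction can be reversible.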
Site-specific recombination was first discovered as the means by which a bacterial virus, bacteriophage lambda, moves its genome into and out of the E. coli chromosome. In its integrated state the virus is hidden in the bacterial chromosome and replicated as part of the host's DNA. When the virus enters a cell, a virus-encoded enzyme called lambda integrase is synthesized. This enzyme catalyzes a recombination process that begins when several molecules of the integrase protein bind tightly to a specific DNA sequence on the circular bacteriophage chromosome. The resulting DNA-protein complex can now bind to a related but different specific DNA sequence on the bacterial chromosome, bringing the bacterial and bacteriophage chromosomes close together. The integrase then catalyzes the required DNA cutting and resealing reactions, using a short region of sequence homology to form a tiny heteroduplex joint at the point of union (Figure 3.18). The integrase resembles a DNA topoisomerase in that it forms a reversible covalent linkage to DNA wherever it breaks a DNA chain.

Figure 3.18: The insertion of bacteriophage lambda DNA into the bacterial chromosome. In this example of site-specific recombination, the lambda integrase enzyme binds to a specific "attachment site" DNA sequence on each chromosome, where it makes cuts that bracket a short homologous DNA sequence; the integrase thereby switches the partner strands and reseals them so as to form a heteroduplex joint 7 base pairs long. Each of the four strand-breaking and strand-joining reactions required resembles that made by a DNA topoisomerase, inasmuch as the energy of a cleaved phosphodiester bond is stored in a transient covalent linkage between the DNA and the enzyme (see Figure 3.14).

The same type of site-specific recombination mechanism can also be carried out in reverse by the lambda bacteriophage, enabling it to exit from its integration site in the E. coli chromosome in order to multiply rapidly within the bacterial cell. This excision reaction is catalyzed by a complex of the integrase enzyme with a second bacteriophage protein, which is produced by the virus only when its host cell is stressed. If the sites recognized by such a recombination enzyme are flipped, the DNA between them will be inverted rather than excised.

Many other enzymes that catalyze site-specific recombination resemble lambda integrase in requiring a short region of identical DNA sequence on the two regions of DNA helix to be joined. Because of this requirement, each enzyme in this class is fastidious with respect to the DNA sequences that it recombines, and it can be expected to catalyze one particular DNA joining event that is useful to the virus, plasmid, transposable element, or cell that contains it. These enzymes can be exploited as tools in transgenic animals to study the influence of specific genes on cell behavior, as illustrated in Figure 3.19.

Site-specific recombination enzymes that break and rejoin two DNA double helices at specific sequences on each DNA molecule often do so in a reversible way: as for lambda bacteriophage, the same enzyme system that joins two DNA molecules can take them apart again, precisely restoring the sequences of the two original DNA molecules. This type of recombination is therefore called conservative site-specific recombination to distinguish it from the mechanistically distinct transpositional site-specific recombination that we discuss next.

Figure 3.20: Transpositional site-specific recombination. (A) Outline of the strand-breaking and -rejoining events that lead to integration of the linear double-stranded DNA of a retrovirus (red) into an animal cell chromosome (blue). In an initial endonuclease step the integrase enzyme makes a cut in one strand at each end of the viral DNA sequence, exposing a protruding 3'-OH group. Each of these 3'-OH ends then directly attacks a phosphodiester bond on opposite strands of a randomly selected site on a target chromosome. This inserts the viral DNA sequence into the target chromosome, leaving short gaps on each side that are filled in by DNA repair processes. Because of the gap filling, this type of mechanism leaves short repeats of target DNA sequence [3 to 12 nucleotides in length (black), depending on the integrase enzyme] on either side of the integrated DNA segment. (B) An atomic-level view of the attack by one DNA chain end in (A) on a phosphodiester bond of the target DNA (blue). This mechanism resembles that used in RNA splicing, and is distinctly different from the topoisomerase-like activity of lambda integrase. (Adapted from K. Mizuuchi, J. Biol. Chem. 267:21273-21276, 1992.)

Transpositional Recombination Can Insert a Mobile Genetic Element into Any DNA Sequence - Many mobile DNA sequences, including many viruses and transposable elements, encode integrases that insert their DNA into a chromosome by a mechanism that is different from that used by bacteriophage lambda. Like the lambda integrase, each of these enzymes recognizes a specific DNA sequence in the particular mobile genetic element whose recombination it catalyzes. Unlike the lambda enzyme, however, these integrases do not require a specific DNA sequence in the "target" chromosome and they do not form a heteroduplex joint. Instead, they introduce cuts into both ends of the linear DNA sequence of the mobile genetic element and then catalyze a direct attack by these DNA ends on the target DNA molecule, breaking two closely spaced phosphodiester bonds in the target molecule.
Because of the way that these breaks are made, two short single-stranded gaps are left in the recombinant DNA molecule, one at each end of the mobile element; these are filled in by DNA polymerase to complete the recombination process. As illustrated in Figure 3.20, this mechanism creates a short duplication of the adjacent target DNA sequence; such flanking duplications are the hallmark of a transpositional site-specific recombination event. An integrase enzyme of this type was first purified in active form from bacteriophage Mu. Like the bacteriophage lambda integrase, it carries out all of its cutting and rejoining reactions without requiring an energy source (such as ATP). Very similar enzymes are present in organisms as diverse as bacteria, fruit flies, and humans - all of which contain mobile genetic elements, as we discuss next. 3.2.5 Chromosome Mapping Linkage Groups And Genetic Markers Definitions Used in Genetic Mapping - Why do geneticists indulge in mapping? The answer to that question depends largely on the sorts of mapping that are employed because each level or type of mapping can answer certain questions. Because of this, we will treat the rationale behind mapping in two ways, one for gross Mapping and one for "fine structure" mapping. In either case, DNA is transferred into a recipient cell under conditions where there is a selection for the stable inheritance of the incoming DNA. Typically this involves a selection for recombination of the incoming DNA with a replicon in the recipient. One performs gross mapping if one's intentions are to either place the marker of interest somewhere on a chromosomal map, or to find out any other relevant or irrelevant markers that happen to be genetically linked. This sort of mapping is often reported in the literature but, in general, it does not really tell you very much. 
Arguably, it just sets up the system for future strain constructions, allows preliminary genetic analysis of other mutations, helps in the construction of either R-primes or F-primes for complementation analysis, and allows some sort of comparison to genetically similar systems. For example, if you knew you had three loci (with a particular phenotype) and showed that they were each linked to a different selectable marker and unlinked to each other, then you have answered nearly all the interesting questions that can be addressed with this level of genetic analysis. It is now becoming possible to do gross mapping physically. This has required the identification of restriction enzymes that cut very rarely (fewer than 20 times per genome) and the development of an electrophoresis system, orthogonal field electrophoresis, capable of resolving very large DNA fragments. The localization of a gene to a given fragment, using physical or genetic methods, provides gross, physical mapping information. The goal of fine structure mapping is to order mutations, which are known to map in a given small region, into a one-dimensional array. This array actually says little about physical distance between the mutations, but a comparison of the order of mutations with the phenotypes that they cause allows strong statements to be made about the organization of the genetic system. Physical mapping can also order mutations and provide that ordering with actual physical distances. Properly, this array should be ordered with respect to other external markers. This ordering will allow you to make sense of your complementation data (you can then tell polarity from allelism); it allows the "clustering" of phenotypes that, in conjunction with complementation, helps define genes and gene functions; when performed in conjunction with "reversion analysis", it helps confirm that the mutation you are dealing with is a single and not a double mutation.
Increasingly, the fine structure analysis of DNA is the only form of mapping of interest to molecular biologists, and deletion mapping is the best way to genetically perform such mapping. As sequencing methodology has become ever more rapid, it is becoming reasonable to map by sequencing, thus providing a physical reality to the mutation order. On the other hand, while mapping itself is becoming less relevant, the concept of linkage remains important and will be the focus of this section of the text. It is important to understand the difference between mapping and complementation. Complementation is a test of function. It asks the question if two separate regions of DNA can supply all required functions for an apparently wild-type phenotype. Mapping is a test of sequence. It asks if, and with what frequency, two non-identical versions of the same genetic region are capable of recombining to generate a wild-type sequence. Complementation is therefore best analyzed in the absence of recombination while mapping typically demands recombinational events. Your mapping analyses have to be so devised that you can select for a phenotype that requires one or more recombinational events. A term that is used with great frequency in discussions of mapping is linkage. Linkage is defined as the frequency with which two sites (a site can either be the site of a mutation or the site of the wild-type version of the mutation) on a piece of DNA are co-inherited using a particular gene transfer system. As such, it is a function of two variables: (1) The frequency with which the two sites are brought into the same cell by that particular gene transfer system (termed "end effects" in some of the following sections, in reference to the "ends" of the transferred DNA) and (2) the frequency with which they are both recombined into the chromosome. Another statement of the latter point is that, for linkage to be observed, the recombination events occur "outside" each site and not between them. 
Ignoring end effects, linkage is inversely proportional to the likelihood of a recombinational event occurring between two sites and therefore (since recombination events are random and their likelihood increases with the increasing size of homologous regions available for recombination) to the distance between the sites. The product strain (the genotypically altered recipient) of a recombinational event is often referred to as a recombinant. Genetic mapping also makes the assumption that there is only one piece of DNA exchanged between the two organisms. Thus it is assumed that if two markers enter a recipient cell, they must be on the same piece of DNA and they therefore must be "linked" in that gene transfer system. If one utilizes a gene transfer system where more than one distinct piece of DNA can enter the same cell, one of the assumptions used in mapping is violated and problems in interpretation can occur, since the apparent linkage would reflect the frequency of the two markers entering the same cell separately and not the genetic distance between them. This latter case can occur in either transformation or in generalized transduction with the highly efficient transducing phage P22HT, since these two systems are so efficient at moving DNA into a recipient that it is quite possible to get more than one piece of DNA into a given recipient. Such a phenomenon is known as congression.

3.2.6 Correlation Between Genetic Mapping And Physical Mapping

Mapping By "Prime Complementation" - If one had in hand a set of primes that carried the entire bacterial chromosome, one could mate them, one at a time, into a recipient with a particular mutation and look for the correction of that mutation by complementation or recombination. Presumably, only the primes carrying the region mutated in the recipient would be capable of correcting the mutant phenotype.
If one thinks about this a bit, it is clear that this form of mapping is a version of deletion mapping where the majority of the chromosome is deleted. It can also be performed with smaller cloned regions on any replicating plasmid. This system works because most mutations cause loss of function and are therefore recessive to wild-type. This approach would fail in an attempt to map trans-dominant mutations.

Physical Mapping - As noted at the start, it is becoming possible to cut an entire bacterial chromosome into relatively few pieces (typically with restriction enzymes with unusually large, and therefore very infrequent, target sites) and then to identify the fragment that hybridizes to any cloned piece of DNA. Since the "marker" used is a hybridization probe, this allows mapping of regions of hybridization rather than mutations, in contrast to genetic mapping. The physical mapping of a transposon insertion does both, however, because the hybridizing region is the mutation. When this approach has been performed on organisms with a preexisting body of genetic information available (e.g. E. coli), a very powerful genetic/physical composite map is generated. On the other hand, it is unclear to this observer, at least, how such information is of particular use in understanding organisms that lack such a history of genetic characterization, since it simply locates the cloned region on a vast featureless piece of DNA. This approach will certainly become easier as more "rarely-cutting" restriction enzymes become available and as tools are developed to introduce unique restriction sites into genomes.

Final Notes on Mapping - The problem of "signal-to-noise ratio", alluded to in the section on deletion mapping, is an important point. It should be remembered that most point mutations will revert at a reasonable frequency and, for many mapping systems, these revertants will confuse the results and lower the potential resolution of the mapping system.
It should be reemphasized that genetic mapping, and particularly deletion mapping, establishes genetic, and not physical, distances (though rough estimates are possible through use of certain numerical analyses). Just because one finds more mutations in a particular region of a gene, as defined by deletion mapping, one should not assume that region is large. It could simply be that region of the protein is critical, so that a disproportionate fraction of point mutations are detected there. Consequently, this allows you to separate the region into more deletion intervals because the end points of more deletions are separable.

3.3 MUTATIONS

In the living cell, DNA undergoes frequent chemical change, especially when it is being replicated (in S phase of the eukaryotic cell cycle). Most of these changes are quickly repaired. Those that are not result in a mutation. Thus, mutation is a failure of DNA repair.

3.3.1 Spontaneous & Induced Mutations

Spontaneous Replication Errors - Replication is amazingly accurate: fewer than one in a billion errors are made in the course of DNA synthesis. However, spontaneous replication errors do occasionally occur. The primary cause of spontaneous replication errors was formerly thought to be tautomeric shifts, in which the positions of protons in the DNA bases change. Purine and pyrimidine bases exist in different chemical forms called tautomers (Figure 3.21). The two tautomeric forms of each base are in dynamic equilibrium, although one form is more common than the other. The standard Watson and Crick base pairings (adenine with thymine, and cytosine with guanine) are between the common forms of the bases, but, if the bases are in their rare tautomeric forms, other base pairings are possible (Figure 3.22).
Watson and Crick proposed that tautomeric shifts might produce mutations, and for many years their proposal was the accepted model for spontaneous replication errors, but there has never been convincing evidence that the rare tautomers are the cause of spontaneous mutations. Furthermore, research now shows little evidence of these structures in DNA. Mispairing can also occur through wobble, in which normal, protonated, and other forms of the bases are able to pair because of flexibility in the DNA helical structure (Figure 3.22). These structures have been detected in DNA molecules and are now thought to be responsible for many of the mispairings in replication. When a mismatched base has been incorporated into a newly synthesized nucleotide chain, an incorporated error is said to have occurred. Suppose that, in replication, thymine (which normally pairs with adenine) mispairs with guanine through wobble (Figure 3.23). In the next round of replication, the two mismatched bases separate, and each serves as template for the synthesis of a new nucleotide strand. This time, thymine pairs with adenine, producing another copy of the original DNA sequence. On the other strand, however, the incorrectly incorporated guanine serves as the template and pairs with cytosine, producing a new DNA molecule that has an error: a C·G pair in place of the original T·A pair (a T·A → C·G base substitution). The original incorporated error leads to a replication error, which creates a permanent mutation, because all the base pairings are correct and there is no mechanism for repair systems to detect the error.

Figure 3.21: Purine and pyrimidine bases exist in different forms called tautomers. (a) A tautomeric shift occurs when a proton changes its position, resulting in a rare tautomeric form. (b) Standard and anomalous base-pairing arrangements occur if bases are in the rare tautomeric forms. Base mispairings due to tautomeric shifts were originally thought to be a major source of errors in replication, but such structures have not been detected in DNA, and most evidence now suggests that other types of anomalous pairings are responsible for replication errors.

Figure 3.22: Nonstandard base pairings can occur as a result of the flexibility in DNA structure. Thymine and guanine can pair through wobble between normal bases. Cytosine and adenine can pair through wobble when adenine is protonated (has an extra hydrogen).

Figure 3.23: Wobble base pairing leads to a replicated error.

Mutations due to small insertions and deletions also may arise spontaneously in replication and crossing over. Strand slippage may occur when one nucleotide strand forms a small loop (Figure 3.24). If the looped-out nucleotides are on the newly synthesized strand, an insertion results. At the next round of replication, the insertion will be incorporated into both strands of the DNA molecule. If the looped-out nucleotides are on the template strand, then there is a deletion on the newly replicated strand, and this deletion will be perpetuated in subsequent rounds of replication. During normal crossing over, the homologous sequences of the two DNA molecules align, and crossing over produces no net change in the number of nucleotides in either molecule. Misaligned pairing may cause unequal crossing over, which results in one DNA molecule with an insertion and the other with a deletion (Figure 3.25). Some DNA sequences are more likely than others to undergo strand slippage or unequal crossing over. Stretches of repeated sequences, such as trinucleotide repeats or homopolymeric repeats (more than five repeats of the same base in a row), are prone to strand slippage. Stretches with more repeats are more likely to undergo strand slippage. Duplicated or repetitive sequences may misalign during pairing, leading to unequal crossing over.
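The contrast between normal and unequal crossing over can be illustrated with a small sketch. It treats each DNA molecule as a plain string and a crossover as an exchange of tails; the sequence, the CAG repeat unit, and the cut positions are hypothetical values chosen only for illustration, not data from the text.

```python
def crossover(a, b, i, j):
    """Exchange tails of two DNA strings: a is cut at index i, b at index j."""
    return a[:i] + b[j:], b[:j] + a[i:]

REPEAT = "CAG"                      # hypothetical trinucleotide repeat unit
a = "TT" + REPEAT * 4 + "GG"        # two identical molecules, 4 repeats each
b = "TT" + REPEAT * 4 + "GG"

# Normal crossing over: both molecules are cut at the same aligned position,
# so neither molecule changes length.
n1, n2 = crossover(a, b, 8, 8)

# Unequal crossing over: the repeats misalign by one unit (3 nucleotides),
# so the cut falls at different positions in the two molecules.
u1, u2 = crossover(a, b, 8, 5)

for label, seq in [("normal 1", n1), ("normal 2", n2),
                   ("unequal 1", u1), ("unequal 2", u2)]:
    print(f"{label}: {seq} ({seq.count(REPEAT)} repeats, {len(seq)} nt)")
```

In this toy case one product gains a repeat (an insertion) and the other loses one (a deletion), while the total number of nucleotides across the two molecules is unchanged.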
Both strand slippage and unequal crossing over produce duplicated copies of sequences, which in turn promote further strand slippage and unequal crossing over. This chain of events may explain the phenomenon of anticipation often observed for expanding trinucleotide repeats.

Figure 3.24: Insertions and deletions may result from strand slippage.

Figure 3.25: Unequal crossing over produces insertions and deletions.

Spontaneous Chemical Changes - In addition to spontaneous mutations that arise in replication, mutations also result from spontaneous chemical changes in DNA. One such change is depurination, the loss of a purine base from a nucleotide. Depurination results when the covalent bond connecting the purine to the 1'-carbon atom of the deoxyribose sugar breaks (Figure 3.26), producing an apurinic site, a nucleotide that lacks its purine base. An apurinic site cannot act as a template for a complementary base in replication. In the absence of base-pairing constraints, an incorrect nucleotide (most often adenine) is incorporated into the newly synthesized DNA strand opposite the apurinic site (Figure 3.26b), frequently leading to an incorporated error. The incorporated error is then transformed into a replication error at the next round of replication. Depurination is a common cause of spontaneous mutation; a mammalian cell in culture loses approximately 10,000 purines every day. Another spontaneously occurring chemical change that takes place in DNA is deamination, the loss of an amino group (NH2) from a base. Deamination may occur spontaneously or be induced by mutagenic chemicals.

Figure 3.26: Depurination, loss of a purine base from the nucleotide, produces an apurinic site.

Figure 3.27: Deamination alters DNA bases.

A brief history of Charlotte Auerbach - The first discovery of a chemical mutagen was made by Charlotte Auerbach, who was born in Germany to a Jewish family in 1899.
After attending university in Berlin and doing research, she spent several years teaching at various schools in Berlin. Faced with increasing anti-Semitism in Nazi Germany, Auerbach immigrated to Britain, where she conducted research on the development of mutants in Drosophila. There she met Herman Muller, who had shown that radiation induces mutations; he suggested that Auerbach try to obtain mutants by treating Drosophila with chemicals. Her initial attempts met with little success. Other scientists were conducting top-secret research on mustard gas (used as a chemical weapon in World War I) and noticed that it produced many of the same effects as radiation. Auerbach was asked to determine whether mustard gas was mutagenic. Collaborating with pharmacologist J. M. Robson, Auerbach studied the effects of mustard gas on Drosophila melanogaster. The experimental conditions were crude. They heated liquid mustard gas over a Bunsen burner on the roof of the pharmacology building, and the flies were exposed to the gas in a large chamber. After developing serious burns on her hands from the gas, Auerbach let others carry out the exposures, and she analyzed the flies. Auerbach and Robson showed that mustard gas is indeed a powerful mutagen, reducing the viability of gametes and increasing the numbers of mutations seen in the offspring of exposed flies. Because the research was part of the secret war effort, publication of their findings was delayed until 1947.

Chemically Induced Mutations - Although many mutations arise spontaneously, a number of environmental agents are capable of damaging DNA, including certain chemicals and radiation. Any environmental agent that significantly increases the rate of mutation above the spontaneous rate is called a mutagen.

(1) Base analogs - One class of chemical mutagens consists of base analogs, chemicals with structures similar to that of any of the four standard bases of DNA. DNA polymerases cannot distinguish these analogs from the standard bases; so, if base analogs are present during replication, they may be incorporated into newly synthesized DNA molecules. For example, 5-bromouracil (5BU) is an analog of thymine; it has the same structure as that of thymine except that it has a bromine (Br) atom on the 5-carbon atom instead of a methyl group (Figure 3.28a). Normally, 5-bromouracil pairs with adenine just as thymine does, but it occasionally mispairs with guanine (Figure 3.28b), leading to a transition (T·A → 5BU·A → 5BU·G → C·G), as shown in Figure 3.31. Through mispairing, 5-bromouracil may also be incorporated into a newly synthesized DNA strand opposite guanine. In the next round of replication, 5-bromouracil may pair with adenine, leading to another transition (G·C → G·5BU → A·5BU → A·T).

Figure 3.28: 5-Bromouracil (a base analog) resembles thymine, except that it has a bromine atom in place of a methyl group on the 5-carbon atom. Because of the similarity in their structures, 5-bromouracil may be incorporated into DNA in place of thymine. Like thymine, 5-bromouracil normally pairs with adenine but, when ionized, it may pair with guanine through wobble.

Figure 3.29: Chemicals may alter DNA bases. (a) The alkylating agent ethylmethanesulfonate (EMS) adds an ethyl group to guanine, producing 6-ethylguanine, which pairs with thymine, producing a C·G → T·A transition mutation. (b) Nitrous acid deaminates cytosine to produce uracil, which pairs with adenine, producing a C·G → T·A transition mutation. (c) Hydroxylamine converts cytosine into hydroxylaminocytosine, which frequently pairs with adenine, leading to a C·G → T·A transition mutation.

Figure 3.31: 5-Bromouracil can lead to a replicated error.

(2) 2-aminopurine (2AP) - Another mutagenic chemical is 2-aminopurine (2AP), which is a base analog of adenine.
Normally, 2-aminopurine base pairs with thymine, but it may mispair with cytosine, causing a transition mutation (T·A → T·2AP → C·2AP → C·G). Alternatively, 2-aminopurine may be incorporated through mispairing into the newly synthesized DNA opposite cytosine and later pair with thymine, leading to a C·G → C·2AP → T·2AP → T·A transition. Thus, both 5-bromouracil and 2-aminopurine can produce transition mutations. In the laboratory, mutations by base analogs can be reversed by treatment with the same analog or by treatment with a different analog.

(3) Alkylating agents - Alkylating agents are chemicals that donate alkyl groups, such as methyl (CH3) and ethyl (CH3-CH2) groups, to nucleotide bases. For example, ethylmethanesulfonate (EMS) adds an ethyl group to guanine, producing 6-ethylguanine, which pairs with thymine (see Figure 3.29a). Thus, EMS produces C·G → T·A transitions. EMS is also capable of adding an ethyl group to thymine, producing 4-ethylthymine, which then pairs with guanine, leading to a T·A → C·G transition. Because EMS produces both C·G → T·A and T·A → C·G transitions, mutations produced by EMS can be reversed by additional treatment with EMS. Mustard gas is another alkylating agent.

Figure 3.30: Oxidative radicals convert guanine into 8-oxy-7,8-dihydrodeoxyguanine, which frequently mispairs with adenine instead of cytosine, producing a G·C → T·A transversion.

(4) Deamination - In addition to its spontaneous occurrence (see Figure 3.27), deamination can be induced by some chemicals. For instance, nitrous acid deaminates cytosine, creating uracil, which in the next round of replication pairs with adenine (see Figure 3.29b), producing a C·G → T·A transition mutation. Nitrous acid changes adenine into hypoxanthine, which pairs with cytosine, leading to a T·A → C·G transition.
Nitrous acid also deaminates guanine, producing xanthine, which pairs with cytosine just as guanine does; however, xanthine may also pair with thymine, leading to a C·G → T·A transition. Nitrous acid produces exclusively transition mutations and, because both C·G → T·A and T·A → C·G transitions are produced, these mutations can be reversed with nitrous acid.

Hydroxylamine - Hydroxylamine is a very specific base-modifying mutagen that adds a hydroxyl group to cytosine, converting it into hydroxylaminocytosine. This conversion increases the frequency of a rare tautomer that pairs with adenine instead of guanine and leads to C·G → T·A transitions. Because hydroxylamine acts only on cytosine, it will not generate T·A → C·G transitions; thus, hydroxylamine will not reverse the mutations that it produces.

(5) Oxidative reactions - Reactive forms of oxygen (including superoxide radicals, hydrogen peroxide, and hydroxyl radicals) are produced in the course of normal aerobic metabolism, as well as by radiation, ozone, peroxides, and certain drugs. These reactive forms of oxygen damage DNA and induce mutations by bringing about chemical changes to DNA. For example, oxidation converts guanine into 8-oxy-7,8-dihydrodeoxyguanine (Figure 3.30), which frequently mispairs with adenine instead of cytosine, causing a G·C → T·A transversion mutation.

Figure 3.32: Intercalating agents such as proflavin and acridine orange insert themselves between adjacent bases in DNA, distorting the three-dimensional structure of the helix and causing single-nucleotide insertions and deletions in replication.

(6) Intercalating agents - Intercalating agents, such as proflavin, acridine orange, ethidium bromide, and dioxin, are about the same size as a nucleotide (Figure 3.32a). They produce mutations by sandwiching themselves (intercalating) between adjacent bases in DNA, distorting the three-dimensional structure of the helix and causing single-nucleotide insertions and deletions in replication (Figure 3.32b).
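Why a single-nucleotide insertion is so much more disruptive than a base substitution can be seen by reading a sequence in triplets. The following is a minimal sketch; the 15-nucleotide sequence and the mutated position are hypothetical, chosen only to make the effect visible.

```python
def codons(seq):
    """Split a coding sequence into triplets, dropping any incomplete last one."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def changed(a, b):
    """Indices of codons that differ between two codon lists."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

original = "ATGGCTGAAACCGGA"     # hypothetical reading frame of five codons
pos = 4                          # hypothetical site of the mutation

# Single-nucleotide insertion (the kind of change an intercalating agent causes).
inserted = original[:pos] + "T" + original[pos:]
# Single base substitution at the same site, for comparison.
substituted = original[:pos] + "A" + original[pos + 1:]

print(codons(original))
print(codons(inserted), changed(codons(original), codons(inserted)))
print(codons(substituted), changed(codons(original), codons(substituted)))
```

In this example the substitution alters only the codon it falls in, whereas the insertion alters that codon and every codon after it.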
These insertions and deletions frequently produce frameshift mutations (which change all amino acids downstream of the mutation), and so the mutagenic effects of intercalating agents are often severe. Because intercalating agents generate both additions and deletions, they can reverse the effects of their own mutations.

Radiation - In 1927, Herman Muller demonstrated that mutations in fruit flies could be induced by X-rays. The results of subsequent studies showed that X-rays greatly increase mutation rates in all organisms. Because of their high energies, X-rays, gamma rays, and cosmic rays (Figure 3.33) are all capable of penetrating tissues and damaging DNA. These forms of radiation, called ionizing radiation, dislodge electrons from the atoms that they encounter, changing stable molecules into free radicals and reactive ions, which then alter the structures of bases and break phosphodiester bonds in DNA. Ionizing radiation also frequently results in double-strand breaks in DNA. Attempts to repair these breaks can produce chromosome mutations. Ultraviolet light has less energy than that of ionizing radiation and does not eject electrons and cause ionization but is nevertheless highly mutagenic. Purine and pyrimidine bases readily absorb UV light, resulting in the formation of chemical bonds between adjacent pyrimidine molecules on the same strand of DNA and in the creation of structures called pyrimidine dimers (Figure 3.34a). Pyrimidine dimers consisting of two thymine bases (called thymine dimers) are most frequent, but cytosine dimers and thymine-cytosine dimers also can form. These dimers distort the configuration of DNA (Figure 3.34b) and often block replication. Most pyrimidine dimers are immediately repaired by mechanisms discussed later in this chapter, but some escape repair and inhibit replication and transcription.

Figure 3.33: In the electromagnetic spectrum, as wavelength decreases, energy increases.
When pyrimidine dimers block replication, cell division is inhibited and the cell usually dies; for this reason, UV light kills bacteria and is an effective sterilizing agent. For a mutation (a hereditary error in the genetic instructions) to occur, the replication block must be overcome. How do bacteria and other organisms replicate despite the presence of thymine dimers? Bacteria can circumvent replication blocks produced by pyrimidine dimers and other types of DNA damage by means of the SOS system. This system allows replication blocks to be overcome, but in the process makes numerous mistakes and greatly increases the rate of mutation. Indeed, the very reason that replication can proceed in the presence of a block is that the enzymes in the SOS system do not strictly adhere to the base-pairing rules. The trade-off is that replication may continue and the cell survives, but only by sacrificing the normal accuracy of DNA synthesis. The SOS system is complex, including the products of at least 25 genes. A protein called RecA binds to the damaged DNA at the blocked replication fork and becomes activated. This activation promotes the binding of a protein called LexA, which is a repressor of the SOS system. The activated RecA complex induces LexA to undergo self-cleavage, destroying its repressive activity. This inactivation enables other SOS genes to be expressed, and the products of these genes allow replication of the damaged DNA to proceed. The SOS system allows bases to be inserted into a new DNA strand in the absence of bases on the template strand, but these insertions result in numerous errors in the base sequence.

Figure 3.34: Pyrimidine dimers result from ultraviolet light. (a) Formation of thymine dimer. (b) Distorted DNA.

Eukaryotic cells have a specialized DNA polymerase called polymerase η (eta) that bypasses pyrimidine dimers. Polymerase η preferentially inserts AA opposite a pyrimidine dimer.
This strategy seems to be reasonable because about two-thirds of pyrimidine dimers are thymine dimers. However, the insertion of AA opposite a CT dimer results in a C·G → A·T transversion. Polymerase η is therefore said to be an error-prone polymerase.

Transposons and Mutations - Transposons are mutagens. They can cause mutations in several ways:
o If a transposon inserts itself into a functional gene, it will probably damage it. Insertion into exons, introns, and even into DNA flanking the genes (which may contain promoters and enhancers) can destroy or alter the gene's activity.
o Faulty repair of the gap left at the old site (in cut and paste transposition) can lead to mutation there.
o The presence of a string of identical repeated sequences presents a problem for precise pairing during meiosis. How is the third, say, of a string of five Alu sequences on the "invading strand" of one chromatid going to ensure that it pairs with the third sequence in the other strand? If it accidentally pairs with one of the other Alu sequences, the result will be an unequal crossover, one of the commonest causes of duplications (Figure 3.35).

SINEs (mostly Alu sequences) and LINEs cause only a small percentage of human mutations. (There may even be a mechanism by which they avoid inserting themselves into functional genes.) However, they have been found to be the cause of the mutations responsible for some cases of human genetic diseases, including:
o Hemophilia A [Factor VIII gene] and Hemophilia B [Factor IX gene]
o X-linked severe combined immunodeficiency (SCID) [gene for part of the IL-2 receptor]
o porphyria
o predisposition to colon polyps and cancer [APC gene]
o Duchenne muscular dystrophy [dystrophin gene]

Deletion generation: Both deletion and inversion events next to the transposon are frequent.
The end points of both the deletions and the inversions seem to be non-random and, in the case of inversions, there is typically a second copy of the transposon at either end of the inverted region.

Element deletion: Tn3 does not appear to be deleted precisely at a detectable frequency. Even in those few cases where revertants to a wild-type phenotype occur, subsequent analysis has shown that the wild-type genotype has not been restored. Such events that might restore a wild-type phenotype without the wild-type genotype will be discussed in the section on suppressors near the end of the text. For another transposon of this class, Tn101, reversion to a wild-type genotype has been shown to occur at a frequency of 10^-11, which is essentially undetectable.

Figure 3.35: A transposon showing normal and alternative splicing.

A transposable Bacteriophage: Mu - We will briefly treat a number of the same properties discussed above for bacteriophage Mu (named for its "mutator" effects). For all intents and purposes it belongs in the general category of a Class 2 transposon since it is not flanked by separately transposable insertion elements. Its physical size is 38 kb and it generates 5 base pair duplications upon insertion. It produces an 11 base pair inverted repeat at either end. Its site preference is remarkably random and the argument has been made that its specificity can be for no more than one or two base pairs. However, in at least one particular region, it has been found that a disproportionate number of insertions fall within one very small region of the gene, suggesting that there can be some site preference. Mu is rather strongly polar in both orientations, but it is clear that there is an exceedingly low level of transcription out of one end of the prophage. The transposition of Mu is known to generate deletions, as roughly 10% of the Mu prophages have adjacent deletions.
These deletions tend to start at one end or the other of the prophage and extend into the adjoining DNA, though there also seem to be cases where the deletions are unlinked to the prophage. Finally, precise deletion of Mu is rather rare, occurring at a frequency of approximately 10⁻⁹, and seems to be dependent upon at least some Mu factors. The advantages of the use of Mu are: it is not normally found in the bacterial genome and therefore there are few problems with homology to existing sequences in the chromosome; in contrast to most other transposons, Mu does not need a separate vector system since it is itself a vector, being a bacteriophage; Mu prophages (at least the cts versions, where c encodes the repressor) are inducible. The disadvantage of Mu is that it is a bacteriophage and therefore can kill the host cell. A wide variety of useful mutants of Mu have been generated.

What good are transposons? - We don't know. They have been called "junk" DNA (because there is no obvious benefit to their host) and "selfish" DNA (because their only function seems to be to make more copies of themselves). Because of the sequence similarities of all the LINEs and SINEs, they also make up a large portion of the "repetitive DNA" of the cell. Retrotransposons cannot be so selfish that they reduce the survival of their host. Perhaps they even confer some benefit. Some possibilities:
Retrotransposons often carry some additional sequences at their 3' end as they insert into a new location. Perhaps these occasionally create new combinations of exons, promoters, and enhancers that benefit the host. Example:
o Thousands of our Alu elements occur in the introns of structural genes.
o Some of these contain sequences that, when transcribed into the primary transcript, are recognized by the spliceosome.
o These can then be spliced into the mature mRNA, creating a new exon, which will be translated into a new protein product.
o Alternative splicing can provide not only the new mRNA (and thus protein) but also the old.
o In this way, nature can try out new proteins without the risk of abandoning the tried-and-true old one.
L1 elements inserted into the introns of functional genes reduce the transcription of those genes without harming the gene product; the longer the L1 element, the lower the level of gene expression. Some 79% of our genes contain L1 elements, and perhaps they are a mechanism for establishing the baseline level of gene activity.
Telomerase, the enzyme essential for maintaining chromosome length, is closely related to the reverse transcriptase of LINEs and may have evolved from it.
RAG-1 and RAG-2. The proteins encoded by these genes are needed to assemble the repertoire of antibodies and T-cell receptors (TCRs) used by the adaptive immune system. The mechanism resembles that of the cut and paste method of Class II transposons, and the RAG genes may have evolved from them. If so, the event occurred some 450 million years ago when the jawed vertebrates evolved from jawless ancestors. Only jawed vertebrates have an adaptive immune system and the RAG-1 and RAG-2 genes that make it possible.
In Drosophila, the insertion of transposons into genes has been linked to the development of resistance to DDT and organophosphate insecticides.

Transposons and the C-value Paradox - The genome of Arabidopsis thaliana contains 1.2 x 10⁸ base pairs (bp) of DNA. About 14% of this consists of transposons; the rest functional genes (about 25,000 of them). The maize (corn) genome contains 20 times more DNA (2.4 x 10⁹ bp) but surely has no need for 20 times as many genes. In fact, 60% of the corn genome is made up of transposons. (The figure for humans is 44%.) Most of the 2.5 x 10¹¹ bp of DNA in the genome of Psilotum nudum is presumably "junk" DNA.
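The genome figures quoted above can be compared with a little arithmetic. The following is only an illustrative sketch: the genome sizes and transposon percentages are the ones given in the text, while the function name and the idea of lumping all non-transposon DNA into one "remaining" pool are assumptions made for the example.

```python
# Genome sizes (bp) and transposon percentages as quoted in the text.
GENOMES = {
    "Arabidopsis thaliana": (120_000_000, 14),
    "maize": (2_400_000_000, 60),
}

def split_genome(genome_bp, transposon_pct):
    """Return (transposon_bp, remaining_bp) using integer arithmetic.

    Hypothetical helper: everything that is not transposon DNA is
    treated as a single remaining pool (a simplifying assumption).
    """
    transposon_bp = genome_bp * transposon_pct // 100
    return transposon_bp, genome_bp - transposon_bp

arabidopsis_te, arabidopsis_rest = split_genome(*GENOMES["Arabidopsis thaliana"])
maize_te, maize_rest = split_genome(*GENOMES["maize"])

# Compare whole-genome size with the non-transposon remainder.
size_ratio = GENOMES["maize"][0] / GENOMES["Arabidopsis thaliana"][0]
rest_ratio = maize_rest / arabidopsis_rest

print(f"genome size ratio (maize/Arabidopsis): {size_ratio:.1f}")
print(f"non-transposon DNA ratio:              {rest_ratio:.1f}")
```

Under these assumptions the 20-fold difference in genome size shrinks to roughly 9-fold once transposon DNA is set aside, which illustrates how much of the size difference the transposons alone account for.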
So it seems likely that the lack of an association between size of genome and number of functional genes - the C-value paradox - is caused by the amount of transposon DNA accumulated in the genome.

Insertion Sequences (IS) - These are simple transposons found in bacteria that only encode insertion functions. More complex transposons may also carry genes coding for other functions. In bacterial plasmids, transposons often include genes encoding antibiotic resistance. Thus, both transfer of the plasmid from cell to cell and the movement of transposons from one piece of DNA to another can spread the resistance genes, particularly when a strong genetic selection is present (for instance, when animals or humans ingest antibiotics).

Figure 3.36: Using a Transposon to mutagenize bacteria.

When transposons are used in mutagenesis, as shown in Figure 3.36, the antibiotic resistance genes are highly useful as selectable markers in tracking the transposon. A transposon carrying an antibiotic resistance gene is introduced into the cell on a vector, such as the delivery vector, that has been engineered so that it cannot persist in the cell; therefore, any resulting antibiotic-resistant colonies have the transposon integrated somewhere in their chromosome. One can select for antibiotic resistance and also screen or select for the mutation desired. Because a large amount of DNA is being inserted into a gene, mutagenesis with an insertion element will usually disrupt gene function. Bacteriophage Mu is both a phage and a transposon, named Mu because of the high frequency of mutations associated with its growth in cells. The mutations are a direct consequence of part of its life cycle: it integrates randomly in the bacterial DNA as part of both its lysogenic and lytic replication cycles. The frequency of transposition of most transposons is low; there are usually hundreds rather than millions of events, all over the chromosome.
Identifying the one that gives the desired phenotype may be difficult. In addition, transposons do not always integrate randomly into the genome and the degree of specificity varies with the transposon. Mutations in the transposase can increase the frequency of transposition and help randomize insertions (Kleckner, Bender, and Gottesman, 1991). Furthermore, transposition can also be carried out in vitro at high efficiency into a target piece of DNA, and the mutagenized DNA then inserted back into the cell (Hamer et al., 2001). Thus far, we have discussed the various ways to disrupt gene activity, using mutagens and transposons. Once we have made mutants, how do we detect them?

3.3.5 DNA Damage, Repair Mechanism & Defect On Repair Mechanism

Importance - DNA in the living cell is subject to many chemical alterations (a fact often forgotten in the excitement of being able to do DNA sequencing on dried and/or frozen specimens). If the genetic information encoded in the DNA is to remain uncorrupted, any chemical changes must be corrected (Figure 3.37).

Figure 3.37: DNA repair. The three steps common to most types of repair are excision (step 1), resynthesis (step 2), and ligation (step 3). In step 1 the damage is excised; in steps 2 and 3 the original DNA sequence is restored. DNA polymerase fills in the gap created by the excision events, and DNA ligase seals the nick left in the repaired strand. Nick sealing consists of the reformation of a broken phosphodiester bond.

A failure to repair DNA produces a mutation - The recent publication of the human genome has already revealed 130 genes whose products participate in DNA repair. More will probably be identified soon.

Agents that Damage DNA -
Certain wavelengths of radiation: ionizing radiation such as gamma rays and X-rays; ultraviolet rays, especially the UV-C rays (~260 nm) that are absorbed strongly by DNA, but also the longer-wavelength UV-B that penetrates the ozone shield.
Highly reactive oxygen radicals produced during normal cellular respiration as well as by other biochemical pathways.
Chemicals in the environment: many hydrocarbons, including some found in cigarette smoke; some plant and microbial products, e.g. the aflatoxins produced in moldy peanuts.
Chemicals used in chemotherapy, especially chemotherapy of cancers.

Types of DNA Damage -
All four of the bases in DNA (A, T, C and G) can be covalently modified at various positions. One of the most frequent is the loss of an amino group ("deamination"), resulting, for example, in a C being converted to a U.
Mismatches of the normal bases because of a failure of proofreading during DNA replication. Common example: incorporation of the pyrimidine U (normally found only in RNA) instead of T.
Breaks in the backbone. These can be limited to one of the two strands (a single-stranded break, SSB) or occur on both strands (a double-stranded break, DSB). Ionizing radiation is a frequent cause, but some chemicals produce breaks as well.
Crosslinks. Covalent linkages can be formed between bases on the same DNA strand ("intrastrand") or on the opposite strand ("interstrand"). Several chemotherapeutic drugs used against cancers crosslink DNA.

Repairing Damaged Bases - Damaged or inappropriate bases can be repaired by several mechanisms:
Direct chemical reversal of the damage.
Excision Repair, in which the damaged base or bases are removed and then replaced with the correct ones in a localized burst of DNA synthesis. There are three modes of excision repair, each of which employs specialized sets of enzymes: Base Excision Repair (BER), Nucleotide Excision Repair (NER) and Mismatch Repair (MMR).

Direct Reversal of Base Damage - Perhaps the most frequent cause of point mutations in humans is the spontaneous addition of a methyl group (CH3-) (an example of alkylation) to Cs followed by deamination to a T.
Fortunately, most of these changes are repaired by enzymes, called glycosylases, that remove the mismatched T, restoring the correct C. This is done without the need to break the DNA backbone (in contrast to the mechanisms of excision repair described below). Some of the drugs used in cancer chemotherapy ("chemo") also damage DNA by alkylation. Some of the methyl groups can be removed by a protein encoded by our MGMT gene. However, the protein can only do it once, so the removal of each methyl group requires another molecule of protein. This illustrates a problem with direct reversal mechanisms of DNA repair: they are quite wasteful. Each of the myriad types of chemical alterations to bases requires its own mechanism to correct. What the cell needs are more general mechanisms capable of correcting all sorts of chemical damage with a limited toolbox. This requirement is met by the mechanisms of excision repair.

Base Excision Repair (BER) - The steps (Figure 3.38A) and some key players:
(1) Removal of the damaged base (estimated to occur some 20,000 times a day in each cell in our body!) by a DNA glycosylase. We have at least 8 genes encoding different DNA glycosylases, each enzyme responsible for identifying and removing a specific kind of base damage.
(2) Removal of its deoxyribose phosphate in the backbone, producing a gap. We have two genes encoding enzymes with this function.
(3) Replacement with the correct nucleotide. This relies on DNA polymerase beta, one of at least 11 DNA polymerases encoded by our genes.
(4) Ligation of the break in the strand. Two enzymes are known that can do this; both require ATP to provide the needed energy.

Nucleotide Excision Repair (NER) - NER differs from BER in several ways (Figure 3.38B). It uses different enzymes. Even though there may be only a single "bad" base to correct, its nucleotide is removed along with many other adjacent nucleotides; that is, NER removes a large "patch" around the damage.
The steps and some key players:
The damage is recognized by one or more protein factors that assemble at the location.
The DNA is unwound, producing a "bubble". The enzyme system that does this is Transcription Factor IIH (TFIIH), which also functions in normal transcription.
Cuts are made on both the 3' side and the 5' side of the damaged area so the tract containing the damage can be removed.
A fresh burst of DNA synthesis - using the intact (opposite) strand as a template - fills in the correct nucleotides. The DNA polymerases responsible are designated polymerase delta and epsilon.
A DNA ligase covalently binds the fresh piece into the backbone.

Xeroderma Pigmentosum (XP) - XP is a rare inherited disease of humans which, among other things, predisposes the patient to pigmented lesions on areas of the skin exposed to the sun and an elevated incidence of skin cancer. It turns out that XP can be caused by mutations in any one of several genes, all of which have roles to play in NER. Some of them:
XPA, which encodes a protein that binds the damaged site and helps assemble the other proteins needed for NER.
XPB and XPD, which are part of TFIIH. Some mutations in XPB and XPD also produce signs of premature aging.
XPF, which cuts the backbone on the 5' side of the damage.
XPG, which cuts the backbone on the 3' side.

Figure 3.38: Comparison of two major DNA repair pathways. (A) Base excision repair. This pathway starts with a DNA glycosylase. Here the enzyme uracil DNA glycosylase removes an accidentally deaminated cytosine in DNA. After the action of this glycosylase (or another DNA glycosylase that recognizes a different kind of damage) the sugar phosphate with the missing base is cut out by the sequential action of AP endonuclease and a phosphodiesterase, the same enzymes that initiate the repair of depurinated sites. The gap of a single nucleotide is then filled by DNA polymerase and DNA ligase. The net result is that the U that was created by accidental deamination is restored to a C. The AP endonuclease derives its name from the fact that it recognizes any site in the DNA helix that contains a deoxyribose sugar with a missing base; such sites can arise either by the loss of a purine (apurinic sites) or by the loss of a pyrimidine (apyrimidinic sites). (B) Nucleotide excision repair. After a multienzyme complex recognizes a bulky lesion such as a pyrimidine dimer, one cut is made on each side of the lesion, and an associated DNA helicase then removes the entire portion of the damaged strand. The multienzyme complex in bacteria leaves the gap of 12 nucleotides shown; the gap produced in human DNA is more than twice this size.

Transcription-Coupled NER - Nucleotide-excision repair proceeds most rapidly in cells whose genes are being actively transcribed, on the DNA strand that is serving as the template for transcription. This enhancement of NER involves XPB, XPD, and several other gene products. The genes for two of them are designated CSA and CSB (mutations in them cause an inherited disorder called Cockayne's syndrome). The CSB product associates in the nucleus with RNA polymerase II, the enzyme responsible for synthesizing messenger RNA (mRNA), providing a molecular link between transcription and repair. One plausible scenario: if RNA polymerase II, tracking along the template (antisense) strand, encounters a damaged base, it can recruit other proteins, e.g., the CSA and CSB proteins, to make a quick fix before it moves on to complete transcription of the gene.

Figure 3.39: The reaction catalyzed by DNA ligase. This enzyme seals a broken phosphodiester bond. As shown, DNA ligase uses a molecule of ATP to activate the 5' end at the nick (step 1) before forming the new bond (step 2). In this way the energetically unfavorable nick-sealing reaction is driven by being coupled to the energetically favorable process of ATP hydrolysis. In Bloom's syndrome, an inherited human disease, individuals are partially defective in DNA ligation and consequently are deficient in DNA repair; as a consequence, they have a dramatically increased incidence of cancer.

Mismatch Repair (MMR) - Mismatch repair deals with correcting mismatches of the normal bases; that is, failures to maintain normal Watson-Crick base pairing (A•T, C•G). It can enlist the aid of enzymes involved in both base-excision repair (BER) and nucleotide-excision repair (NER) as well as using enzymes specialized for this function. Recognition of a mismatch requires several different proteins, including one encoded by MSH2. Cutting the mismatch out also requires several proteins, including one encoded by MLH1. Mutations in either of these genes predispose the person to an inherited form of colon cancer. So these genes qualify as Tumor Suppressor Genes. Synthesis of the repair patch is done by the same enzymes used in NER: DNA polymerase delta and epsilon. Cells also use the MMR system to enhance the fidelity of recombination; i.e., to assure that only homologous regions of two DNA molecules pair up to cross over and recombine segments (e.g., in meiosis).

Repairing Strand Breaks - Ionizing radiation and certain chemicals can produce both single-strand breaks (SSBs) and double-strand breaks (DSBs) in the DNA backbone.

Single-Strand Breaks (SSBs) - Breaks in a single strand of the DNA molecule are repaired using the same enzyme systems that are used in Base-Excision Repair (BER).

Double-Strand Breaks (DSBs) - There are two mechanisms by which the cell attempts to repair a complete break in a DNA molecule:
Direct joining of the broken ends. This requires proteins that recognize and bind to the exposed ends and bring them together for ligating.
They would prefer to see some complementary nucleotides but can proceed without them, so this type of joining is also called Non-Homologous End-Joining (NHEJ). Errors in direct joining may be a cause of the various translocations that are associated with cancers. Examples: Burkitt's lymphoma; the Philadelphia chromosome in Chronic Myelogenous Leukemia (CML); B-cell leukemia.
Homologous Recombination. Here the broken ends are repaired using the information on the intact sister chromatid (available in G2 after chromosome duplication), on the homologous chromosome (in G1; that is, before each chromosome has been duplicated), or on the same chromosome if there are duplicate copies of the gene on the chromosome oriented in opposite directions (head-to-head or back-to-back). Using the homologous chromosome requires searching around in the nucleus for the homolog, a task sufficiently uncertain that G1 cells usually prefer to mend their DSBs by NHEJ. Two of the proteins used in homologous recombination are encoded by the genes BRCA1 and BRCA2. Inherited mutations in these genes predispose women to breast and ovarian cancers.

Meiosis also involves DSBs - Recombination between homologous chromosomes in meiosis I also involves the formation of DSBs and their repair. So it is not surprising that this process uses the same enzymes. Meiosis I, with the alignment of homologous sequences, provides a mechanism for repairing damaged DNA; that is, mutations. In fact, many biologists feel that the main function of sex is to provide this mechanism for maintaining the integrity of the genome. However, most of the genes on the human Y chromosome have no counterpart on the X chromosome, and thus cannot benefit from this repair mechanism. They seem to solve this problem by having multiple copies of the same gene, oriented in opposite directions. Looping the intervening DNA brings the duplicates together and allows repair by homologous recombination.
Gene Conversion - If the sequence used as a template for repairing a gene by homologous recombination differs slightly from the gene needing repair (that is, it is an allele), the repaired gene will acquire the donor sequence. This nonreciprocal transfer of genetic information is called gene conversion. The donor of the new gene sequence may be:
the homologous chromosome (during meiosis)
the sister chromatid (also during meiosis)
a duplicate of the gene on the same chromosome (during mitosis)
Gene conversion during meiosis alters the normal Mendelian ratios. Normally, meiosis in a heterozygous (A,a) parent will produce gametes or spores in a 1:1 ratio; e.g., 50% A; 50% a. However, if gene conversion has occurred, other ratios will appear. If, for example, an A allele donates its sequence as it repairs a damaged a allele, the repaired gene will become A, and the ratio will be 75% A; 25% a.

3.3.6 Inherited Human Disease Related with the Gene

Many human diseases are caused by defective genes. A few common examples:

Disease                                     Genetic defect
hemophilia A                                absence of clotting factor VIII
cystic fibrosis                             defective chloride channel protein
muscular dystrophy                          defective muscle protein (dystrophin)
sickle-cell disease                         defective beta globin
severe combined immunodeficiency (SCID)     any one of several genes fail to make a protein essential for T and B cell function

All of these diseases are caused by a defect at a single gene locus. (The inheritance is recessive so both the maternal and paternal copies of the gene must be defective.)

Hemophilia A - Hemophilia A is a hereditary blood disorder, primarily affecting males, characterized by a deficiency of the blood clotting protein known as Factor VIII that results in abnormal bleeding. Babylonian Jews first described hemophilia more than 1700 years ago; the disease first drew widespread public attention when Queen Victoria transmitted it to several European royal families.
Mutation of the HEMA gene on the X chromosome causes Hemophilia A. Normally, females have two X chromosomes, whereas males have one X and one Y chromosome. Since males have only a single copy of any gene located on the X chromosome, they cannot offset damage to that gene with an additional copy as can females. Consequently, X-linked disorders such as Hemophilia A are far more common in males. The HEMA gene codes for Factor VIII, which is synthesized mainly in the liver, and is one of many factors involved in blood coagulation; its loss alone is enough to cause Hemophilia A even if all the other coagulation factors are still present. Treatment of Hemophilia A has progressed rapidly since the middle of the last century when patients were infused with plasma or processed plasma products to replace Factor VIII. HIV contamination of human blood supplies and the consequent HIV infection of most hemophiliacs in the mid-1980s forced the development of alternate Factor VIII sources for replacement therapy, including monoclonal antibody purified Factor VIII and recombinant Factor VIII, both of which are used in replacement therapies today. Development of a gene replacement therapy for Hemophilia A has reached the clinical trial stage, and results so far have been encouraging. Investigators are still evaluating the long-term safety of these therapies, and it is hoped that a genetic cure for hemophilia will be generally available in the future. Cystic fibrosis - Cystic fibrosis (CF) (Figure 3.40) is the most common fatal genetic disease in the United States today. It causes the body to produce a thick, sticky mucus that clogs the lungs, leading to infection, and blocks the pancreas, stopping digestive enzymes from reaching the intestines where they are required to digest food. CF is caused by a defective gene, which codes for a chloride transporter found on the surface of the epithelial cells that line the lungs and other organs. 
Several hundred mutations have been found in this gene, all of which result in defective transport of chloride, and secondarily sodium, by epithelial cells. As a result, the amount of sodium chloride (salt) is increased in bodily secretions. The severity of the disease symptoms of CF is directly related to the characteristic effects of the particular mutation(s) that have been inherited by the sufferer. CF research has accelerated sharply since the discovery of CFTR in 1989. In 1990, scientists successfully cloned the normal gene and added it to CF cells in the laboratory, which corrected the defective chloride transport mechanism. This technique - gene therapy - was then tried on a limited number of CF patients. However, this treatment may not be as successful as originally hoped. Further research will be required before gene therapy, and other experimental treatments, prove useful in combating CF.

Figure 3.40: Building mouse models of human disease. Expression of a human cystic fibrosis (CFTR) gene in the gut of a mouse. A human antisense probe was used to show human CFTR expressed in the mouse duodenum.

Figure 3.41: Dystrophin and utrophin are a similar size and have comparable modular architecture. This similarity means that utrophin can sometimes substitute for dystrophin, so providing a potential route for therapy for muscular dystrophy sufferers.

Duchenne muscular dystrophy - Duchenne muscular dystrophy (DMD) is one of a group of muscular dystrophies characterized by the enlargement of muscles (Figure 3.41). DMD is one of the most prevalent types of muscular dystrophy and is characterized by rapid progression of muscle degeneration that occurs early in life. All are X-linked and affect mainly males - an estimated 1 in 3500 boys worldwide. The gene for DMD, found on the X chromosome, encodes a large protein - dystrophin.
Dystrophin is required inside muscle cells for structural support; it is thought to strengthen muscle cells by anchoring elements of the internal cytoskeleton to the surface membrane. Without it, the cell membrane becomes permeable, so that extracellular components enter the cell, increasing the internal pressure until the muscle cell "explodes" and dies. The subsequent immune response can add to the damage. A mouse model for DMD exists and is proving useful for furthering our understanding of both the normal function of dystrophin and the pathology of the disease. In particular, initial experiments that increase the production of utrophin, a dystrophin relative, in order to compensate for the loss of dystrophin in the mouse are promising and may lead to the development of effective therapies for this devastating disease.

Figure 3.42: (A) Haemoglobin is made up of 4 chains: 2 α and 2 β. In SCA a point mutation causes the amino acid glutamic acid (Glu) to be replaced by valine (Val) in the β chains of HbA, resulting in the abnormal HbS. (B) Under certain conditions, such as low oxygen levels, RBCs with HbS distort into sickle shapes. (C) These sickled cells can block small vessels, producing microvascular occlusions which may cause necrosis (death) of the tissue.

Sickle Cell Anemia - Sickle cell anemia (Figure 3.42) is the most common inherited blood disorder in the United States, affecting about 72,000 Americans or 1 in 500 African Americans. SCA is characterized by episodes of pain, chronic hemolytic anemia and severe infections, usually beginning in early childhood. SCA is an autosomal recessive disease caused by a point mutation in the hemoglobin beta gene (HBB) found on chromosome 11p15.4. Carrier frequency of HBB varies significantly around the world, with high rates associated with zones of high malaria incidence, since carriers are somewhat protected against malaria. About 8% of the African American population are carriers.
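The two figures quoted for SCA (about 1 in 500 affected, about 8% carriers) can be checked against each other with a short calculation. The sketch below assumes Hardy-Weinberg proportions, which the text itself does not state; it is an idealized model (random mating, no selection) and is shown only to indicate that the two numbers are roughly consistent.

```python
import math

# Figure quoted in the text: about 1 in 500 African Americans has SCA.
affected_frequency = 1 / 500

# ASSUMPTION (not in the text): Hardy-Weinberg proportions, so that the
# affected (HbS/HbS) frequency is q**2 and the carrier frequency is 2*p*q.
q = math.sqrt(affected_frequency)   # frequency of the HbS allele
p = 1 - q                           # frequency of the normal allele
carrier_frequency = 2 * p * q

print(f"HbS allele frequency q: {q:.4f}")
print(f"expected carriers 2pq:  {carrier_frequency:.1%}")
```

Under this assumption the expected carrier frequency comes out at roughly 8.5%, close to the 8% quoted above.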
A mutation in HBB results in the production of a structurally abnormal hemoglobin (Hb), called HbS. Hb is an oxygen carrying protein that gives red blood cells (RBC) their characteristic color. Under certain conditions, like low oxygen levels or high hemoglobin concentrations, in individuals who are homozygous for HbS, the abnormal HbS clusters together, distorting the RBCs into sickled shapes. These deformed and rigid RBCs become trapped within small blood vessels and block them, producing pain and eventually damaging organs. Though, as yet, there is no cure for SCA, a combination of fluids, painkillers, antibiotics and transfusions is used to treat symptoms and complications. Hydroxyurea, an antitumor drug, has been shown to be effective in preventing painful crises. Hydroxyurea induces the formation of fetal Hb (HbF), a Hb normally found in the fetus or newborn, which, when present in individuals with SCA, prevents sickling. A mouse model of SCA has been developed and is being used to evaluate the effectiveness of potential new therapies for SCA.

Figure 3.43: Gene Therapy has been attempted to treat severe combined immunodeficiency caused by a missing enzyme, adenosine deaminase.

Severe combined immunodeficiency - Severe combined immunodeficiency (SCID) represents a group of rare, sometimes fatal, congenital disorders characterized by little or no immune response (Figure 3.43). The defining feature of SCID, commonly known as "bubble boy" disease, is a defect in the specialized white blood cells (B- and T-lymphocytes) that defend us from infection by viruses, bacteria and fungi. Without a functional immune system, SCID patients are susceptible to recurrent infections such as pneumonia, meningitis and chicken pox, and can die before the first year of life. Though invasive, new treatments such as bone marrow and stem-cell transplantation save as many as 80% of SCID patients. All forms of SCID are inherited, with as many as half of SCID cases linked to the X chromosome, passed on by the mother. X-linked SCID results from a mutation in the interleukin 2 receptor gamma (IL2RG) gene which produces the common gamma chain subunit, a component of several IL receptors. IL2RG activates an important signalling molecule, JAK3. A mutation in JAK3, located on chromosome 19, can also result in SCID. Defective IL receptors and IL receptor pathways prevent the proper development of T-lymphocytes that play a key role in identifying invading agents as well as activating and regulating other cells of the immune system. In another form of SCID, there is a lack of the enzyme adenosine deaminase (ADA), coded for by a gene on chromosome 20. This means that the substrates for this enzyme accumulate in cells. Immature lymphoid cells of the immune system are particularly sensitive to the toxic effects of these unused substrates, so fail to reach maturity. As a result, the immune system of the afflicted individual is severely compromised or completely lacking. Some of the most promising developments in the search for new therapies for SCID center on 'SCID mice', which can be bred deficient in various genes including ADA, JAK3, and IL2RG. It is now possible to reconstitute the impaired mouse immune system by using human components, so these animals provide a very useful model for studying both normal and pathological immune systems in biomedical research.

Severe combined immunodeficiency (SCID) - SCID is a disease in which the patient neither has cell-mediated immune responses nor is able to make antibodies. It is a disease of young children because, until recently, the absence of an immune system left them prey to infections that ultimately killed them. About 25% of the cases of SCID are the result of the child being homozygous for defective genes encoding the enzyme adenosine deaminase (ADA). The normal catabolism of purines is deficient, and this is particularly toxic for T cells and B cells.

Figure 3.44: Adult hemoglobin (HbA) contains two alpha chains and two beta chains. In thalassemia, there is deficient synthesis of either the alpha chains or the beta chains. Symptoms are a result of not only low levels of HbA, but also the relatively high levels of the chain that is synthesized.

Figure 3.45: Mitochondrial localization of human frataxin in live mammalian cells.

Figure 3.46: The human HPRT1 gene contains 9 exons. A wide variety of HPRT1 mutations can cause LNS; these include deletions, insertions, single-base substitutions and frame-shift mutations.

Thalassemia - Thalassemia is an inherited disease of faulty synthesis of hemoglobin (Figure 3.44). The name is derived from the Greek word "thalassa" meaning "the sea" because the condition was first described in populations living near the Mediterranean Sea; however, the disease is also prevalent in Africa, the Middle East, and Asia. Thalassemia consists of a group of disorders that may range from a barely detectable abnormality of blood to severe or fatal anemia. Adult hemoglobin is composed of two alpha (α) and two beta (β) polypeptide chains. There are two copies of the hemoglobin alpha gene (HBA1 and HBA2), which each encode an α-chain, and both genes are located on chromosome 16. The hemoglobin beta gene (HBB) encodes the β-chain and is located on chromosome 11. In α-thalassemia, there is deficient synthesis of α-chains. The resulting excess of β-chains binds oxygen poorly, leading to a low concentration of oxygen in tissues (hypoxemia). Similarly, in β-thalassemia there is a lack of β-chains. However, the excess α-chains can form insoluble aggregates inside red blood cells. These aggregates cause the death of red blood cells and their precursors, causing a very severe anemia. The spleen becomes enlarged as it removes damaged red blood cells from the circulation.
Deletions of HBA1 and/or HBA2 tend to underlie most cases of α-thalassemia. The severity of symptoms depends on how many of these genes are lost. Loss of one or two genes is usually asymptomatic, whereas deletion of all four genes is fatal to the unborn child.

In contrast, over 100 types of mutations affect HBB, and deletion mutations are rare. Splice mutations and mutations that occur in the HBB gene promoter region tend to cause a reduction, rather than a complete absence, of β-globin chains and so result in milder disease. Nonsense mutations and frameshift mutations tend to produce no β-globin chains at all, leading to severe disease.

Currently, severe thalassemia is treated by blood transfusions, and a minority of patients are cured by bone marrow transplantation. Mouse models are proving to be useful in assessing the potential of gene therapy.

Friedreich's ataxia - Friedreich's ataxia (FRDA) is a rare inherited disease characterized by the progressive loss of voluntary muscular coordination (ataxia) and heart enlargement. It is named after the German doctor Nikolaus Friedreich, who first described the disease in 1863. FRDA is generally diagnosed in childhood and affects both males and females.

FRDA is an autosomal recessive disease caused by a mutation of a gene called frataxin, which is located on chromosome 9 (Figure 3.45). The mutation consists of many extra copies of a DNA segment, the trinucleotide GAA. A normal individual has 8 to 30 copies of this trinucleotide, while FRDA patients have as many as 1000. The larger the number of GAA copies, the earlier the onset of the disease and the quicker the decline of the patient.

Although we know that frataxin is found in the mitochondria of humans, we do not yet know its function. However, there is a very similar protein in yeast, YFH1, which we know more about. YFH1 is involved in controlling iron levels and respiratory function.
Since frataxin and YFH1 are so similar, studying YFH1 may help us understand the role of frataxin in FRDA.

Lesch-Nyhan syndrome - Lesch-Nyhan syndrome (LNS) is a rare inherited disease that disrupts the metabolism of the raw material of genes. These raw materials are called purines, and they are an essential part of DNA and RNA. The body can either make purines (de novo synthesis) or recycle them (the salvage pathway). Many enzymes are involved in these pathways. When one of these enzymes is missing, a wide range of problems can occur.

In LNS, there is a mutation in the HPRT1 gene located on the X chromosome. The product of the normal gene is the enzyme hypoxanthine-guanine phosphoribosyltransferase, which speeds up the recycling of purines from broken-down DNA and RNA. Many different types of mutations affect this gene, and the result is a very low level of the enzyme.

The mutation is inherited in an X-linked fashion. Females who inherit one copy of the mutation are not affected because they have two copies of the X chromosome (XX), one of which carries a normal gene. Males are severely affected because they have only one X chromosome (XY), and therefore their only copy of the HPRT1 gene is mutated (Figure 3.46).

Mutations of the HPRT1 gene cause three main problems. First is the accumulation of uric acid that normally would have been recycled into purines. Excess uric acid forms painful deposits in the skin (gout) and in the kidney and bladder (urate stones). The second problem is self-mutilation. Affected individuals have to be restrained from biting their fingers and tongues. Finally, there is mental retardation and severe muscle weakness.

In the year 2000 it was shown that the genetic deficiency in LNS could be corrected in vitro. A virus was used to insert a normal copy of the HPRT1 gene into deficient human cells. Such techniques used in gene therapy may one day provide a cure for this disease. For now, medications are used to decrease the levels of uric acid.
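The X-linked inheritance pattern described above for LNS can be made concrete with a small enumeration of a cross. The following is only an illustrative sketch: the allele labels 'H' (normal HPRT1) and 'h' (mutant HPRT1), and the function name x_linked_cross, are assumptions introduced here and do not come from the text. It assumes simple recessive behaviour with equal chances for every egg and sperm combination.

```python
from itertools import product
from collections import Counter
from fractions import Fraction


def x_linked_cross(mother, father):
    """Enumerate the offspring of a cross for a recessive X-linked gene.

    mother: two X-linked alleles, e.g. ("H", "h")
    father: one X-linked allele plus "Y", e.g. ("H", "Y")
    'H' = normal allele, 'h' = mutant allele (hypothetical labels).
    Returns a dict mapping (sex, status) to its probability.
    """
    outcomes = Counter()
    for egg, sperm in product(mother, father):
        if sperm == "Y":
            # Sons have a single X, so one mutant copy is enough.
            sex = "male"
            status = "affected" if egg == "h" else "unaffected"
        else:
            # Daughters need two mutant copies to be affected.
            sex = "female"
            if egg == "h" and sperm == "h":
                status = "affected"
            elif "h" in (egg, sperm):
                status = "carrier"
            else:
                status = "unaffected"
        outcomes[(sex, status)] += 1
    total = sum(outcomes.values())
    return {key: Fraction(count, total) for key, count in outcomes.items()}


# Carrier mother crossed with an unaffected father.
result = x_linked_cross(("H", "h"), ("H", "Y"))
for (sex, status), probability in sorted(result.items()):
    print(sex, status, probability)
```

Under these assumptions, a carrier mother and an unaffected father give four equally likely outcomes: an unaffected daughter, a carrier daughter, an unaffected son and an affected son.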
3.3.7 Cancer

A cancer is an uncontrolled proliferation of cells. In some the rate is fast; in others, slow; but in all cancers the cells never stop dividing. This distinguishes cancers - malignant tumors or malignancies (Figure 3.47) - from benign growths like moles, whose cells eventually stop dividing.

Figure 3.47: Development of tumorous tissue.

Cancers are clones. No matter how many trillions of cells are present in the cancer, they are all descended from a single ancestral cell. Evidence: Although normal tissues of a woman are a mosaic of cells in which one X chromosome or the other has been inactivated, all her tumor cells - even if from multiple sites - have the same X chromosome inactivated.

Cancers begin as a primary tumor. At some point, however, cells break away from the primary tumor and - traveling in blood and lymph - establish metastases in other locations of the body. Metastasis is what usually kills the patient.

Cancer cells are usually less differentiated than the normal cells of the tissue where they arose. Many people feel that this reflects a process of dedifferentiation, but I doubt it. Rather, evidence is accumulating that cancers arise in precursor cells - stem cells or "progenitor cells" - of the tissue: cells that are dividing by mitosis, producing daughter cells that are not yet fully differentiated.

3.3.8 Initiation of Cancer at Cellular Level/Progression to Cancer

Figure 3.48: The activities and cellular locations of the products of the main classes of known proto-oncogenes. Some representative proto-oncogenes in each class are indicated in brackets.

Figure 3.49: Three ways in which a proto-oncogene can be converted into an oncogene. A fourth mechanism (not shown) involves recombination between retroviral DNA and a proto-oncogene. This has effects similar to those of chromosome rearrangement, bringing the proto-oncogene under the control of a viral enhancer and/or fusing it to a viral gene that is actively transcribed.
Figure 3.50: The translocation between chromosomes 9 and 22 responsible for chronic myelogenous leukemia. The smaller of the two resulting abnormal chromosomes is called the Philadelphia chromosome, after the city where the abnormality was first recorded.

Figure 3.51: Evidence from X-inactivation mosaics demonstrates the monoclonal origin of cancers. As a result of a random process that occurs in the early embryo, practically every normal tissue in a woman's body is a mixture of cells with different X chromosomes heritably inactivated (indicated here by the mixture of red cells and gray cells in the normal tissue). When the cells of a cancer are tested for their expression of an X-linked marker gene, however, they are usually all found to have the same X chromosome inactivated. This implies that they are all derived from a single cancerous founder cell.

What probably happens is: A single cell - perhaps an adult stem cell or progenitor cell in a tissue - suffers a mutation (red line) in a gene involved in the cell cycle, e.g., an oncogene or tumor suppressor gene. This gives that cell a slight growth advantage over other dividing cells in the tissue. As that cell develops into a clone, some of its descendants suffer another mutation (red line) in another cell-cycle gene. This further deregulates the cell cycle of that cell and its descendants. As the rate of mitosis in that clone increases, the chances of further DNA damage increase. Eventually, so many mutations have occurred that the growth of that clone becomes completely unregulated. The result: full-blown cancer.

Stem cells are cells that divide by mitosis to form either two stem cells, thus increasing the size of the stem cell "pool", or one daughter that goes on to differentiate and one daughter that retains its stem-cell properties.
There is growing evidence that most of the cells in leukemias, breast, brain, and colon cancers are not able to proliferate out of control (and to metastasize). Only those members of the clone that retain their stem-cell-like properties (~2.5% of the cells in a tumor of the colon) can do so.

Figure 3.52: How replication of damaged DNA can lead to chromosome abnormalities and gene amplification. The diagram shows one of several possible mechanisms. The process begins with accidental DNA damage in a cell that lacks functional p53 protein. Instead of halting at the p53-dependent checkpoint in the G1 phase of the division cycle, where a normal cell with damaged DNA would halt until the damage was repaired, the p53-defective cell enters S phase, with the consequences shown. Once a chromosome carrying a duplication and lacking a telomere has been generated, repeated rounds of replication, chromatid fusion, and unequal breakage can increase the number of copies of the duplicated region still further. Selection in favor of cells with increased numbers of copies of a gene in the affected chromosomal region will thus lead to mutants in which the gene is amplified to a high copy number. The multiple copies may eventually become visible as a homogeneously staining region in the chromosome, or they may, either through a recombination event or through unrepaired DNA strand breakage, become excised from their original locus and so appear as independent double minute chromosomes.

There is a certain logic to this. Most terminally differentiated cells have limited potential to divide by mitosis and, seldom passing through the S phase of the cell cycle, are limited in their ability to accumulate the new mutations that predispose to becoming cancerous. Furthermore, they often have short life spans, being eliminated by apoptosis (e.g., lymphocytes) or being shed from the tissue (e.g., epithelial cells of the colon).
The adult stem cell pool, in contrast, is long-lived, and its members have many opportunities to acquire new mutations as they produce differentiating daughters as well as daughters that maintain the stem cell pool.

Different types of genetic accident that can convert a proto-oncogene into an oncogene are summarized in Figure 3.49. The gene may be altered by a point mutation, by a deletion, through a chromosomal translocation, or by insertion of a mobile genetic element such as retroviral DNA.

3.3.9 Proto-oncogenes & Oncogenes

Proto-oncogenes

A Retrovirus Can Transform a Host Cell by Inserting Its DNA Next to a Proto-oncogene of the Host - There are two ways in which a proto-oncogene can be converted into an oncogene upon incorporation into a retrovirus: the gene sequence may be altered or truncated so that it codes for a protein with abnormal activity, or the gene may be brought under the control of powerful promoters and enhancers in the viral genome that cause its product to be made in excess or in inappropriate circumstances.

Retroviruses can also exert similar oncogenic effects in a different way: DNA copies of the viral RNA may simply be inserted into the host cell genome at sites close to, or even within, proto-oncogenes. The resulting genetic disruption is called an insertional mutation, and the altered genome is inherited by all the progeny of the original host cell. More or less random insertion of DNA copies of the viral RNA into the host DNA occurs as part of the normal retroviral life cycle, and in at least one well-documented case, insertion anywhere within about 10,000 nucleotide pairs from a proto-oncogene can cause abnormal activation of that gene. Insertional mutagenesis provides an important means of identifying proto-oncogenes, which can be tracked down by their proximity to the inserted viral DNA.
Proto-oncogenes identified in this way often turn out to be the same as those discovered in the other way, as counterparts to oncogenes that retroviruses carry from cell to cell, but some new ones have been discovered as well (Table 3.1). An example is the Wnt-1 gene, activated by insertional mutagenesis in breast cancers in mice infected with the mouse mammary tumor virus (Figure 3.48). This gene turns out to be closely homologous to the Drosophila gene wingless, which is involved in cell-cell communications that regulate details of the body pattern of the fly.

Different Searches for the Genetic Basis of Cancer Converge on Disturbances in the Same Proto-oncogenes - While some researchers pursued the line of investigation leading from retroviruses to oncogenes, others took a more direct approach and searched for DNA sequences in human cancer cells that would provoke uncontrolled proliferation when introduced into noncancerous cells. The assay was again done in cell culture, using an established line of mouse-derived fibroblast cells (NIH 3T3 cells) as the noncancerous hosts and transfecting them with DNA taken from human tumor cells.

The findings were dramatic. Oncogenes were detected in many lines of human cancer cells, and in several cases these oncogenes turned out to be mutant alleles of some of the same proto-oncogenes that had been identified by the retroviral approach, or of genes very closely related to them. About one in four human tumors, for example, was found to contain a mutated member of the ras gene family, first discovered as oncogenes carried by retroviruses that cause sarcomas in rats. Thus two independent lines of inquiry converged on the same genes.

Yet another approach that led to some of the same proto-oncogenes was based on the karyotyping of tumor cells.
As mentioned earlier, in almost all patients with chronic myelogenous leukemia, the leukemic cells show the same chromosomal translocation, between chromosomes 9 and 22; likewise, in Burkitt's lymphoma there is regularly a translocation between chromosome 8 and one of the three chromosomes containing the genes that encode antibody molecules. In both these types of cancer the translocation breakpoint, where part of one chromosome is joined to another, was found to coincide exactly with the location of a proto-oncogene already known from retroviral studies: abl in chronic myelogenous leukemia, myc in Burkitt's lymphoma. Analogous chromosome translocations are similarly associated with some other types of cancer. From DNA sequencing studies it seems that in some cases the translocation turns a proto-oncogene into an oncogene by fusing the proto-oncogene to another gene in such a way that an altered protein is produced (Figure 3.49); in other cases the translocation moves a proto-oncogene into an inappropriate chromosomal environment that activates its transcription so that the normal protein is produced in excess.

A Proto-oncogene Can Be Made Oncogenic in Many Ways - So far, about 60 proto-oncogenes have been discovered (Tables 3.1 and 3.2 show a small selection); each of these can be converted into an oncogene that plays a dominant part in cancers of one sort or another. Most such genes have been encountered repeatedly, in a variety of mutant forms and in several kinds of cancer, suggesting that the majority of mammalian proto-oncogenes may already have been identified. But what functions do these genes have in a normal healthy cell, that mutations in them should be so dangerous?

Most proto-oncogenes code for components of the mechanisms that regulate the social behavior of cells in the body - in particular, the mechanisms by which signals from a cell's neighbors can impel it to divide, differentiate, or die.
In fact, many of the components of cell signaling pathways were first identified through searches for oncogenes, and a full list of proto-oncogene products includes examples of practically every type of molecule involved in cell signaling - secreted proteins, transmembrane receptors, GTP-binding proteins, protein kinases, gene regulatory proteins, and so on, as summarized in Figure 3.48. All these molecules normally serve in complex relay chains to deliver signals for the production of more cells when more cells are needed. But mutations can alter them so that they deliver the signals even when more cells are not needed.

The proto-oncogene erbB, for example, codes for the receptor for epidermal growth factor (EGF); when EGF binds to the receptor's extracellular domain, the intracellular domain generates a stimulatory signal inside the cell. A mutation in erbB can turn it into an oncogene by deleting the extracellular EGF-binding domain in such a way that the intracellular stimulatory signal is produced constantly, even if no EGF is present. In a similar way a point mutation at an appropriate site in a ras gene can create a Ras protein that fails to hydrolyze its bound GTP and so persists abnormally in its active state, transmitting an intracellular signal for cell proliferation even when it should not. Innumerable other examples can be given.

The basic types of genetic accident that can convert a proto-oncogene into an oncogene are summarized in Figure 3.49. The gene may be altered by a point mutation, by a deletion, through a chromosomal translocation, or by insertion of a mobile genetic element such as retroviral DNA. The change can occur in the protein-coding region so as to yield a hyperactive product, or it can occur in adjacent control regions so that the gene is simply overexpressed. Alternatively, the gene may be overexpressed because it has been amplified to a high copy number through errors in the process of chromosome replication.
Specific types of abnormality are characteristic of particular genes and of the responses to particular carcinogens. For example, 90% of the skin tumors evoked in mice by the tumor initiator dimethylbenz[a]anthracene (DMBA) have an A-to-T alteration at exactly the same site in a mutant ras gene; presumably, of the mutations caused by DMBA, it is only the ones at this site that efficiently activate skin cells to form a tumor. Members of the Myc gene family, on the other hand, are frequently overexpressed or amplified. The Myc protein normally acts in the nucleus as a signal for cell proliferation, and excessive quantities of Myc cause the cell to embark on the cell-division cycle in circumstances where a normal cell would halt.

Table 3.1. Some Oncogenes Originally Identified Through Their Presence in Transforming Retroviruses

Table 3.2. Some Oncogenes Originally Identified by Means Other Than Their Presence in Transforming Retroviruses
S.N.   Means of Detection     Oncogenes
1      Insertional mutation   Wnt-1 (int-1), fgf-3 (int-2), Notch-1 (int-3), lck
2      Amplification          L-myc, N-myc
3      Transfection           neu, N-ras, trk, ret
4      Translocation          bcl-2, RARα

Oncogenes

An oncogene is a gene that, when mutated or expressed at abnormally high levels, contributes to converting a normal cell into a cancer cell. Cancer cells are cells that are engaged in uncontrolled mitosis.

The signals for normal mitosis

Normal cells growing in culture will not divide unless they are stimulated by one or more growth factors present in the culture medium. Example: PDGF (platelet-derived growth factor), which is encoded by the gene PDGFB (also known as SIS).

The molecules of growth factor bind to molecules of its receptor, an integral membrane protein embedded in the plasma membrane with its ligand-binding site exposed at the surface of the cell. Example: the gene ERBB2 encodes a member of the epidermal growth factor (EGF) receptor family. (In humans, ERBB2 is also known as HER2.)
Binding of a growth factor to its receptor triggers a cascade of signaling events within the cytosol. Many of these involve kinases (enzymes that attach phosphate groups to other proteins). Examples: the proteins encoded by SRC, RAF, ABL, and the fusion protein encoded by BCR/ABL found in chronic myelogenous leukemia (CML). Others involve molecules that turn on kinases. Example: RAS. RAS molecules reside on the inner surface of the plasma membrane, where they serve to link receptor activation to "downstream" kinases like RAF.

In most cases, phosphorylation activates the protein and eventually transfers the signal into the nucleus. Here phosphorylation activates transcription factors that bind to promoters and enhancers in DNA, turning on their associated genes. Example: AP-1, a heterodimer of the proteins encoded by jun and fos.

Some of the genes turned on by these transcription factors encode other transcription factors. Example: myc.

Some of the genes turned on by these downstream transcription factors encode cyclins that prepare the cell to undergo mitosis.

Genes that participate in any one of the steps above can become oncogenes if they become mutated so that their product becomes constitutively active (that is, active all the time even in the absence of a positive signal), or if they produce their product in excess. Possible causes of excess production: Their promoter and/or enhancer has become mutated. Example: the oncomouse, a transgenic mouse that has both copies of its myc gene under the influence of extra-powerful promoters. Loss (e.g., by a translocation) of the 3' UTR of their mRNA, so that a microRNA (miRNA) that normally represses translation can no longer do so.

All these oncogenes act as dominants; if the cell has one normal gene (sometimes called a proto-oncogene) at a locus and one mutated gene (the oncogene), the abnormal product takes control. No single oncogene can, by itself, cause cancer. It can, however, increase the rate of mitosis of the cell in which it finds itself.
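The idea of a constitutively active gene product in the signaling steps above can be illustrated with a deliberately simplified Boolean model. This is a sketch under strong assumptions: the relay chain is treated as strictly linear, every component is either "on" or "off", and the function name cell_divides and the component labels are hypothetical names chosen here for illustration. Real signaling pathways branch and are quantitative.

```python
# Toy relay chain, loosely following the steps described in the text:
# growth factor -> receptor -> RAS -> RAF -> transcription factor -> cyclin
RELAY_CHAIN = ("receptor", "RAS", "RAF", "transcription_factor", "cyclin")


def cell_divides(growth_factor_present, constitutive=frozenset()):
    """Return True if the mitosis signal reaches the end of the chain.

    growth_factor_present: whether a growth factor is bound at the surface.
    constitutive: names of components mutated to be active all the time
                  (the oncogene case); hypothetical labels from RELAY_CHAIN.
    """
    signal = growth_factor_present
    for component in RELAY_CHAIN:
        # A normal component only relays what it receives from upstream.
        # A constitutively active component is "on" whatever its input is.
        signal = signal or (component in constitutive)
    return signal


print(cell_divides(False))           # normal cell, no growth factor
print(cell_divides(True))            # normal cell, growth factor present
print(cell_divides(False, {"RAS"}))  # constitutively active RAS, no growth factor
```

In this toy model a normal cell divides only when the growth factor is present, whereas a cell with one constitutively active component sends the signal regardless. This matches the dominant behaviour described in the text: a single altered copy is enough to take control of the pathway, even though the normal copy is still present.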
Dividing cells are at increased risk of acquiring mutations, so a clone of actively dividing cells can yield subclones of cells with a second, third, etc. oncogene. When a clone loses all control over its mitosis it is well on its way to developing into a cancer.

Other types of potential cancer-promoting genes

Genes that inhibit apoptosis - The suicide of damaged cells (apoptosis) provides an important mechanism for ridding the body of cells that could go on to form a cancer. It is not surprising then that inhibiting apoptosis can promote the formation of a cancer. Example: Bcl-2. The product of this gene inhibits apoptosis. Overexpression of the gene is a hallmark of B-cell cancers.

Genes involved in repairing DNA or stopping mitosis if they fail - Mutations arise from an unrepaired error in DNA. So any gene whose product participates in DNA repair probably can also behave as an oncogene when mutated. Example: ATM. ATM (= "ataxia telangiectasia mutated") gets its name from a human disease of that name, whose patients, among other things, are at increased risk of cancer. The ATM protein is also involved in detecting DNA damage and interrupting the cell cycle when damage is found.

It is estimated that fully 1% of the 25,000 or so genes in the human genome are proto-oncogenes.

A graph based on the work of E. Sinn et al. (Cell 49:465, 1987) shows the synergistic effect of two oncogenes. The fraction (%) of transgenic mice without tumors is shown as a function of age. Three groups are shown: those mice transgenic for a hyperactive myc alone (blue), those transgenic for ras alone (green), and those transgenic for both myc and ras (red).

Tumor-Suppressor Genes

The products of some genes inhibit mitosis. These genes are called tumor suppressor genes.

3.4 LET US SUM UP

Genetic recombination mechanisms allow large sections of DNA double helix to move from one chromosome to another.
In general recombination the initial reactions rely on extensive base-pairing interactions between strands of the two DNA double helices that will recombine. It does not normally change the arrangement of the genes in a chromosome. Site-specific recombination, on the other hand, alters the relative positions of nucleotide sequences in chromosomes because the pairing reactions depend on a protein-mediated recognition of the two DNA sequences that will recombine, and extensive sequence homology is not required. Two site-specific recombination mechanisms are common: (1) conservative site-specific recombination, which produces a very short heteroduplex and therefore requires some DNA sequence that is the same on the two DNA molecules, and (2) transpositional site-specific recombination, which produces no heteroduplex and usually does not require a specific sequence on the target DNA.

A protein called Ku is essential for NHEJ. Ku is a heterodimer of the subunits Ku70 and Ku80. In the 9 August 2001 issue of Nature, Walker, J. R., et al. report the three-dimensional structure of Ku attached to DNA. Their structure shows beautifully how the protein aligns the broken ends of DNA for rejoining. Some of the same enzymes used to repair DSBs by direct joining are also used to break and reassemble the gene segments used to make antibody variable regions, that is, to accomplish V(D)J joining (mice whose Ku80 genes have been knocked out cannot do this), and different antibody classes, that is, to accomplish class switching.

Concept - A question arises: how does the MMR system know which is the incorrect nucleotide? In E. coli, certain adenines become methylated shortly after the new strand of DNA has been synthesized. The MMR system works more rapidly than this methylation, and if it detects a mismatch, it assumes that the nucleotide on the already-methylated (parental) strand is the correct one and removes the nucleotide on the freshly synthesized daughter strand.
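The strand-discrimination rule described above for E. coli can be sketched in a few lines of code. This is a simplified illustration, not a model of the actual enzymes: it assumes the two strands are given as equal-length strings in which position i of the daughter pairs with position i of the parental strand, it corrects single nucleotides rather than excising a stretch of the daughter strand, and the function name repair_daughter is a hypothetical name introduced here.

```python
# Watson-Crick pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def repair_daughter(parental, daughter):
    """Correct mismatches by trusting the methylated (parental) strand.

    parental: bases of the already-methylated template strand.
    daughter: bases of the freshly synthesized strand, position by position
              opposite the parental bases.
    Returns (repaired daughter strand, list of corrected positions).
    """
    if len(parental) != len(daughter):
        raise ValueError("strands must have the same length")
    repaired = []
    corrected_positions = []
    for position, (p_base, d_base) in enumerate(zip(parental, daughter)):
        expected = COMPLEMENT[p_base]
        if d_base != expected:
            # Mismatch: the parental base is assumed correct,
            # so the daughter base is the one replaced.
            corrected_positions.append(position)
        repaired.append(expected)
    return "".join(repaired), corrected_positions


# Parental strand GATCGA should pair with CTAGCT; the daughter below
# carries a wrong base (G instead of A) at position 2.
fixed, positions = repair_daughter("GATCGA", "CTGGCT")
print(fixed, positions)
```

The point of the sketch is the asymmetry: the mismatch itself does not say which base is wrong, so the repair only works because one strand is marked as the trusted template.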
How such recognition occurs in mammals is not yet known.

Some mutations arise from spontaneous alterations to DNA structure, such as depurination and deamination, which may alter the pairing properties of the bases and cause errors in subsequent rounds of replication. Ionizing radiation such as X-rays and gamma rays damages DNA by dislodging electrons from atoms; these electrons then break phosphodiester bonds and alter the structure of bases. Ultraviolet light causes mutations primarily by producing pyrimidine dimers that disrupt replication and transcription. The SOS system enables bacteria to overcome replication blocks but introduces mistakes in replication.

Chemicals can produce mutations by a number of mechanisms. Base analogs are inserted into DNA and frequently pair with the wrong base. Alkylating agents, deaminating chemicals, hydroxylamine, and oxidative radicals change the structure of DNA bases, thereby altering their pairing properties. Intercalating agents wedge between the bases and cause single-base insertions and deletions in replication.

3.5 CHECK YOUR PROGRESS

NOTE: (1) Write your answer in the space given below. (2) Compare your answer with the one given at the end of this unit.

Q.1 Fill in the blanks:
(a) Mendel's experiments with mixing one trait always resulted in a ........ ratio between dominant and recessive phenotypes; his experiments with mixing two traits (dihybrid cross) showed ........ ratios.
(b) E. coli ........ protein also has a major role in homologous recombination.
(c) When the virus enters a cell, a virus-encoded enzyme called ........ is synthesized.
(d) RecBCD is also known as ........
(e) The first discovery of a chemical mutagen was made by ........

Q.2 Write the answers to the following questions:
(a) How are RecA and RecBCD important for recombination?
(b) Explain genetic mapping and physical mapping.
(c) Transposons are associated with mutation?
Explain this fact.
(d) Write the names of diseases that develop due to disorders in the function of genes, and explain their related features.

3.6 CHECK YOUR PROGRESS: THE KEY

Q.1 (a) 3:1, 9:3:3:1 (b) RecA (d) Exonuclease V (e) Charlotte Auerbach
Q.2 (a) see section 3.2.3 (b) see section 3.2.6 (c) see section 3.3.1 (d) see section 3.3.6

3.7 ASSIGNMENT

Make a project illustrating the process of recombination, including the Holliday junction model.

3.8 REFERENCES

Our courteous thanks to the following two authors/publishers, whose works were used in preparing the various sections of this chapter:
B. Alberts et al., 'Molecular Biology of the Cell', 4th Ed. (2002). Garland.
Benjamin A. Pierce, 'Genetics: A Conceptual Approach'.

Other helping resources are as follows:
Joo C, McKinney SA, Nakamura M, Rasnik I, Myong S and Ha T. (2007). Real-time Observation of RecA Filament Dynamics with Single Monomer Resolution. Cell 126:515-527.
Singleton MR, Dillingham MS, Gaudier M, Kowalczykowski SC, Wigley DB. (2004). Crystal structure of RecBCD enzyme reveals a machine for processing DNA breaks. Nature 432 (7014): 187-93.
Lewin. Genes VII. (2000). Oxford University Press.
C. R. Calladine and H. R. Drew. Understanding DNA: The Molecule and How It Works. 2nd edn (1997). Academic Press. (3rd edn due in 2004).
Jeremy Dale and Simon F. Park. Molecular Genetics of Bacteria, 4th Edition 2004. John Wiley & Sons, Ltd.
H. Lodish et al. Molecular Cell Biology, 4th edn (1995). W. H. Freeman. (5th edn due in 2003–2004).
L. Snyder and W. Champness (2003). Molecular Genetics of Bacteria, 2nd edn. American Society for Microbiology.
S. Baumberg (ed.) (1999). Prokaryotic Gene Expression.
M. T. Madigan, J. M. Martinko and J. Parker (2000). Biology of Microorganisms (better known as 'Brock'), 9th edn. Prentice Hall International.
J. W. Dale and M. von Schantz (2002). From Genes to Genomes. John Wiley & Sons.
T. A. Brown (2001). Gene Cloning – An Introduction, 4th edn. Blackwell Science.
S. B. Primrose, R. Twyman and R. W. Old (2001). Principles of Gene Manipulation, 6th edn. Blackwell Science.
B. R. Glick (2003). Molecular Biotechnology: Principles and Applications of Recombinant DNA, 3rd edn. American Society for Microbiology.
D. P. Snustad and M. J. Simmons (2000). Principles of Genetics, 2nd edn. John Wiley.
W. S. Klug and M. R. Cummings (2000). Concepts of Genetics, 6th edn. Prentice Hall.
L. H. Hartwell and others (2000). Genetics. McGraw-Hill.
P. J. Russell (2002). Genetics. Benjamin Cummings.
A. J. F. Griffiths, W. M. Gelbart, R. C. Lewontin and J. H. Miller (2002). Modern Genetic Analysis, 2nd edn. W. H. Freeman.
R. W. Hendrix et al. (1983). Lambda II.
M. Wilson, R. McNab and B. Henderson (2002). Bacterial Disease Mechanisms. Cambridge University Press.
W. Hayes (1968). The Genetics of Bacteria and their Viruses, 2nd edn. Blackwell Scientific Publications.

Websites which give more information on chromosomal activities, genomic databases and mutation are as follows:
http://www.sanger.ac.uk/
http://www.tigr.org/
http://www.ncbi.nlm.nih.gov/