UNIT-2 GENETICS OF PROKARYOTES AND
EUKARYOTIC ORGANELLES AND GENE
STRUCTURE & EXPRESSION
Structure
2.0 Introduction
2.1 Objectives
2.2 Genetics of Prokaryotes and Eukaryotic Organelles:
2.2.1 Phage Phenotypes
2.2.2 Mapping the Bacteriophage Genome
2.2.3 Recombination in Phage
2.2.4 Genetic Transformation, Conjugation and Transduction in Bacteria
2.2.5 Genetics of Mitochondria and Chloroplasts
2.2.6 Cytoplasmic Male Sterility
2.3 Gene Structure and Expression:
2.3.1 Genetic Fine Structure: Fine Structure of Gene
2.3.2 Cis-Trans Test
2.3.3 The Structure Analysis of Eukaryotes Introns and their Significance
2.3.4 RNA Splicing
2.3.5 Regulation of Gene Expression in Prokaryotes and Eukaryotes
2.4 Let Us Sum Up
2.5 Check Your Progress
2.6 Check Your Progress: The Key
2.7 Assignment
2.8 References
2.0 INTRODUCTION
A great variety of organisms has existed on Earth for a very long time. They continuously interact with their environment and carry out all the activities required for life. Reproduction is a characteristic activity of all living entities, whether prokaryotes or eukaryotes. In this unit we will try to understand the genetics of lower unicellular and higher multicellular organisms.
2.1 OBJECTIVE
This unit aims to explain the genetic behaviour of viruses (phages) and bacteria, together with that of eukaryotes. After working through this unit, you should be able to describe:
The life cycle, general and new phenotypes, recombination and genome-mapping techniques for bacteriophages,
The commonly adopted methods of genetic exchange in bacteria,
The extrachromosomal genetics of eukaryotic organelles such as mitochondria and chloroplasts, including the role of the cytoplasm in some features of heredity,
The physical structure of the gene and its behaviour as a single functional unit (cis-trans test),
The non-coding regions (introns) of eukaryotic genes and their significance,
The processing of RNA by the splicing mechanism, and
The different elements and factors that control the expression of genes.
2.2 GENETICS OF PROKARYOTES AND EUKARYOTIC
ORGANELLES
2.2.1 Phage Phenotypes
Bacteriophage - A bacteriophage (from 'bacteria' and Greek phagein, 'to eat') is any one of a number of viruses that infect bacteria. The term is commonly used in its shortened form, phage.
Typically, bacteriophages consist of an outer protein hull enclosing genetic material. The genetic material can be ssRNA (single-stranded RNA), dsRNA, ssDNA, or dsDNA, between 5 and 500 kilobase pairs long, in either a circular or a linear arrangement. Bacteriophages are much smaller than the bacteria they destroy, usually between 20 and 200 nm in size.
T2 and its close relative T4 are viruses that infect the bacterium E. coli. The infection ends with destruction (lysis) of the bacterial cell, so these viruses are examples of bacteriophages ("bacteria eaters").
Figure 2.1: Phenotype of a Bacteriophage
General Phenotype - Generally each virus particle (virion) consists of:
• a protein head (~0.1 µm), inside of which is a single molecule of double-stranded DNA containing about 166,000 base pairs (Figure 2.1); the DNA packaged in the head is linear, although the genetic map is circular
• a protein tail from which extend thin protein fibers
Life Cycle - The virus attaches to the E. coli cell. This requires a precise molecular interaction between the tail fibers and the cell wall of the host. The DNA molecule is injected into the cell. Within 1 minute, the viral DNA begins to be transcribed and translated into some of the viral proteins, and synthesis of host proteins is stopped. At 5 minutes, viral enzymes needed for synthesis of new viral DNA molecules are produced. At 8 minutes, some 40 different structural proteins for the viral head and tail are synthesized. At 13 minutes, assembly of new viral particles begins. At 25 minutes, the viral lysozyme destroys the bacterial cell wall and the viruses burst out, ready to infect new hosts.
o If the bacterial cells are growing in liquid culture, the culture turns clear.
o If the bacterial cells are growing in a "lawn" on the surface of an agar plate, then holes, called plaques (Figure 2.2), appear in the lawn.
New Phenotypes - Occasionally, new phenotypes appear, such as a change in the appearance of the plaques or even a loss of the ability to infect the host.
Examples:
• h
o Some strains of E. coli, e.g. one designated B/2, gain the ability to resist infection by normal ("wild-type") T2. The mutation has caused a change in the structure of their cell wall so that the tail fibers of T2 can no longer bind to it. However, T2 can strike back. Occasional T2 mutants appear that overcome this resistance. The mutated gene, designated h (for "host range"), encodes a change in the tail fibers so they can once again bind to the cell wall of strain B/2. The normal or "wild-type" gene is designated h+.
o When plated on a lawn containing both E. coli B and E. coli B/2,
  • the mutant (h) viruses can lyse both strains of E. coli, producing clear plaques, while
  • the wild-type (h+) viruses can lyse only E. coli B, producing mottled or turbid plaques.
• r
o Occasional T2 mutants appear that break out of their host cell earlier than normal.
o The mutation occurs in a gene designated r (for "rapid lysis"). It reveals itself by the extra-large plaques that it forms.
o The wild-type gene, producing a normal time of lysis, is designated r+. It forms normal-size plaques.
As with so many organisms, the occurrence of mutations provides the tools to learn about such things as
• the function of the gene;
• its location in the DNA molecule (mapping).
2.2.2 Mapping the Bacteriophage Genome
The bacteriophage genome can be mapped by the following methods.
Techniques for the Study of the Bacteriophage Genome
Viruses reproduce only within host cells, so bacteriophages must be cultured in bacterial cells. To do so, phages and bacteria are mixed together and plated on solid medium in a Petri plate. A high concentration of bacteria is used so that the colonies grow into one another and produce a continuous layer of bacteria, or "lawn," on the agar. An individual phage infects a single bacterial cell and goes through its lytic cycle. Many new phages are released from the lysed cell and infect additional cells; the cycle is then repeated. Because the bacteria grow on solid medium, the diffusion of the phages is restricted and only nearby cells are infected. After several rounds of phage reproduction, a clear patch of lysed cells (a plaque) appears on the plate (Figure 2.2).
Figure 2.2: Plaques are clear patches of lysed cells on a lawn of bacteria. (E.C.S. Chan/Visuals Unlimited)
Figure 2.3: Hershey and Rotman developed a technique for mapping viral genes. (Photo from G.S. Stent, Molecular Biology of Bacterial Viruses. Copyright © 1963 by W.H. Freeman and Company.)
Each plaque represents a single phage that multiplied and lysed many cells. Plating a
known volume of a dilute solution of phages on a bacterial lawn and counting the number
of plaques that appear can be used to determine the original concentration of phage in the
solution.
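The plaque-count calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name phage_titer and the example numbers (150 plaques, a one-in-a-million dilution, 0.1 mL plated) are assumptions chosen for the example, not values from this unit.

```python
def phage_titer(plaque_count, dilution_factor, volume_plated_ml):
    """Estimate plaque-forming units (PFU) per mL of the undiluted phage stock.

    plaque_count      -- number of plaques counted on the lawn
    dilution_factor   -- fraction of the original concentration that was plated
                         (for example 1e-6 for a one-in-a-million dilution)
    volume_plated_ml  -- volume of the diluted phage suspension plated, in mL
    """
    if dilution_factor <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution factor and volume must be positive")
    # Each plaque is assumed to come from a single phage, so the count
    # divided by the plated volume gives PFU/mL of the diluted sample;
    # dividing by the dilution factor scales back to the original stock.
    return plaque_count / (dilution_factor * volume_plated_ml)


# Hypothetical example: 150 plaques from 0.1 mL of a 10^-6 dilution.
titer = phage_titer(150, 1e-6, 0.1)
print(f"Estimated titer: {titer:.2e} PFU/mL")
```

The estimate rests on the assumption stated in the text, namely that each plaque arises from one phage.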
(1) Mapping by Recombination Frequencies - Strain B of E. coli can be infected by both h+ and h strains of T2 bacteriophage. In fact, a single bacterial cell can be infected simultaneously by both.
Let us infect a liquid culture of E. coli B with two different mutant T2 viruses:
• h r+ and
• h+ r (Figure 2.4)
When this is done in liquid culture, and the progeny are then plated on a mixed lawn of E. coli B and B/2, four different kinds of plaques appear.
Figure 2.4: Recombination in Phage
Genotype     Phenotype        Number of Plaques
h r+         clear, small     460
h+ r         turbid, large    460
h+ r+        turbid, small    40
h r          clear, large     40
Total =                       1000
The most abundant (460 each) are those representing the parental types; that is, the phenotypes are those expected from the two infecting strains. However, small numbers (40 each) of two new phenotypes appear. These can be explained by genetic recombination having occasionally occurred between the DNA of each parental type within the bacterial cell.
Just as in higher organisms, one assumes that the frequency of recombinants is proportional to the distance between the gene loci. In this case, 80 out of 1000 plaques were recombinant, so the distance between the h and r loci is assigned a value of 8 map units or centimorgans.
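The arithmetic behind the 8 map units can be written out as a short Python sketch. The function name map_distance_cm is an assumption made for this illustration; the plaque counts are the ones given for the h r+ x h+ r cross.

```python
def map_distance_cm(parental_counts, recombinant_counts):
    """Map distance in centimorgans, taken as the percentage of
    recombinant plaques among all plaques counted."""
    recombinants = sum(recombinant_counts)
    total = sum(parental_counts) + recombinants
    if total == 0:
        raise ValueError("no plaques counted")
    return 100.0 * recombinants / total


# Cross h r+ x h+ r: 460 + 460 parental plaques, 40 + 40 recombinant plaques.
h_r_distance = map_distance_cm([460, 460], [40, 40])
print(f"h-r distance: {h_r_distance} cM")
```

As the text notes, treating recombination frequency as proportional to distance is an assumption, and it holds best for loci that are close together.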
Now repeat, coinfecting E. coli B with two other strains of T2:
• h m+ and
• h+ m
Genotype     Number of Plaques
h m+         470
h+ m         470
h+ m+        30
h m          30
Total =      1000
Again, 4 kinds of plaques are produced: parental (470 each) and recombinant (30 each).
The smaller number of recombinants indicates that these two gene loci (h and m) are closer together (6 cM) than h and r (8 cM). But the order of the three loci could be either
• m -6- h -8- r or
• h -6- m -2- r
To find out which is the correct order, perform a third mating using
• m r+ and
• m+ r
Genotype     Number of Plaques
m r+         440
m+ r         440
m+ r+        60
m r          60
Total =      1000
This makes it clear that the order is m-h-r, not h-m-r. But why only 12 cM between the outside loci (m and r) instead of the 14 cM produced by adding the map distances found in the first two matings?
(2) Mapping by a Three-Point Cross - The answer comes from performing a mating between T2 viruses differing at all three loci:
• h m r and
• h+ m+ r+
(Note: this time one parent has all mutant alleles and the other all wild-type alleles; don't be confused!)
The result: 8 different types of plaques are formed.
Recombination by Three-Point Cross
Group       Genotype      Number of Plaques
Group 1     h m r         435
Group 2     h+ m+ r+      435
Group 3     h+ m r+       25
Group 4     h m+ r        25
Group 5     h m r+        35
Group 6     h+ m+ r       35
Group 7     h m+ r+       5
Group 8     h+ m r        5
Total =                   1000
• parentals, that is, nonrecombinants, in Groups 1 and 2;
• recombinants: all the others
Analyzing these data shows how the two-point cross between m and r understated the true distance between them.
Let's first look at single pairs of recombinants as we did before (thus ignoring the third locus).
• If we look at all the recombinants between h and r but ignore m (as in the first experiment), we find that they are contained in Groups 5, 6, 7, and 8, giving the total of 80 that we found originally.
• If we look at recombinants between h and m but ignore r (as in the second experiment), we find that they are contained in Groups 3, 4, 7, and 8, giving the same total of 60 that we found before.
• But if we focus only on m and r (as we did in the third experiment), we find that the recombinants are contained in Groups 3, 4, 5, and 6, giving the same total of 120 as before, while the non-recombinants are not only in Groups 1 and 2 but also in Groups 7 and 8. The reason: a double crossover occurred in these cases, restoring the parental configuration of the m and r alleles.
• Because these double crossovers were hidden in the third experiment, the map distance (12 cM) was understated. To get the true map distance, we add their number to each of the other recombinant groups (Groups 3, 4, 5, and 6), so 25 + 5 + 25 + 5 + 35 + 5 + 35 + 5 = 140, and the true map distance between m and r is the 14 cM that we found by adding the map distances between h and r (8 cM) and h and m (6 cM).
The three-point cross is also useful because it gives the gene order simply by inspection:
• Find the rarest genotypes (here Groups 7 and 8), and
• the gene NOT in the parental configuration (here h) is always the middle one.
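The bookkeeping of the three-point cross can be checked with a small Python sketch. The data are the eight plaque counts from the cross h m r x h+ m+ r+; the function names and the "+"/"-" encoding of alleles are assumptions made for this illustration. The sketch relies on one parent being all-mutant and the other all-wild-type, so a plaque is recombinant for two loci whenever its alleles at those loci differ.

```python
# Plaque counts from the three-point cross h m r x h+ m+ r+.
# Each key is (h, m, r): "+" = wild-type allele, "-" = mutant allele.
COUNTS = {
    ("-", "-", "-"): 435,  # Group 1: h m r     (parental)
    ("+", "+", "+"): 435,  # Group 2: h+ m+ r+  (parental)
    ("+", "-", "+"): 25,   # Group 3: h+ m r+
    ("-", "+", "-"): 25,   # Group 4: h m+ r
    ("-", "-", "+"): 35,   # Group 5: h m r+
    ("+", "+", "-"): 35,   # Group 6: h+ m+ r
    ("-", "+", "+"): 5,    # Group 7: h m+ r+   (double crossover)
    ("+", "-", "-"): 5,    # Group 8: h+ m r    (double crossover)
}
LOCI = ("h", "m", "r")


def recombination_percent(counts, i, j):
    """Percent of plaques recombinant for loci i and j, ignoring the third."""
    total = sum(counts.values())
    recombinant = sum(n for g, n in counts.items() if g[i] != g[j])
    return 100.0 * recombinant / total


def middle_locus(counts):
    """Return the locus whose allele is the odd one out in the rarest class."""
    rarest = min(counts, key=counts.get)
    for k, name in enumerate(LOCI):
        flanking = [rarest[x] for x in range(3) if x != k]
        if flanking[0] == flanking[1] and flanking[0] != rarest[k]:
            return name
    return None


def corrected_outer_distance(counts, mid, a, b):
    """Distance between outer loci a and b, counting each double crossover twice."""
    total = sum(counts.values())
    doubles = sum(n for g, n in counts.items()
                  if g[mid] != g[a] and g[mid] != g[b])
    return recombination_percent(counts, a, b) + 100.0 * 2 * doubles / total


print("h-r:", recombination_percent(COUNTS, 0, 2), "cM")
print("h-m:", recombination_percent(COUNTS, 0, 1), "cM")
print("m-r (uncorrected):", recombination_percent(COUNTS, 1, 2), "cM")
print("middle locus:", middle_locus(COUNTS))
print("m-r (corrected):", corrected_outer_distance(COUNTS, 0, 1, 2), "cM")
```

Run on these counts, the sketch reproduces the 8, 6 and 12 cM two-point values, identifies h as the middle locus, and recovers the 14 cM corrected distance between m and r.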
2.2.3 Genetic Recombination in Phage
Site-specific genetic recombination is a very common method by which phages exchange
genetic material. Unlike general recombination, it is guided by a recombination enzyme
that recognizes specific nucleotide sequences present on one or both of the recombining
DNA molecules. Base-pairing between the recombining DNA molecules need not be
involved, and even when it is, the heteroduplex joint that is formed is only a few base
pairs long. By separating and joining double-stranded DNA molecules at specific sites,
this type of recombination enables various types of mobile DNA sequences to move
about within and between chromosomes.
Site-specific recombination was first discovered as the means by which a bacterial virus,
bacteriophage lambda, moves its genome into and out of the E. coli chromosome. In its
integrated state the virus is hidden in the bacterial chromosome and replicated as part of
the host's DNA (Figure 2.8). When the virus enters a cell, a virus-encoded enzyme called
lambda integrase is synthesized. This enzyme catalyzes a recombination process that
begins when several molecules of the integrase protein bind tightly to a specific DNA
sequence on the circular bacteriophage chromosome. The resulting DNA-protein
complex can now bind to a related but different specific DNA sequence on the bacterial
chromosome, bringing the bacterial and bacteriophage chromosomes close together. The
integrase then catalyzes the required DNA cutting and resealing reactions, using a short
region of sequence homology to form a tiny heteroduplex joint at the point of union
(Figure 2.5). The integrase resembles a DNA topoisomerase in that it forms a reversible
covalent linkage to DNA wherever it breaks a DNA chain. The same type of site-specific
recombination mechanism can also be carried out in reverse by the lambda
bacteriophage, enabling it to exit from its integration site in the E. coli chromosome in
order to multiply rapidly within the bacterial cell. This excision reaction is catalyzed by a
complex of the integrase enzyme (Figure 2.6) with a second bacteriophage protein, which
is produced by the virus only when its host cell is stressed. If the sites recognized by such
a recombination enzyme are inverted in orientation relative to each other, the DNA between
them will be inverted rather than excised (Figure 2.7). Many other enzymes that catalyze site-specific recombination
resemble lambda integrase in requiring a short region of identical DNA sequence on the
two regions of DNA helix to be joined.
Because of this requirement, each enzyme in this
class is fastidious with respect to the DNA sequences
that it recombines, and it can be expected to catalyze
one particular DNA joining event that is useful to the
virus, plasmid, transposable element, or cell that
contains it. These enzymes can be exploited as tools in
transgenic animals to study the influence of specific
genes on cell behavior.
Figure 2.5: The formation of
a cross-strand exchange.
There are many possible
pathways that can lead from
a single-strand exchange to
a cross-strand exchange,
but only one is shown.
Figure 2.6: The insertion of bacteriophage lambda DNA into the
bacterial chromosome. In this example of site-specific
recombination, the lambda integrase enzyme binds to a specific
"attachment site" DNA sequence on each chromosome, where
it makes cuts that bracket a short homologous DNA sequence;
the integrase thereby switches the partner strands and reseals
them so as to form a heteroduplex joint 7 base pairs long. Each
of the four strand-breaking and strand joining reactions required
resembles that made by a DNA topoisomerase, inasmuch as
the energy of a cleaved phosphodiester bond is stored in a
transient covalent linkage between the DNA and the enzyme.
Figure 2.7: Switching gene expression by DNA inversion in bacteria.
Alternating transcription of two flagellin genes in a Salmonella
bacterium is caused by a simple site-specific recombination event
that inverts a small DNA segment containing a promoter that in one
orientation (A) activates transcription of the H2 flagellin gene as well
as a repressor protein that blocks the expression of the H1 flagellin
gene. When the promoter is inverted, it no longer turns on H2 or the
repressor, and the H1 gene, which is thereby released from
repression, is expressed instead (B). The recombination mechanism
is activated only rarely (about once every 10^5 cell divisions).
Therefore, the production of one or other flagellin tends to be faithfully
inherited in each clone of cells.
Site-specific recombination enzymes that break and rejoin two DNA double helices at
specific sequences on each DNA molecule often do so in a reversible way: as for lambda
bacteriophage, the same enzyme system that joins two DNA molecules can take them
apart again, precisely restoring the sequences of the two original DNA molecules. This
type of recombination is therefore called conservative site-specific recombination.
Figure 2.8: The life cycle of bacteriophage lambda. The lambda genome contains about 50,000
nucleotide pairs and encodes about 50 proteins. Its double-stranded DNA can exist in either linear or
circular forms. As shown, the bacteriophage can multiply by either a lytic or a lysogenic pathway in the
E. coli bacterium. When the bacteriophage is growing in the lysogenic state, damage to the cell causes
the integrated viral DNA (provirus) to exit from the host chromosome and shift to lytic growth. The
entrance and exit of the DNA from the chromosome are site-specific genetic recombination events
catalyzed by the lambda integrase protein.
2.2.4 Genetic Transformation, Conjugation and Transduction in
Bacteria
Bacteria can transfer DNA to other bacteria in three different ways. In every case the source cells of the DNA are called the DONORS and the cells that receive the DNA are called the RECIPIENTS. In each case the donor DNA is incorporated into the recipient cell's DNA by recombinational exchange (Figure 2.9). If the exchange involves an allele of the recipient's gene, the recipient's genome and phenotype will have changed. The three forms of bacterial DNA exchange are (1) TRANSFORMATION, (2) CONJUGATION and (3) TRANSDUCTION.
Figure 2.9: General scheme of bacterial exchange of DNA. DNA from a donor cell is transferred to a recipient cell where it undergoes recombinational exchange, replacing one or more of the recipient's genes with those from the donor.
Figure 2.10: Representative FERTILITY PLASMID. A fertility plasmid carries the genes for conjugation as well as a number of other genes. In this figure the fertility plasmid also carries antibiotic resistance genes.
Plasmids - Before DNA exchange can be discussed, it is necessary to understand what PLASMIDS are. Plasmids are best thought of as MINI-CHROMOSOMES. Plasmids are composed of DNA which usually exists as a CIRCULAR MOLECULE, only much SMALLER than the genomic DNA (Figure 2.10). Plasmids vary in size, but most are between 1,000 and 25,000 base pairs, vs. about 4,000,000 bp in the genome. Plasmids REPLICATE AUTONOMOUSLY from the genomic chromosome. Often there are MANY PLASMID COPIES present in one cell (Figure 2.11). Further, a cell may contain SEVERAL DIFFERENT PLASMIDS or it may contain NO PLASMIDS at all. Plasmids generally carry genes that are NOT ESSENTIAL for a cell's survival except under special circumstances. For example, many plasmids carry genes for ANTIBIOTIC RESISTANCE (Figure 2.13). When these plasmids are present in a cell, it is unaffected by the appropriate antibiotic, but if the plasmid and its antibiotic resistance gene are lost, the host cell becomes sensitive to that antibiotic. Some plasmids carry resistance genes to several antibiotics, making their hosts very dangerous pathogens. In other cases plasmids, called VIRULENCE-PLASMIDS, carry VIRULENCE GENES that enhance a host's ability to cause a disease. That is, a bacterium carrying a plasmid containing the virulence gene is able to CAUSE A DISEASE (Figure 2.12), but when the plasmid is missing that same bacterium is unable to produce that disease. One such plasmid-associated pathogen of recent concern is E. coli strain O157:H7, which produces a severe food-borne disease. Other plasmids carry genes for protecting a cell against DELETERIOUS substances like mercury or copper, or they may carry genes that make it possible for a cell to metabolize an UNUSUAL SUBSTRATE, such as gasoline, as a nutrient or energy source.
Figure 2.11: Plasmids in a bacterial host cell. A cell may contain no plasmids, one plasmid or many copies of a plasmid. A single host may contain a number of different plasmids (green, blue & pink).
Figure 2.12: Preparation of phage dilution for plaque formation
Figure 2.13: Selection of Antibiotic-Resistant Mutants. If an antibiotic-sensitive bacterium is grown in a culture, occasionally a random mutation occurs that renders a bacterium resistant to a given antibiotic. To detect the presence of such a mutation one plates the culture on a medium containing a lethal dose of the antibiotic in question. Any cells that grow on the left plate must be resistant (red colonies) to the antibiotic. All the cells (green & red) will grow on a medium lacking the antibiotic.
The question naturally arises as to the PURPOSE of these plasmids in the evolutionary scheme. The current explanation is that plasmids constitute an EXTRA POOL OF GENE ALLELES and thus enlarge the effective gene pool of the population. Remember that the genome of prokaryotes carries only enough information for between 1,000 and 5,000 genes. But, as we've already learned, the more variety the better a species' chances of survival are in a fickle universe. The phenomenon of ANTIBIOTIC RESISTANCE is a case in point. Antibiotics, being natural products of certain organisms, are nevertheless unlikely to be encountered very often in quantities that endanger sensitive strains, so there is no need to carry resistance genes against the hundreds of antibiotics that lurk in the nooks 'n crannies of the environment. Indeed, to do so would likely tie up all your genes just for this one purpose; clearly not a survival plus.
(Sidebar: This would be like someone afraid of being robbed carrying around an AK-47, a rocket launcher and a small cannon; they might be safe from thieves, but all that bulk and weight is going to seriously interfere with their everyday lives, like getting dates.)
However, random mutation has produced antibiotic resistance genes that clearly can
prove useful under the RIGHT CIRCUMSTANCES, but how do they remain available,
without tying up huge quantities of LIMITED RESOURCES? The answer is
PLASMIDS, of course (bet you saw that coming didn't you?). A RARE PLASMID,
randomly carrying a RARE ANTIBIOTIC RESISTANCE GENE to, for example,
penicillin, happens to be in one bacterium in a patient suffering from an infection (e.g. clap) which is
treated by a shot in the you-bloody-well-know-where. All the resistant bacterium's mates,
lacking the resistance plasmid, are quickly killed, but the lucky bacterium with its
penicillin-resistant-plasmid survives and reproduces while swimming in a sea of
penicillin. Naturally, all the subsequent daughter cells carry the resistance plasmid,
because if they didn't they'd die very quickly. This is a classical example of SURVIVAL
OF THE FITTEST & of evolution in action.
In the modern world we produce huge quantities of antibiotics, so the selective pressure on bacteria containing plasmids carrying antibiotic resistance genes is intense, particularly in places like hospitals. As a consequence of this evolutionary process, current antibiotics are losing their effectiveness. To compound the problem, most of the plasmids carrying the antibiotic resistance genes have the ability to move from one bacterium to another by conjugation. In effect, a single cell carrying an antibiotic-resistance plasmid can "INFECT" many other cells with this plasmid, thereby spreading the resistance plasmid rapidly THROUGHOUT a bacterial population (sort of like us getting a flu shot). The survival logic of this ability is obvious, at least as far as the bacteria are concerned.
Figure 2.14: Isolation of CELL-FREE or NAKED DNA. The cells are broken and the DNA released. The cell-free DNA is subsequently isolated and collected.
Plasmids have one other very significant role to play in this story. They serve as the VEHICLES for carrying genes between cells in the genetic engineering revolution.
Transformation - The discovery of transformation was previously described. Since its initial discovery transformation has been shown to occur throughout the bacterial world and it has become the most commonly used artificial way of moving genes from one bacterium to another. The basic procedure involves:
Breaking open the donor cells and removing DNA from them so as to obtain a CELL-FREE, usually purified, form of DNA (NAKED DNA) (Figure 2.14).
Figure 2.15: Mixing of Donor DNA with Recipient Competent Cells. The naked donor DNA is incubated with the competent recipient cells to which it binds.
Transformation is used to move DNA between bacteria, plants and animals. In each case the methods used to get the DNA into the recipient cells are slightly different. In bacteria COMPETENCY (Figure 2.15) is an empirical matter; that is, it cannot be predicted what conditions will produce competency in a given strain of bacteria. However, the following treatment often induces competency in Gram-negative (G-) bacteria:
Young cells are incubated with a CALCIUM CHLORIDE SOLUTION for approximately 30 min on ice. In some cases magnesium is also present.
The cells are concentrated and suspended as a thick suspension in the calcium solution. The cells may be mixed with reagents like glycerol and stored at -80 °C for later use or they may be used immediately.
Cell-free DNA is then mixed with these competent cells (Figure 2.16) on ice for approximately 30 min, followed by a brief mild heating.
The transformed cells are incubated in a rich medium for approximately 1 to 1.5 hr and then plated on medium containing materials that will detect the presence of the transformed genes.
Figure 2.16: Uptake and Recombination of Donor DNA. Donor DNA binds to competent recipient cells, following which it enters the recipient cells. Portions of the donor DNA align, at random, with genes on the recipient DNA and segments of the two DNAs are exchanged. The exchange inserts donor genes into the recipient cell's DNA.
A variety of other transformation techniques are used for eukaryotic cells. These include mixing certain salts with DNA. These salts bind the DNA and the salt-DNA-complex is
then taken into the eukaryotic cells where the DNA is subsequently incorporated into the
recipient cell's DNA. Plant cells are often covered with a thick cell wall that is difficult to
penetrate. To get DNA into these cells tiny metal beads coated with the donor DNA are
"shot" into the cytoplasm of the recipient cells using a "gas gun". A strong jolt of
electricity is also used to drive the DNA into recipient cells. Because of the similar
chemical nature of DNA, DNA from any living form can, in theory, function in any other
life form. Animals or plants that have been transformed with DNA from other species are
called TRANSGENIC organisms. For example, we have transgenic pigs and cows
containing functional "human genes". Transgenic plants containing "bacterial genes" that
make a protein toxic to certain insect pests are currently growing around the world.
2.2.5 Genetics of Mitochondria and Chloroplasts
Mendel’s principles of segregation and independent assortment are based on the
assumption that genes are located on chromosomes in the nucleus of the cell. For the
majority of genetic characteristics, this assumption is valid, and Mendel’s principles
allow us to predict the types of offspring that will be produced in a genetic cross.
However, not all the genetic material of a cell is found in the nucleus; some
characteristics are encoded by genes located in the cytoplasm.
These characteristics exhibit cytoplasmic inheritance. A few organelles, notably
chloroplasts and mitochondria, contain DNA. Each copy of human mitochondrial DNA contains about
16,500 base pairs, encoding 37 genes. Compared with that of nuclear DNA,
which contains some 3 billion base pairs encoding roughly 20,000 protein-coding genes, the amount of
mitochondrial DNA (mtDNA) is very small; nevertheless, mitochondrial and chloroplast
genes encode some important characteristics.
Cytoplasmic inheritance differs from the inheritance of characteristics encoded by
nuclear genes in several important respects. A zygote inherits nuclear genes from both
parents, but typically all of its cytoplasmic organelles, and thus all its cytoplasmic genes,
come from only one of the gametes, usually the egg. Sperm generally contributes only a
set of nuclear genes from the male parent. In a few organisms, cytoplasmic genes are
inherited from the male parent, or from both parents; however, for most organisms, all
the cytoplasm is inherited from the egg. In this case, cytoplasmically inherited traits are
present in both males and females and are passed from mother to offspring, never from
father to offspring. Reciprocal crosses, therefore, give different results when cytoplasmic
genes encode a trait. Cytoplasmically inherited characteristics frequently exhibit
extensive phenotypic variation, because there is no mechanism analogous to mitosis or
meiosis to ensure that cytoplasmic genes are evenly distributed in cell division. Thus,
different cells and individuals will contain various proportions of cytoplasmic genes.
Consider mitochondrial genes. There are thousands of mitochondria in each cell, and
each mitochondrion contains from 2 to 10 copies of mtDNA. Suppose that half of the
mitochondria in a cell contain a normal wild-type copy of mtDNA and the other half
contain a mutated copy (Figure 2.17). In cell division, the mitochondria segregate into
progeny cells at random. Just by chance, one cell may receive mostly mutated mtDNA
and another cell may receive mostly wild-type mtDNA (see Figure 2.17). In this way,
different progeny from the same mother and even cells within an individual offspring
may vary in their phenotype. A classic example of cytoplasmic inheritance is the inheritance of plastids in
Mirabilis jalapa.
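The effect of random segregation can be illustrated with a small simulation. This is a simplified sketch, not a model taken from this unit: it assumes that every mitochondrion is duplicated before division, that the two daughter cells receive equal numbers of organelles, and that a cell starts with 10 wild-type and 10 mutant mitochondria. The function names and these numbers are hypothetical.

```python
import random


def divide(cell, rng):
    """Divide one cell, given as (n_wild_type, n_mutant) mitochondria.

    Assumption: every organelle is duplicated, then the doubled pool is
    dealt at random into two daughters of equal size.
    """
    pool = ["wt"] * (2 * cell[0]) + ["mut"] * (2 * cell[1])
    rng.shuffle(pool)
    half = len(pool) // 2
    first, second = pool[:half], pool[half:]
    return ((first.count("wt"), first.count("mut")),
            (second.count("wt"), second.count("mut")))


def simulate(generations, start=(10, 10), seed=0):
    """Follow all descendants of one starting cell for a number of divisions."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    cells = [start]
    for _ in range(generations):
        cells = [daughter for cell in cells for daughter in divide(cell, rng)]
    return cells


if __name__ == "__main__":
    for wild, mutant in simulate(4):
        print(f"wild-type: {wild:2d}  mutant: {mutant:2d}  "
              f"mutant fraction: {mutant / (wild + mutant):.2f}")
```

Even though every cell descends from the same half-and-half starting cell, the mutant fraction drifts apart between lineages, which is the source of the phenotypic variation described above.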
Traits encoded by chloroplast DNA (cpDNA) are similarly variable. In 1909, cytoplasmic
inheritance was recognized by Carl Correns as one of the first exceptions to Mendel’s
principles. Correns, one of the biologists who rediscovered Mendel’s work, studied the
inheritance of leaf variegation in the four-o’clock plant, Mirabilis jalapa. Correns found
that the leaves and shoots of one variety of four-o’clock were variegated, displaying a
mixture of green and white splotches. He also noted that some branches of the variegated
strain had all-green leaves; other branches had all white leaves.
Each branch produced flowers; so Correns was able to cross flowers from variegated,
green, and white branches in all combinations (Figure 2.18). The seeds from green
branches always gave rise to green progeny, no matter whether the pollen was from a
green, white, or variegated branch. Similarly, flowers on white branches always produced
white progeny. Flowers on the variegated branches gave rise to green, white, and
variegated progeny, in no particular ratio.
Figure 2.17: Cytoplasmically inherited characteristics frequently exhibit extensive phenotypic variation because cells and individual offspring contain various proportions of cytoplasmic genes. Mitochondria that have wild-type mtDNA are shown in red; those having mutant mtDNA are shown in blue.
Correns's crosses demonstrated cytoplasmic inheritance of variegation in the four-o’clocks. The phenotypes of the offspring were determined entirely by the maternal parent, never by the paternal parent (the source of the pollen). Furthermore, the production of all three phenotypes by flowers on variegated branches is consistent with the occurrence of cytoplasmic inheritance. Variegation in these plants is caused by a defective gene in the cpDNA, which results in a failure to produce the green pigment chlorophyll. Cells from green branches contain normal chloroplasts only, cells from white branches contain abnormal chloroplasts only, and cells from variegated branches contain a mixture of normal and abnormal chloroplasts.
In the flowers from variegated branches, the random segregation of chloroplasts in the course of oogenesis produces some egg cells with normal cpDNA, which develop into green progeny; other egg cells with only abnormal cpDNA develop into white progeny; and, finally, still other egg cells with a mixture of normal and abnormal cpDNA develop into variegated progeny.
In recent years, a number of human
diseases (mostly rare) that exhibit
cytoplasmic inheritance have been
identified. These disorders arise from
mutations in mtDNA, most of which occur
in genes coding for components of the
electron-transport chain, which generates
most of the ATP (adenosine triphosphate)
in aerobic cellular respiration. One such
disease is Leber hereditary optic neuropathy. Patients who have this disorder
experience rapid loss of vision in both eyes, resulting from the death of cells in the optic
nerve. Loss of vision typically occurs in early adulthood (usually between the ages of 20
and 24), but it can occur any time after adolescence. There is much clinical variability in
the severity of the disease, even within the same family.
Figure 2.18: Crosses for leaf type in four-o’clocks illustrate cytoplasmic inheritance.
Leber hereditary optic neuropathy exhibits maternal inheritance: the trait is always passed
from mother to child.
2.2.6 Cytoplasmic Male Sterility
Background - The first documentation of male sterility came in 1763 when Joseph
Gottlieb Kölreuter observed anther abortion within species and specific hybrids. It is
more prevalent than female sterility, either because the male sporophyte and gametophyte
are less protected from the environment than the ovule and embryo sac, or because it
results from natural selection on mitochondrial genes which are maternally inherited and
are thus not concerned with pollen production. Male sterility is easy to detect because
pollen is produced in large quantities and is easily studied. Male sterility is assayed
through staining techniques (carmine, lactophenol or iodine), whereas female
sterility is detected by the absence of seeds. A male-sterile plant can still set seed, so
male sterility can propagate in nature and is important for crop breeding, whereas female
sterility cannot. Male sterility can arise spontaneously via mutations in nuclear and/or
cytoplasmic genes.
Among the two types of male sterility, genetic and cytoplasmic, cytoplasmic male
sterility (CMS) is caused by the extranuclear genome (mitochondria or chloroplast) and
shows maternal inheritance. Male sterility in these systems may be controlled either entirely
by cytoplasmic factors or by the interaction between cytoplasmic and nuclear
factors.
Explanation of Cytoplasmic male sterility - Cytoplasmic male sterility, as
the name indicates, is under extranuclear genetic control. Such systems show non-Mendelian
inheritance and are under the regulation of cytoplasmic factors. In this type, male sterility
is inherited maternally. This is not a very common type of male sterile system in the plant
kingdom. In general there are two types of cytoplasm, viz. N (normal) and the aberrant S
(sterile) cytoplasms. These types exhibit reciprocal differences.
Cytoplasmic genetic male sterility - When nuclear genes for fertility
restoration (Rf) are available for a CMS system in a crop, the system is called Cytoplasmic
Genetic Male Sterility (CGMS). This type of male sterility system is common in many
plant species across the plant kingdom. The sterility is manifested by the influence of both
nuclear and cytoplasmic genes. There are commonly two types of cytoplasms, N (normal)
and S (sterile). There are also restorers of fertility (Rf) genes, which are distinct from
genetic male sterility genes. The Rf genes do not have any expression of their own unless
the sterile cytoplasm is present. Rf genes are required to restore fertility in S cytoplasm
which causes sterility. Thus N cytoplasm with rfrf, or S cytoplasm with Rf-, produces
male-fertile plants, while S cytoplasm with rfrf produces only male-sterile plants. Another
feature of these systems is that Rf mutations (i.e., mutations to rf or no fertility
restoration) are frequent, so N cytoplasm with Rfrf is best for stable fertility.
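The cytoplasm and restorer-gene combinations described above can be sketched as a small lookup. This is only an illustration of the rules as stated in the text; the function name and the string encoding of genotypes ('RfRf', 'Rfrf', 'rfrf') are assumptions made for the example.

```python
def cgms_phenotype(cytoplasm: str, nuclear: str) -> str:
    """Return 'male fertile' or 'male sterile' for one plant.

    cytoplasm: 'N' (normal) or 'S' (sterile), as in the text.
    nuclear:   restorer genotype, encoded here (hypothetically) as
               'RfRf', 'Rfrf' or 'rfrf'.
    """
    if cytoplasm not in ("N", "S"):
        raise ValueError("cytoplasm must be 'N' or 'S'")
    if nuclear not in ("RfRf", "Rfrf", "rfrf"):
        raise ValueError("nuclear genotype must be 'RfRf', 'Rfrf' or 'rfrf'")

    # Rf genes have no expression of their own unless S cytoplasm is present,
    # so any plant with N cytoplasm is male fertile.
    if cytoplasm == "N":
        return "male fertile"

    # In S cytoplasm, at least one dominant Rf allele restores fertility.
    return "male fertile" if nuclear != "rfrf" else "male sterile"


for cyto in ("N", "S"):
    for geno in ("RfRf", "Rfrf", "rfrf"):
        print(cyto, geno, "->", cgms_phenotype(cyto, geno))
```

Only the S cytoplasm with rfrf comes out male sterile, which is the combination used as the female parent in hybrid seed production.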
Because of the convenience to control the sterility expression by manipulating the gene–
cytoplasm combinations in any selected genotype, cytoplasmic genetic male sterility
systems are widely exploited in crop plants for hybrid breeding. Incorporation of these
systems for male sterility evades the need for emasculation in cross-pollinated species,
thus encouraging cross breeding producing only hybrid seeds under natural conditions.
Cytoplasmic male sterility in hybrid breeding - Male sterile plants
produce no functional pollen, but do produce viable eggs. Cytoplasmic male sterility is
used in agriculture to facilitate the production of hybrid seed. Hybrid seed is produced
from a cross between two genetically different lines; such seeds usually result in larger,
more vigorous plants. The main practical problem in producing hybrid seed is to prevent
self-pollination, which would produce seeds that are not hybrid. One breeding scheme is
illustrated in Figure 2.19.
Hybrid production requires a female plant in which no viable male gametes are borne.
Emasculation removes a plant's pollen source so that it functions only as a female. Another
simple way to establish a female line for hybrid seed production is to identify or create a
line that is unable to produce viable pollen. This male sterile line is therefore unable to
self-pollinate, and seed formation is dependent upon pollen from the male line.
Cytoplasmic male sterility is used in hybrid seed production. In this case, the sterility is
transmitted only through the female and all progeny will be sterile. This is not a problem
for crops such as onions or carrots where the commodity harvested from the F1
generation is produced during vegetative growth. These CMS lines must be maintained
by repeated crossing to a sister line (known as the maintainer line) that is genetically
identical except that it possesses normal cytoplasm and is therefore male fertile. In
cytoplasmic genetic male sterility, fertility is restored using restorer lines that carry
nuclear restorer genes. The male sterile (MS) line is maintained by crossing with a
maintainer line that has the same nuclear genome as the MS line but carries normal
fertile cytoplasm.
Figure 2.19: The use of cytoplasmic male sterility to facilitate the production of hybrid
corn. In this scheme, the hybrid corn is generated from four pure parental lines: A, B, C,
and D. Such hybrids are called double-cross hybrids. At each step, appropriate combinations
of cytoplasmic genes and nuclear restorer genes ensure that the female parents will not self
and that male parents will have fertile pollen. (After J. Janick et al., Plant Science.
Copyright © 1974 by W. H. Freeman and Company.)
2.3 GENE STRUCTURE AND EXPRESSION
2.3.1 Genetic Fine Structure or Fine Structure of Gene
A gene (Figure 2.20) is a locatable region of genomic sequence, corresponding to a unit of
inheritance, which is associated with regulatory regions, transcribed regions and/or other
functional sequence regions. The physical development and phenotype of organisms can be
thought of as a product of genes interacting with each other and with the environment. A
concise definition of a gene, taking into account complex patterns of regulation and
transcription, genic conservation and non-coding RNA genes, has been proposed by Gerstein et
al.: "A gene is a union of genomic sequences encoding a coherent set of potentially
overlapping functional products".
Figure 2.20: This stylistic diagram shows a gene in relation to the double helix structure of
DNA and to a chromosome (right). Introns are regions often found in eukaryote genes that are
removed in the splicing process (after the DNA is transcribed into RNA): only the exons
encode the protein. This diagram labels a region of only 40 or so bases as a gene. In reality
most genes are hundreds of times larger.
Colloquially, the term gene is often used to refer to an inheritable trait which is usually
accompanied by a phenotype, as in "tall genes" or "bad genes"; the proper scientific
term for this is allele.
In cells, genes consist of a long strand of DNA that contains a promoter, which
controls the activity of a gene, and coding and non-coding sequence. Coding sequence
determines what the gene produces, while non-coding sequence can regulate the
conditions of gene expression. When a gene is active, the coding and non-coding
sequence is copied in a process called transcription, producing an RNA copy of the gene's
information. This RNA can then direct the synthesis of proteins via the genetic code. But
some RNAs are used directly, for example as part of the ribosome. These molecules
resulting from gene expression, whether RNA or protein, are known as gene products.
Genes often contain regions that do not encode products, but regulate gene
expression. The genes of eukaryotic organisms can contain regions called introns that are
removed from the messenger RNA in a process called splicing. The regions encoding
gene products are called exons. In eukaryotes, a single gene can encode multiple proteins,
which are produced through the creation of different arrangements of exons through
alternative splicing. In prokaryotes (bacteria and archaea), introns are less common and
genes often contain a single uninterrupted stretch of DNA, called a cistron, that codes for
a product. Prokaryotic genes are often arranged in groups called operons with promoter
and operator sequences that regulate transcription of a single long RNA. This RNA
contains multiple coding sequences. Each coding sequence is preceded by a Shine-Dalgarno
sequence that ribosomes recognize.
The total set of genes in an organism is known as its genome. An organism's genome size
is generally lower in prokaryotes, both in number of base pairs and number of genes, than
even single-celled eukaryotes. However, there is no clear relationship between genome
sizes and complexity in eukaryotic organisms. One of the largest known genomes
belongs to the single-celled amoeba Amoeba dubia, with over 670 billion base pairs,
some 200 times larger than the human genome. The estimated number of genes in the
human genome has been repeatedly revised downward since the completion of the
Human Genome Project; current estimates place the human genome at just under 3
billion base pairs and about 20,000–25,000 genes. A recent Science article gives a
number of 20,488 protein-coding genes, with perhaps 100 more yet to be discovered. The
gene density of a genome is a measure of the number of genes per million base pairs
(called a Megabase, Mb); prokaryotic genomes have much higher gene densities than
eukaryotes. The gene density of the human genome is roughly 12–15 genes per megabase
pair.
History



The existence of genes was first suggested by Gregor Mendel (1822-1884), who, in the
1860s, studied inheritance in pea plants.
Mendel's concept was given a name by Hugo de
Vries in 1889, who, at that time probably
unaware of Mendel's work, in his book
Intracellular Pangenesis coined the term
"pangen" for "the smallest particle
[representing] one hereditary characteristic".
Wilhelm Johannsen abbreviated this term to "gene"
("gen" in Danish and German) two decades later.
Physical Structure - The vast majority of living organisms encode their genes in long
strands of DNA. DNA consists of a chain made from four types of nucleotide subunits: adenine,
cytosine, guanine, and thymine (Figure 2.21). Each nucleotide subunit consists of three
components: a phosphate group, a deoxyribose sugar ring, and a nucleobase. Thus, nucleotides
in DNA or RNA are typically called 'bases'; consequently they are commonly referred to simply
by their purine or pyrimidine original base components adenine, cytosine, guanine, thymine.
Adenine and guanine are purines and cytosine and thymine are pyrimidines. The most
common form of DNA in a cell is in a double helix structure, in which two individual
DNA strands twist around each other in a right-handed spiral. In this structure, the base
pairing rules specify that guanine pairs with cytosine and adenine pairs with thymine
(each pair contains one purine and one pyrimidine). The base pairing between guanine
and cytosine forms three hydrogen bonds, while the base pairing between adenine and
thymine forms two hydrogen bonds. The two strands in a double helix must therefore be
complementary, that is, their bases must align such that the adenines of one strand are
paired with the thymines of the other strand, and so on.
Figure 2.21: The chemical structure of a four-base fragment of a DNA double helix.
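The base-pairing rules above can be illustrated with a short sketch that builds the partner strand base by base and counts the hydrogen bonds holding the two strands together. The function names and the example sequence are made up for illustration.

```python
# Watson-Crick pairing partners and hydrogen bonds per pair, as stated in the text.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand: str) -> str:
    """Return the base that pairs with each position of `strand`, in the same order."""
    return "".join(PAIR[base] for base in strand.upper())


def hydrogen_bond_count(strand: str) -> int:
    """Total hydrogen bonds between `strand` and its complementary strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


strand = "ATGCGT"  # hypothetical example sequence
print(strand)
print(complementary_strand(strand))
print(hydrogen_bond_count(strand), "hydrogen bonds")
```

The partner strand is written here position by position; because the two strands of a helix run in opposite directions, reading it 5' to 3' would require reversing it.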
Due to the chemical composition of the pentose residues of the nucleotides, DNA strands have
directionality. One end of a DNA polymer contains an exposed hydroxyl group on the
deoxyribose; this is known as the 3' end of the molecule. The other end contains an
exposed phosphate group; this is the 5' end. The directionality of DNA is vitally
important to many cellular processes, since double helices are necessarily directional (a
strand running 5'-3' pairs with a complementary strand running 3'-5') and processes such
as DNA replication occur in only one direction. All nucleic acid synthesis in a cell occurs
in the 5'-3' direction, because new monomers are added via a dehydration reaction that
uses the exposed 3' hydroxyl as a nucleophile.
The expression of genes encoded in DNA begins by transcribing the gene into RNA, a
second type of nucleic acid that is very similar to DNA, but whose monomers contain the
sugar ribose rather than deoxyribose. RNA also contains the base uracil in place of
thymine. RNA molecules are less stable than DNA and are typically single-stranded.
Genes that encode proteins are composed of a series of three-nucleotide sequences called
codons, which serve as the "words" in the genetic "language". The genetic code specifies
the correspondence during protein translation between codons and amino acids. The
genetic code is nearly the same for all known organisms.
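As a rough sketch of the ideas above, the snippet below converts a DNA sequence to RNA (uracil in place of thymine), splits it into three-nucleotide codons and reads them with a deliberately tiny excerpt of the genetic code. It assumes the DNA given is the coding (sense) strand, so transcription amounts to swapping T for U; the sequence and function names are illustrative only.

```python
# A four-entry excerpt of the standard genetic code; the full table has 64 codons.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "Stop"}


def transcribe(coding_strand: str) -> str:
    """RNA copy of the coding strand: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def split_codons(mrna: str) -> list:
    """Split an mRNA into consecutive three-nucleotide codons (leftovers dropped)."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def translate(mrna: str) -> list:
    """Read codons in order until a stop codon; unknown codons are shown as '?'."""
    peptide = []
    for codon in split_codons(mrna):
        amino_acid = CODON_TABLE.get(codon, "?")
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


mrna = transcribe("ATGTTTGGCTAA")  # hypothetical gene fragment
print(mrna)
print(split_codons(mrna))
print(translate(mrna))
```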
RNA genes - In some cases, RNA is an intermediate product in the process of
manufacturing proteins from genes. However, for other gene sequences, the RNA
molecules are the actual functional products. For example, RNAs known as ribozymes
are capable of enzymatic function, and miRNAs have a regulatory role. The DNA
sequences from which such RNAs are transcribed are known as RNA genes.
Some viruses store their entire genomes in the form of RNA, and contain no DNA at all.
Because they use RNA to store genes, their cellular hosts may synthesize their proteins as
soon as they are infected and without the delay in waiting for transcription. On the other
hand, RNA retroviruses, such as HIV, require the reverse transcription of their genome
from RNA into DNA before their proteins can be synthesized. In 2006, French
researchers came across a puzzling example of RNA-mediated inheritance in mice.
Mice with a loss-of-function mutation in the gene Kit have white tails. Offspring of these
mutants can have white tails despite having only normal Kit genes. The research team
traced this effect back to mutated Kit RNA. While RNA is common as genetic storage
material in viruses, in mammals in particular RNA inheritance has been observed very
rarely.
Functional structure of a gene - All genes have regulatory regions in addition
to regions that explicitly code for a protein or RNA product. A regulatory region shared
by almost all genes is known as the promoter (Figure 2.22), which provides a position
that is recognized by the transcription machinery when a gene is about to be transcribed
and expressed.
Although promoter regions have a consensus sequence that is the most common sequence at
this position, some genes have "strong" promoters that bind the transcription machinery well,
and others have "weak" promoters that bind poorly. These weak promoters usually permit a
lower rate of transcription than the strong promoters, because the transcription machinery
binds to them and initiates transcription less frequently. Other possible regulatory regions
include enhancers, which can compensate for a weak promoter. Most regulatory regions are
"upstream" - that is, before or toward the 5' end of the transcription initiation site.
Eukaryotic promoter regions are much more complex and difficult to identify than prokaryotic
promoters.
Figure 2.22: Diagram of the "typical" eukaryotic protein-coding gene. Promoters and enhancers
determine what portions of the DNA will be transcribed into the precursor mRNA (pre-mRNA).
The pre-mRNA is then spliced into messenger RNA (mRNA) which is later translated into protein.
Many prokaryotic genes are organized into operons, or groups of genes whose products
have related functions and which are transcribed as a unit. By contrast, eukaryotic genes
are transcribed only one at a time, but may include long stretches of DNA called introns
which are transcribed but never translated into protein (they are spliced out before
translation). Splicing can also occur in prokaryotic genes, but is less common than in
eukaryotes.
2.3.2 Cis-Trans Test: Complementation
This was studied in Bacteriophage Genetics, where mutation of a gene designated r, for
"rapid lysis", was examined. It turned out that there are actually three different gene loci
(rI, rII, and rIII), mutations in any one of which produce a rapid-lysis phenotype. But, in
addition, many different mutations were found within each of these. Could wild-type virus be
formed by recombination between mutations within the same gene? Seymour Benzer
decided to find out. In Bacteriophage Genetics, the recombination frequency between
different genes is low (on the order of 10^-2). One would expect that recombination
frequencies between mutations in a single gene would be far lower (10^-4 or less).
Figure 2.23: Strain B infected by two different phages (rx and ry) and plated separately
on lawn B (permissive) and lawn K (non-permissive). Plaques on lawn B give the "total"
count; growth on lawn K is "restrictive", limited to wild-type viruses.
Fortunately Benzer could exploit a phenomenon to enable him to detect such rare events:
rII mutants can infect - but not complete their life cycle in - a strain of E. coli designated
K. Wild-type T4 can complete its life cycle in both strain B and strain K.
The procedure was to infect strain B (Figure 2.23) in liquid culture with two mutants to
be tested (designated here as rx and ry). After incubation, these were plated on a lawn of:
• strain B, which supports the growth of all viruses, thus giving the total number
of viruses liberated.
• strain K, on which only wild-type viruses can grow (Figure 2.24).
The recombination frequency between any pair of mutations is calculated as
Recombination Frequency = 2 × number of wild-type plaques (strain K plaques) ÷ total
number of plaques (on strain B).
You have to double the number found on strain K because you only see one-half the
recombinants — the other half consists of double mutants. Using this technique, Benzer
eventually found some 2000 different mutations in the rII gene. The recombination
frequency between some pairs of these was as low as 0.02% (0.02 cM).
• The T4 genome has 160,000 base pairs of DNA extending over ~1,600
centimorgans (cM).
• So 1 cM = 100 base pairs
• So 0.02 cM represents a pair of adjacent nucleotides.
• From these data, Benzer concluded that a single base pair of DNA is
o the smallest unit of mutation and
o the smallest unit of recombination.
In other words,
• These mutations represent a change in a single base pair - we call these point
mutations.
• Recombination between two molecules of DNA can occur at any pair of
nucleotides.
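The arithmetic behind this argument can be sketched as follows. The plaque counts in the example are hypothetical and are assumed to be already corrected for dilution; the conversion relies on the usual convention that 1% recombination corresponds to 1 cM, together with the T4 figures quoted above.

```python
T4_GENOME_BP = 160_000     # base pairs, figure quoted in the text
T4_MAP_LENGTH_CM = 1_600   # approximate map length in cM, figure quoted in the text


def recombination_frequency(wild_type_on_k: float, total_on_b: float) -> float:
    """Recombinant fraction = 2 x (plaques on strain K) / (plaques on strain B).

    The factor of 2 accounts for the double-mutant recombinants,
    which cannot form plaques on strain K.
    """
    if total_on_b <= 0:
        raise ValueError("total plaque count must be positive")
    return 2 * wild_type_on_k / total_on_b


def base_pairs_per_cm() -> float:
    """Average number of base pairs per centimorgan on the T4 map."""
    return T4_GENOME_BP / T4_MAP_LENGTH_CM


def map_distance_in_base_pairs(frequency: float) -> float:
    """Convert a recombinant fraction to cM (1% = 1 cM), then to base pairs."""
    centimorgans = frequency * 100
    return centimorgans * base_pairs_per_cm()


# Hypothetical counts: 1 wild-type plaque on K for every 10,000 plaques on B.
rf = recombination_frequency(1, 10_000)
print(f"recombination frequency: {rf:.4%}")
print(f"approximate separation: {map_distance_in_base_pairs(rf):.0f} base pairs")
```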
As we saw above, rapid lysis (r) mutants were found that mapped to three different
regions of the T4 genome: rI, rII, and rIII.
This meant that
• Those in different regions were not alleles of the same gene.
• More than one gene product participated in the lysis function.
Even within one "locus", rII, there turned out to be two different stretches of DNA both
of which were needed intact for the lysis function. This was revealed by the
complementation test that Benzer used. In this test,
• E. coli strain K (which rII mutants can infect but in which they cannot complete their
life cycle), growing in liquid culture, was
• co-infected with two different rII mutants (here shown as "1" and "2").
Note that this procedure differs from the earlier one (recombination) in that the
nonpermissive E. coli K is used for the initial infection (not strain B as before). Neither
strain rII"1" nor strain rII"2" is able to grow in E. coli K. But if the lost function in
rII"1" is NOT the same as the lost function in rII"2", then
• each should be able to produce the gene product missing in the other (complementation), and
• living phages will be produced. (Again, there is no need to count plaques; simply
see if they are formed or not.)
Mutant strains   1   2   3   4   5
1                0   0   +   0   +
2                    0   +   0   +
3                        0   +   0
4                            0   +
5                                0
Figure 2.24: Strain K (nonpermissive) infected by phage 1 & 2, inoculated on the lawn of
strain B (permissive).
From these results, you can deduce that these 5 rII mutants fall into two different
complementation groups, which Benzer designated
• A (containing strains 1, 2, and 4) and
• B (containing strains 3 and 5)
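The deduction of the two groups can be reproduced with a short sketch. It encodes the pairwise results from the table above ("+" where phage were produced, "0" where they were not) and places two mutants in the same group whenever they fail to complement each other. The data layout and function name are assumptions made for this example.

```python
# Pairwise co-infection results: "+" = complementation, "0" = none.
RESULTS = {
    (1, 2): "0", (1, 3): "+", (1, 4): "0", (1, 5): "+",
    (2, 3): "+", (2, 4): "0", (2, 5): "+",
    (3, 4): "+", (3, 5): "0",
    (4, 5): "+",
}


def complementation_groups(strains, results):
    """Group mutants that fail to complement one another."""
    groups = []
    for strain in strains:
        for group in groups:
            # A mutant joins a group only if it fails to complement every member.
            if all(results[tuple(sorted((strain, member)))] == "0" for member in group):
                group.append(strain)
                break
        else:
            # It complements at least one member of every existing group: new group.
            groups.append([strain])
    return groups


for label, members in zip("AB", complementation_groups([1, 2, 3, 4, 5], RESULTS)):
    print(label, members)
```

This simple greedy grouping is enough here because the data are fully consistent; real data sets with ambiguous results would need more careful handling.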
Later work showed that the function of rII depended on the polypeptide products encoded
by two adjacent regions (A and B) of rII (perhaps acting as a heterodimer). In terms of
function, then, both A and B qualify as independent genes. In co-infections by two
mutant strains,
• If either A or B is mutated on the same DNA molecule ("cis"), there is no
function, while
• If A is mutated in one DNA molecule and B in the other ("trans"), function is
restored.
Complementation, then, is the ability of two different mutations to restore wild-type
function when
• they are in "trans" (on different DNA molecules)
• but not when they are in "cis" (on the same DNA molecule).
Benzer coined the term cistron for these genetic units of function. But today, we simply
modify earlier concepts of the "gene" to fit this operational definition.
2.3.3 The Structure Analysis of Eukaryotes Introns and their
Significance
Introns, derived from the term "intragenic regions", are non-coding sections of precursor
mRNA (pre-mRNA) or other RNAs, that are removed (spliced out of the RNA) before
the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the
resulting mRNA sequence, composed of exons, is ready to be translated into a protein.
The corresponding parts of a gene are known as introns as well.
Introduction - Introns are common in eukaryotic pre-mRNA, but in prokaryotes they
are only found in tRNA and rRNA. Introns, which are non-coding sections of a gene that
are removed, are the opposite of exons which remain in the mRNA sequence after
processing.
The number and length of introns vary widely among species, and among genes within
the same species. Genes of higher organisms, such as mammals and flowering plants,
have numerous introns, which can be much longer than the nearby exons. Some less
complex organisms, such as the fungus Saccharomyces cerevisiae, and protists, have very
few introns. In humans, the gene with the greatest number of introns is the gene for the
protein titin, with 362 introns.
Figure 2.23 shows a simple illustration of a pre-mRNA with introns (top); after the
introns have been removed via splicing, the mature mRNA sequence is ready for
translation (bottom).
Introns sometimes allow for alternative splicing of a gene, so that several different
proteins which share some sequences in common can be translated from a single gene.
The control of mRNA splicing is performed by a wide variety of signaling molecules.
Introns may also contain "old code", or sections of a gene that were once translated into a
protein, but have since been discarded. It was generally assumed that the sequence of any
given intron is junk DNA with no function. More recently, however, this assumption has been
disputed.
Introns contain several short sequences that are important for efficient splicing. The exact
mechanism for these intronic splicing enhancers is not well understood, but it is thought
that they serve as binding sites on the transcript for proteins which stabilize the
spliceosome. It is also possible that RNA secondary structure formed by intronic
sequences may have an effect on splicing.
Discovery - The discovery of introns led to the Nobel Prize in Physiology or
Medicine in 1993 for Phillip Allen Sharp and Richard J. Roberts. The term intron was
introduced by American biochemist Walter Gilbert:
"The notion of the cistron [...] must be replaced by that of a transcription unit containing
regions which will be lost from the mature messenger - which I suggest we call introns
(for intragenic regions) - alternating with regions which will be expressed - exons."
(Gilbert 1978)
Classification of Introns - Some introns, such as Group I and Group II introns,
are actually ribozymes that are capable of catalyzing their own splicing out of a primary
RNA transcript. This self splicing activity was discovered by Thomas Cech, who shared
the 1989 Nobel Prize in Chemistry with Sidney Altman for the discovery of the catalytic
properties of RNA.
Four classes of introns are known to exist:
• Group I intron
• Group II intron
• Group III intron
• Nuclear introns
Sometimes group III introns are also identified as group II introns, because of their
similarity in structure and function.
Nuclear or spliceosomal introns are spliced by the spliceosome and a series of snRNAs
(small nuclear RNAs). There are certain splice signals (or consensus sequences) which
abet the splicing (or identification) of these introns by the spliceosome.
Group I, II and III introns are self splicing introns and are relatively rare compared to
spliceosomal introns. Group II and III introns are similar and have a conserved secondary
structure. The lariat pathway is used in their splicing. They perform functions similar to
the spliceosome and may be evolutionarily related to it. Group I introns are the only class
of introns whose splicing requires a free guanine nucleoside. They possess a secondary
structure different from that of group II and III introns. Many self-splicing introns code
for maturases that help with the splicing process, generally only the splicing of the intron
that encodes it.
Intron evolution - There are two competing theories that offer alternative scenarios
for the origin and early evolution of spliceosomal introns (other classes of introns, such as
self-splicing and tRNA introns, are not subject to much debate).
These are popularly called the Introns-Early (IE) and the Introns-Late (IL) views.
The IE model, championed by Walter Gilbert, proposes that introns are extremely old and
were abundant in the earliest ancestors of prokaryotes and eukaryotes (the
progenote). In this model introns were subsequently lost from prokaryotic organisms,
allowing them to attain growth efficiency. A central prediction of this theory is that the
early introns were mediators that facilitated the recombination of exons that represented
the protein domains. Such a model would directly lead to the evolution of new genes.
Unfortunately, the model cannot account for the variations in the positions of shared
introns between different species.
The IL model proposes that introns were more recently inserted into original intron-less
contiguous genes after the divergence of eukaryotes and prokaryotes. In this model,
introns probably had their origin in parasitic transposable elements. This model is based
on the observation that the spliceosomal introns are restricted to eukaryotes alone.
However, there is considerable debate on the presence of introns in the early
prokaryote-eukaryote ancestors and the subsequent intron loss-gain during eukaryotic evolution. It is
also suggested that the evolution of introns and more generally the intron-exon structure
is largely independent of the coding-sequence evolution.
Identification
Nearly all eukaryotic nuclear introns begin with the nucleotide sequence GU, and end
with AG (the GU-AG rule). These, along with a larger consensus sequence, help direct
the splicing machinery to the proper intronic donor and acceptor sites. This mainly occurs
in eukaryotic primary mRNA transcripts.
2.3.4 RNA Splicing
The other major type of modification that takes place in eukaryotic pre-mRNA is the removal
of introns by RNA splicing. This occurs in the nucleus following transcription but before the
RNA moves to the cytoplasm.
Figure 2.25: Splicing of pre-mRNA requires consensus sequences. In the consensus sequence
surrounding the branch point (YNYYRAY) Y is any pyrimidine, R is any purine, A is adenine, and
N is any base. Conclusion: Critical consensus sequences are present at the 5’ splice site, the
branch point, and the 3’ splice site.
Consensus sequences and the spliceosome - Splicing requires the presence of three
sequences in the intron. One end of the intron is referred to as the 5' splice site, and the
other end is the 3' splice site (Figure 2.25); these splice sites possess short consensus
sequences. Most introns in pre-mRNA begin with GU and end with AG, suggesting that
these sequences play a crucial role in splicing. Changing a single nucleotide at either of
these sites does indeed prevent splicing. A few introns in pre-mRNA begin with AU and
end with AC. These introns are spliced by a process that is similar to that seen in GU...AG
introns but utilizes a different set of splicing factors. This discussion will focus on
splicing of the more common GU...AG introns.
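The boundary rules above can be sketched in a few lines: one function classifies an intron by its first and last two nucleotides, and another removes introns from a pre-mRNA given their positions. The sequences, coordinates and function names are invented for illustration, and the sketch ignores everything else the splicing machinery requires.

```python
def intron_class(intron: str) -> str:
    """Classify an intron by its terminal dinucleotides."""
    intron = intron.upper()
    if intron.startswith("GU") and intron.endswith("AG"):
        return "GU-AG"
    if intron.startswith("AU") and intron.endswith("AC"):
        return "AU-AC"
    return "non-consensus"


def splice(pre_mrna: str, introns: list) -> str:
    """Remove introns given as (start, end) pairs, 0-based with end exclusive."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical two-exon transcript.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCUUACUAACCUUUCAG", "GAAUAA"
pre_mrna = exon1 + intron + exon2
print(intron_class(intron))
print(splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))]))
```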
The
third
sequence
Table 2.1: RNA–RNA interactions in pre-mRNA splicing
important for splicing is
at the so-called branch
Interaction
Function
point, which is an adenine
U1 with 5' splice U1 attaches to 5' end of intron;
nucleotide that lies from
site
commits intron to splicing; no direct
18 to 40 nucleotides
role in splicing
upstream of the 3' splice
U2 with branch Positions 5' end of intron near branch
site (Figure 2.25). The
point
point for lariat formation
sequence surrounding the
U2 with U6
Holds 5' end of intron near branch
branch point does not
point
have a strong consensus
U6 with 5' splice Positions 5' end of intron near branch but usually takes the form
site
point
YNYYRAY (Y is any
U5 with 3' end of Anchors first exon to spliceosome pyrimidine, N is any base,
first exon
subsequent to cleavage; juxtaposes R is any purine, and A is
two ends of exon for splicing
adenine). The deletion or
U5 with 3' end of Juxtaposes two ends of exon for mutation of the adenine
one exon and 5' splicing
nucleotide at the branch
end of the other
point prevents splicing.
U4 with U6
Delivers U6 to intron; no direct role in Splicing
takes
place
splicing
within a large complex
called the spliceosome, which consists of several RNA molecules and many proteins. The
RNA components are small nuclear RNAs; these snRNAs associate with proteins to form
small ribonucleoprotein particles. Each snRNP contains a single snRNA molecule and
multiple proteins. The spliceosome is composed of five snRNPs, named for the snRNAs
that they contain (U1, U2, U4, U5, and U6), and some proteins not associated with an
snRNA.
The Process of Splicing - To illustrate the process of RNA splicing, we’ll first
consider the chemical reactions that take place. Then we’ll see how these splicing
reactions constitute a set of coordinated processes within the context of the spliceosome.
Before splicing takes place, an upstream exon (exon 1) and a downstream exon (exon 2)
are separated by an intron (Figure 2.26). Pre-mRNA is spliced in two distinct steps.
In the first step, the pre-mRNA is cut at the 5' splice site. This cut frees exon 1 from the
intron, and the 5' end of the intron attaches to the branch point; that is, the intron folds
back on itself, forming a structure called a lariat. The guanine nucleotide in the consensus
sequence at the 5' splice site bonds with the adenine nucleotide at the branch point. This
bonding is accomplished through transesterification, a chemical reaction in which the OH
group on the 2'-carbon atom of the adenine nucleotide at the branch point attacks the 5'
phosphodiester bond of the guanine nucleotide at the 5' splice site, cleaving it and
forming a new 5'–2' phosphodiester bond between the guanine and adenine nucleotides.
Figure 2.26: The splicing of nuclear introns requires a two-step process. First, cleavage takes
place at the 5' splice site, and a lariat is formed by the attachment of the 5' end of the intron to
the branch point. Second, cleavage takes place at the 3' splice site, and the two exons are spliced
together.
In the second step of RNA splicing, a cut is made at the 3' splice site and, simultaneously,
the 3' end of exon 1 becomes covalently attached (spliced) to the 5' end of exon 2. This
bond also forms through a transesterification reaction, in which the 3'-OH group attached
to the end of exon 1 attacks the phosphodiester bond at the 3' splice site, cleaving it and
forming a new phosphodiester bond between the 3' end of exon 1 and the 5' end of exon
2; the intron is released as a lariat. The intron becomes linear when the bond breaks at the
branch point and is then rapidly degraded by nuclear enzymes. The mature mRNA
consisting of the exons spliced together is exported to the cytoplasm where it is
translated.
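The two-step reaction described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the function name `splice`, the example sequences, and the convention for counting the distance back from the 3' splice site are all hypothetical, and the two transesterification steps are collapsed into simply dropping the intron and joining the exons.

```python
import re

# Branch-point consensus YNYYRAY written in RNA letters
# (Y = C or U, R = A or G, N = any base). The lookahead allows
# overlapping candidate sites to be examined.
BRANCH_POINT = re.compile(r"(?=[CU][ACGU][CU][CU][AG]A[CU])")


def splice(exon1, intron, exon2):
    """Return (mature_mrna, branch_index, distance) for one GU...AG intron.

    branch_index is the 0-based position of the branch-point A in the intron;
    distance counts nucleotides back from the 3' splice site (an assumed
    convention: the last intron nucleotide is at distance 1).
    """
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron must begin with GU and end with AG")
    for match in BRANCH_POINT.finditer(intron):
        branch_index = match.start() + 5       # the A within YNYYRAY
        distance = len(intron) - branch_index
        if 18 <= distance <= 40:
            # Step 1 (lariat formation) and step 2 (exon joining) collapsed:
            # the intron is released and the exons are joined.
            return exon1 + exon2, branch_index, distance
    raise ValueError("no branch-point adenine 18-40 nt upstream of the 3' splice site")


# Hypothetical sequences chosen only to exercise the function.
intron = "GU" + "A" * 10 + "CUCUAAC" + "A" * 20 + "AG"
mrna, branch_index, distance = splice("AUGGCC", intron, "UUUUAA")
print(mrna, branch_index, distance)
```

Changing the first two or last two intron nucleotides, or mutating the branch-point A, makes the function raise an error, which mirrors the statement that such changes prevent splicing.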
Figure 2.28: Intron removal, processing, and transcription take place at the same site. RNA
tracks can be seen in the nucleus of a eukaryotic cell. Fluorescent tags were attached to DNA
(red) and RNA (green). Transcribed RNA does not disperse; rather, it accumulates near the site
of synthesis and follows a defined track during processing.
Figure 2.27: RNA splicing takes place within the spliceosome.
Although splicing is illustrated in Figure 2.26 as a two-step process, the reactions are in
fact coordinated within the spliceosome. A key feature of the spliceosome is a series of
interactions between the mRNA and snRNAs and between different snRNAs
(summarized in Table 2.1). These interactions depend on complementary base pairing
between the different RNA molecules and bring the essential components of the pre-mRNA
transcript and the spliceosome close together, which makes splicing possible.
The spliceosome is assembled on the pre-mRNA transcript in a step-by-step fashion
(Figure 2.27). First, snRNP U1 attaches to the 5' splice site, and then U2 attaches to the
branch point. A complex consisting of U5 and U4–U6 (which form a single snRNP) joins
the spliceosome.
At this point, the intron loops over and the 5' splice site is brought close to the branch
point. U1 and U4 disassociate from the spliceosome. The 5' splice site, 3' splice site, and
branch point are in close proximity, held together by the spliceosome. The two
transesterification reactions take place, joining the two exons together and releasing the
intron as a lariat.
Nuclear Organization - RNA splicing takes place in the nucleus and must occur before the RNA
can move into the cytoplasm. For many years, the nucleus was viewed as a biochemical soup, in
which components such as the spliceosome diffused and reacted randomly.
Figure 2.28: Group I introns undergo self-splicing. (a) Secondary structure of a group I intron.
(b) Self-splicing of a group I intron.
Figure 2.29: Group II introns undergo self-splicing by a different mechanism from that for group I
introns. (a) Secondary structure of a group II intron. (b) Self-splicing of group II introns, which
is similar to the splicing of nuclear introns.
Figure 2.30: Eukaryotic cells have alternative pathways for processing pre-mRNA. (a) With alternative
splicing, pre-mRNA can be spliced in different ways to produce different mRNAs. (b) With multiple 3'
cleavage sites, there are two or more potential sites for cleavage and polyadenylation; use of the
different sites produces mRNAs of different lengths. Conclusion: Both alternative splicing and
multiple 3' cleavage sites produce different mRNAs from a single pre-mRNA.
Now, the nucleus is believed to have a highly ordered internal structure, with
transcription and RNA processing taking place at particular locations within it. By
attaching fluorescent tags to pre-mRNA and using special imaging techniques,
researchers have been able to observe the location of pre-mRNA as it is transcribed and
processed. The results of these studies revealed that intron removal and other processing
reactions take place at the same sites as those of transcription (Figure 2.28), suggesting
that these processes may be physically coupled. This suggestion is supported by the
observation that part of RNA polymerase II is also required for the splicing and 3'
processing of pre-mRNA.
Self-Splicing Introns - Some introns are self-splicing, meaning that they possess
the ability to remove themselves from an RNA molecule. These self-splicing introns fall
into two major categories. Group I introns are found in a variety of genes, including some
rRNA genes in protists, some mitochondrial genes in fungi, and even some bacteriophage
genes. Although the lengths of group I introns vary, all of them fold into a common
secondary structure with nine looped stems (Figure 2.28), which are necessary for
splicing. Transesterification reactions are required for the splicing of group I introns
(Figure 2.29).
Alternative Processing Pathways - Another finding that complicates the
view of a gene as a sequence of nucleotides that specifies the amino acid sequence of a
protein is the existence of alternative processing pathways, in which a single pre-mRNA
is processed in different ways to produce alternative types of mRNA, resulting in the
production of different proteins from the same DNA sequence. One type of alternative
processing is alternative splicing, in which the same pre-mRNA can be spliced in more
than one way to yield multiple mRNAs that are translated into proteins with different
amino acid sequences (Figure 2.30a).
Another type of alternative processing requires the use of multiple 3' cleavage sites
(Figure 2.30b); two or more potential sites for cleavage and polyadenylation are present
in the pre-mRNA. In our example, cleavage at the first site produces a relatively short
mRNA, compared with the mRNAs produced through cleavage at other sites. Both
alternative splicing and multiple 3' cleavage sites can exist in the same pre-mRNA
transcript; an example is seen in the mammalian calcitonin gene, which contains six
exons and five introns (Figure 2.31a). The entire gene is transcribed into pre-mRNA
(Figure 2.31b). There are two possible 3' cleavage sites. In cells of the thyroid gland, 3'
cleavage and polyadenylation take place after the fourth exon, and the first three introns
are then removed to produce a mature mRNA consisting of exons 1, 2, 3, and 4 (Figure
2.31c). This mRNA is translated into the hormone calcitonin. In brain cells, the identical
pre-mRNA is transcribed from DNA, but it is processed differently. Cleavage and
polyadenylation take place after the sixth exon, yielding an initial transcript that includes
all six exons. During splicing, exon 4 (part of the calcitonin mRNA) is removed, along
with all the introns; so only exons 1, 2, 3, 5, and 6 are present in the mature mRNA
(Figure 2.31d). When translated, this mRNA produces a protein called calcitonin-gene-related
peptide (CGRP), which has an amino acid sequence quite different from that of
calcitonin.
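The calcitonin example can be restated as a small lookup, which makes the two tissue-specific outcomes easy to compare. This is a minimal sketch: the exon numbers and tissue rules come from the paragraph above, while the dictionary layout and the name `mature_exons` are hypothetical.

```python
# Tissue-specific processing rules for the calcitonin pre-mRNA, as described
# in the text: where 3' cleavage occurs and which exon (if any) is skipped.
PROCESSING = {
    "thyroid": {"cleavage_after_exon": 4, "skipped_exons": set()},
    "brain": {"cleavage_after_exon": 6, "skipped_exons": {4}},
}


def mature_exons(tissue):
    """Return the exon numbers present in the mature mRNA for a tissue."""
    rule = PROCESSING[tissue]
    transcript = range(1, rule["cleavage_after_exon"] + 1)
    return [exon for exon in transcript if exon not in rule["skipped_exons"]]


print("thyroid (calcitonin):", mature_exons("thyroid"))
print("brain (CGRP):", mature_exons("brain"))
```

In both tissues the exons that remain keep their original order, which is the point made in the summary of alternative splicing.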
Figure 2.31: Pre-mRNA encoded by the calcitonin gene undergoes alternative processing.
Alternative splicing may produce different combinations of exons in the mRNA, but the order of
the exons is not usually changed. Different processing pathways contribute to gene regulation.
2.3.5 Regulation of Gene Expression in Prokaryotes and Eukaryotes
While the period from 1900 to the Second World War has been called the "golden age of
genetics", we may be in a new golden (or platinum) age. Recombinant DNA technology allows us
to manipulate the very DNA of living organisms and to make conscious changes in that DNA.
Prokaryote genetic systems are much easier to study and better understood than are eukaryote
systems.
Figure 2.32: Partial gene map of the operons, such as trp and lac, on a bacterial chromosome.
Image from Purves et al., Life: The Science of Biology, 4th Edition, by Sinauer Associates
(http://www.sinauer.com/) and WH Freeman (http://www.whfreeman.com/), used with permission.
Gene Regulation in Prokaryotes
(1) In Bacteria - The single chromosome of the common intestinal bacterium E. coli is
circular and contains some 4.7 million base pairs. It is nearly 1 mm long, but only 2nm
wide (Figure 2.32). The chromosome replicates bidirectionally, producing a
figure resembling the Greek letter theta. The promoter is the part of the DNA to which
the RNA polymerase binds before opening the segment of the DNA to be transcribed.
A segment of the DNA that codes for a specific polypeptide is known as a structural
gene. Structural genes often occur together on a bacterial chromosome. Clustering the genes
for related polypeptides, such as the enzymes of one biochemical pathway,
allows for quick, efficient transcription of the mRNAs. Often leader and trailer
sequences, which are not translated, occur at the beginning and end of the region. E. coli
can synthesize 1700 enzymes. Therefore, this small bacterium has the genes for 1700
different mRNAs.
Lactose, milk sugar, is split by the enzyme β-galactosidase. This enzyme is inducible,
since it occurs in large quantities only when lactose, the substrate on which it operates, is
present. Conversely, the enzymes that synthesize the amino acid tryptophan are produced
continuously in growing cells unless tryptophan is present. If tryptophan is present the
production of tryptophan-synthesizing enzymes is repressed.
The Operon Model - The operon model (Figure 2.33) of prokaryotic gene regulation
was proposed by François Jacob and Jacques Monod. Groups of genes coding for
related proteins are arranged in units known as operons. An operon consists of an
operator, promoter, regulator, and structural genes. The regulator gene codes for a
repressor protein that binds to the operator, obstructing the promoter (and thus
transcription) of the structural genes. The regulator does not have to be adjacent to
other genes in the operon. If the repressor protein is removed, transcription may occur.
Figure 2.33 : The lactose operon
Operons are either inducible or repressible according to the control mechanism. Seventy-five
different operons controlling 250 structural genes have been identified for E. coli.
Both repression and induction are examples of negative control since the repressor
proteins turn off transcription.
Bacteria do not make all the proteins that they are capable of making all of the time.
Rather, they can adapt to their environment and make only those gene products that are
essential for them to survive in a particular environment. For example, bacteria do not
synthesize the enzymes needed to make tryptophan when there is an abundant supply of
tryptophan in the environment. However, when tryptophan is absent from the environment
the enzymes are made. Similarly, just because a bacterium has a gene for resistance to an
antibiotic does not mean that that gene will be expressed. The resistance gene may only be
expressed when the antibiotic is present in the environment.
Figure 2.34 : Transcription of lac genes in the presence and absence of glucose
Bacteria usually control gene expression by regulating the level of mRNA transcription.
In bacteria, genes with related function are generally located adjacent to each other and
they are regulated coordinately (i.e. when one is expressed, they all are expressed).
Coordinate regulation of clustered genes is accomplished by regulating the production of
a polycistronic mRNA (i.e. a large mRNA containing the information for several genes).
Thus, bacteria are able to "sense" their environment and express the appropriate set of
genes needed for that environment by regulating transcription of those genes.
(A). INDUCIBLE GENES - THE OPERON MODEL
1. Definition
An inducible gene is a gene that is expressed in the presence of a substance (an inducer)
in the environment. This substance can control the expression of one or more genes
(structural genes) involved in the metabolism of that substance. For example, lactose
induces the expression of the lac genes that are involved in lactose metabolism. A
certain antibiotic may induce the expression of a gene that leads to resistance to that
antibiotic.
Induction is common in metabolic pathways that result in the catabolism of a substance
and the inducer is normally the substrate for the pathway.
2. Lactose Operon
a. Structural genes - The lactose operon (Figure 2.33) contains three structural genes
that code for enzymes involved in lactose metabolism.
• The lacZ gene codes for β-galactosidase, an enzyme that breaks down lactose into
glucose and galactose
• The lacY gene codes for a permease, which is involved in uptake of lactose
• The lacA gene codes for a galactoside transacetylase.
These genes are transcribed from a common promoter into a polycistronic mRNA, which
is translated to yield the three enzymes.
b. Regulatory gene - The expression of the structural genes is not only influenced by
the presence or absence of the inducer, it is also controlled by a specific regulatory gene.
The regulatory gene may be next to or far from the genes that are being regulated. The
regulatory gene codes for a specific protein product called a REPRESSOR.
c. Operator - The repressor acts by binding to a specific region of the DNA called the
operator, which is adjacent to the structural genes being regulated. The structural genes
together with the operator region and the promoter are called an OPERON. However, the
binding of the repressor to the operator is prevented by the inducer, and the inducer can
also remove repressor that has already bound to the operator. Thus, in the presence of the
inducer the repressor is inactive and does not bind to the operator, resulting in
transcription of the structural genes. In contrast, in the absence of inducer the repressor
is active and binds to the operator, resulting in inhibition of transcription of the
structural genes. This kind of control is referred to as NEGATIVE CONTROL since the function
of the regulatory gene product (repressor) is to turn off transcription of the structural
genes.
Figure 2.35 : Effect of glucose on expression of proteins encoded by the lac operon
d. Inducer - Transcription of the lac genes is influenced by the presence or absence of
an inducer (lactose or other β-galactosides) (Figure 2.34).
e.g. + inducer = expression; - inducer = no expression
3. Catabolite repression (Glucose Effect)
Many inducible operons are not only controlled by their respective inducers and
regulatory genes, but they are also controlled by the level of glucose in the environment.
The ability of glucose to control the expression of a number of different inducible
operons is called CATABOLITE REPRESSION. Catabolite repression is generally seen
in those operons which are involved in the degradation of compounds used as a source of
energy. Since glucose is the preferred energy source in bacteria, the ability of glucose to
regulate the expression of other operons ensures that bacteria will utilize glucose before
any other carbon source as a source of energy.
Mechanism - There is an inverse relationship between glucose levels and cyclic AMP
(cAMP) levels in bacteria. When glucose levels are high cAMP levels are low and when
glucose levels are low cAMP levels are high. This relationship exists because the
transport of glucose into the cell inhibits the enzyme adenyl cyclase which produces
cAMP. In the bacterial cell cAMP binds to a cAMP binding protein called CAP or CRP.
The cAMP-CAP complex, but not free CAP protein, binds to a site in the promoters of
catabolite repression-sensitive operons. The binding of the complex results in a more
efficient promoter and thus more initiations of transcriptions from that promoter as
illustrated in Figures 2.35 and 2.36. Since the role of the CAP-cAMP complex is to turn
on transcription this type of control is said to be POSITIVE CONTROL. The
consequences of this type of control is that to achieve maximal expression of a catabolite
repression sensitive operon glucose must be absent from the environment and the inducer
of the operon must be present. If both are present, the operon will not be maximally
expressed until glucose is metabolized. Obviously, no expression of the operon will occur
unless the inducer is present.
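The combined effect of negative control (repressor and inducer) and positive control (CAP-cAMP and glucose) can be summarized as a small truth table. The following is a minimal sketch of the logic stated above, not a quantitative model; the function name and the three output labels are hypothetical.

```python
def lac_operon_expression(inducer_present, glucose_present):
    """Qualitative lac operon output from the two signals described in the text."""
    # Negative control: without inducer the repressor stays bound to the operator.
    repressor_bound = not inducer_present
    # Positive control: glucose lowers cAMP, so the CAP-cAMP complex forms
    # (and binds the promoter) only when glucose is absent.
    cap_camp_bound = not glucose_present

    if repressor_bound:
        return "none"
    return "maximal" if cap_camp_bound else "reduced"


for inducer in (True, False):
    for glucose in (True, False):
        print(f"inducer={inducer!s:5} glucose={glucose!s:5} ->",
              lac_operon_expression(inducer, glucose))
```

Only the combination of inducer present and glucose absent gives maximal expression, which matches the conclusion of the paragraph above.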
(B). REPRESSIBLE GENES - THE OPERON MODEL
1. Definition
Repressible genes are those in which the presence of a substance (a co-repressor) in the
environment turns off the expression of those genes (structural genes) involved in the
metabolism of that substance. e.g., Tryptophan represses the expression of the trp genes.
Repression is common in metabolic pathways that result in the biosynthesis of a
substance and the co-repressor is normally the end product of the pathway being
regulated.
2. Tryptophan operon
a. Structural genes - The tryptophan operon (Figure 2.37) contains five structural genes
that code for enzymes involved in the synthesis of tryptophan. These genes are transcribed
from a common promoter into a polycistronic mRNA, which is translated to yield the five
enzymes.
Figure 2.36 : Effect of glucose on expression of proteins encoded by the lac operon
b. Regulatory gene - The
expression of the structural genes is not only influenced by the presence or absence of the
co-repressor, it is also controlled by a specific regulatory gene. The regulatory gene may
be next to or far from the genes that are being regulated. The regulatory gene codes for a
specific protein product called a REPRESSOR (sometimes called an apo-repressor).
When the repressor is synthesized it is inactive. However, it can be activated by
complexing with the co-repressor (i.e. tryptophan).
c. Operator - The active repressor/co-repressor complex acts by binding to a specific region
of the DNA called the operator, which is adjacent to the structural genes being regulated.
The structural genes together with the operator region and the promoter are called an
OPERON. Thus, in the presence of the co-repressor the repressor is active and binds to the
operator, resulting in repression of transcription of the structural genes. In contrast, in
the absence of co-repressor the repressor is inactive and does not bind to the operator,
resulting in transcription of the structural genes. This kind of control is referred to as
NEGATIVE CONTROL since the function of the regulatory gene product (repressor) is to turn
off transcription of the structural genes.
Figure 2.37 : The tryptophan operon
d. Co-repressor - Transcription of the tryptophan genes is influenced by the presence or
absence of a co-repressor (tryptophan) (Figure 2.38).
e.g. + co-repressor = no expression; - co-repressor = expression
Figure 2.38 : The effect of tryptophan on expression from the trp operon
3. Attenuation
In many repressible operons, transcription that initiates at the promoter can terminate
prematurely in a leader region that precedes the first structural gene (i.e. the polymerase
terminates transcription before it gets to the first gene in the operon). This phenomenon,
the premature termination of transcription, is called ATTENUATION. Although attenuation is
seen in a number of operons, the mechanism is best understood in those repressible
operons involved in amino acid biosynthesis. In these instances attenuation is regulated
by the availability of the cognate aminoacylated tRNA.
Figure 2.39 : Mechanism of attenuation
Mechanism (See Figure 2.39) - When transcription is initiated at the promoter, it
actually starts before the first structural gene and a leader transcript is made. This leader
region contains a start and a stop signal for protein synthesis. Since bacteria do not have a
nuclear membrane, transcription and translation can occur simultaneously. Thus, a short
peptide can be made while the RNA polymerase is transcribing the leader region. The test
peptide contains two adjacent tryptophan residues. Thus, if there is
a sufficient amount of tryptophanyl-tRNA to translate that test peptide, the entire peptide
will be made and the ribosome will reach the stop signal. If, on the other hand, there is
not enough tryptophanyl-tRNA to translate the peptide, the ribosome will be arrested at
the two tryptophan codons before it gets to the stop signal.
The sequence in the leader mRNA contains four regions, which have complementary
sequences (Figure 2.40). Thus, several different secondary stem and loop structures can
be formed. Region 1 can only form base pairs with region 2; region 2 can form base pairs
with either region 1 or 3; region 3 can form base pairs with region 2 or 4; and region 4
can only form base pairs with region 3. Thus three possible stem/loop structures can be
formed in the RNA.
region 1:region 2
region 2:region 3
region 3:region 4
One of the possible structures (region 3 base pairing with region 4) generates a signal for
RNA polymerase to terminate transcription (i.e. to attenuate transcription). However, the
formation of one stem and loop structure can preclude the formation of others. If region 2
forms base pairs with region 1 it is not available to base pair with region 3. Similarly if
region 3 forms base pairs with region 2 it is not available to base pair with region 4.
Figure 2.40 : Formation of stem-loops
The ability of the ribosomes to translate the test peptide will affect the formation of the
various stem and loop structures (Figure 2.41). If the ribosome reaches the stop signal for
translation it will be covering up region 2, and thus region 2 will not be available for
forming base pairs with other regions. This allows the generation of the transcription
termination signal because region 3 will be available to pair with region 4. Thus, when
there is enough tryptophanyl-tRNA to translate the test peptide, attenuation will occur and
the structural genes will not be transcribed. In contrast, when there is an insufficient
amount of tryptophanyl-tRNA to translate the test peptide, no attenuation will occur. This
is because the ribosome will stop at the two tryptophan codons in region 1, thereby
allowing region 2 to base pair with region 3 and preventing the formation of the
attenuation signal (i.e. region 3 base paired with region 4). Thus, the structural genes
will be transcribed.
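The competition between the stem-loops can be traced with a short Python sketch. It rests on two simplifying assumptions that are not stated in the text: a ribosome that reaches the stop signal is treated as masking regions 1 and 2, and stems are assumed to form in the order the regions are transcribed. The names `PAIRINGS` and `leader_structure` are hypothetical.

```python
# Allowed pairings between the four leader regions, in 5' to 3' order.
PAIRINGS = [(1, 2), (2, 3), (3, 4)]


def leader_structure(trp_trna_sufficient):
    """Return (stems_formed, attenuated) for the trp leader transcript."""
    # Enough tryptophanyl-tRNA: the ribosome reaches the stop signal and
    # covers region 2 (region 1 is assumed to be masked as well).
    # Too little: the ribosome stalls on the Trp codons in region 1.
    masked = {1, 2} if trp_trna_sufficient else {1}
    free = {1, 2, 3, 4} - masked

    stems = []
    for a, b in PAIRINGS:
        if a in free and b in free:
            stems.append((a, b))
            free -= {a, b}          # a paired region cannot pair again

    attenuated = (3, 4) in stems    # the 3:4 stem is the termination signal
    return stems, attenuated


print(leader_structure(True))
print(leader_structure(False))
```

With sufficient tryptophanyl-tRNA only the 3:4 stem forms and transcription is attenuated; with insufficient tryptophanyl-tRNA the 2:3 stem forms first and the structural genes are transcribed.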
Figure 2.41 : Mechanism of attenuation
(2) In Viruses - Viruses consist of a nucleic acid (DNA or RNA) enclosed in a protein
coat (known as a capsid). The capsid may be a single protein repeated over and over, as
in tobacco mosaic virus (TMV). It may also be several different proteins, as in the T-even
bacteriophages. Once inside the cell, the nucleic acid follows one of two paths: lytic or
lysogenic.
Retroviruses, such as Human Immunodeficiency Virus (HIV), also include the enzyme
reverse transcriptase with the viral RNA. Reverse transcriptase makes a single-stranded
viral DNA copy of the single-stranded viral RNA. The single stranded viral DNA is
subsequently turned into a double-stranded DNA.
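The two copying steps carried out after retroviral infection can be illustrated with simple base-pairing rules. This is a sketch only: the function names and the six-base example sequence are hypothetical, and the code ignores primers, RNA degradation, and every other enzymatic detail of reverse transcription.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Single-stranded DNA copy of an RNA; both are written 5' to 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def second_strand(dna):
    """Complementary DNA strand that completes the double helix (5' to 3')."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna))


viral_rna = "AUGGCU"                      # hypothetical fragment
first_dna = reverse_transcribe(viral_rna)
other_dna = second_strand(first_dna)
print(first_dna, other_dna)
```

The second DNA strand has the same sequence as the viral RNA with T in place of U, so the double-stranded DNA carries the full viral information.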
The lytic cycle occurs when the viral DNA immediately takes over the host cell
(remember that viruses are obligate intracellular parasites) and begins making new
viruses. Eventually the new viruses cause the rupture (or lysis) of the cell, releasing those
new viruses to continue the infection cycle. The lysogenic cycle occurs when the viral
DNA is incorporated into the host DNA as a prophage. When the cell replicates the
prophage is passed along as if it were host DNA. The prophage can emerge from the host
chromosome and enter the lytic cycle spontaneously, about once every 10,000 cell
divisions. Ultraviolet light and X-rays may also trigger emergence of the prophage.
Transduction is the transfer of host DNA from one cell to another by a virus (Figure
2.42). Some bacteriophages are temperate since they tend to go lysogenic rather than
lytic. These types of viruses are able to transduce fragments of the host DNA.
Transposons are DNA fragments incorporated into the chromosomal DNA (Figure 2.43). Unlike
episomes and prophages, transposons contain a gene producing an enzyme that catalyzes
insertion of the transposon at a new site. They also have repeated sequences 20-40 nucleotides
in length at each end. Insertion sequences are short (600-1500 base pairs long) simple
transposons that do not carry genes beyond those essential for insertion of the transposon
into E. coli. Complex transposons are much larger and carry additional genes. Genes
incorporated in a complex transposon are known as jumping genes since they can move about on
the chromosome (even from chromosome to chromosome). Often the complex transposons are flanked
by simple transposons.
Figure 2.42 : Induction of transduction by viruses in bacteria. Images from Purves et al., Life:
The Science of Biology, 4th Edition, by Sinauer Associates (http://www.sinauer.com/) and WH
Freeman (http://www.whfreeman.com/), used with permission.
Gene Regulation in Eukaryotes
In the absence of precise information about the mechanisms that regulate gene expression
in eukaryotes, many models were proposed. One of the more popular early models
known as Britten Davidson model or Gene Battery model was that given by R.J.
Britten and E.H. Davidson in 1969. This model, even though widely accepted, is only a
theoretical model and lacks sound experimental proof. The model predicts the presence of
four types of sequences.
Producer gene - It is comparable to a structural gene in prokaryotes. It produces
pre-mRNA, which after processing becomes mRNA. Its expression is under the control of
many receptor sites.
Receptor site (gene) - It is comparable to the operator in bacterial operon. At least
one such receptor site is assumed to be present adjacent to each producer gene. A specific
receptor site is activated when a specific activator RNA or an activator protein, a product
of integrator gene, complexes with it.
Integrator gene - Integrator gene is comparable to regulator gene and is responsible for
the synthesis of an activator RNA molecule that may not give rise to proteins before it
activates the receptor site. At least one integrator gene is present adjacent to each sensor
site.
Sensor site - A sensor site regulates the activity of an integrator gene, which can be
transcribed only when the sensor site is activated. The sensor sites are also regulatory
sequences that respond to external stimuli, e.g. hormones or temperature. According
to the Britten Davidson model, specific sensor genes represent sequence-specific binding
sites (similar to the CAP-cAMP binding site in E. coli) that respond to a specific signal.
When sensor genes receive the appropriate signals, they activate the transcription of the
adjacent integrator genes. The integrator gene products will then interact in a
sequence-specific manner with receptor genes.
Britten and Davidson proposed that the integrator gene products are activator RNAs that
interact directly with the receptor genes to trigger the transcription of the contiguous
producer genes.
It is also proposed that receptor sites and integrator genes may be repeated a number of
times so as to control the activity of a large number of genes in the same cell. Repetition
of receptor ensures that the same activator recognizes all of them and in this way several
enzymes of one metabolic pathway are simultaneously synthesized.
Transcription of the same gene may be needed in different developmental stages. This is
achieved by the multiplicity of receptor sites and integrator genes. Each producer gene
may have several receptor sites, each responding to one activator. Thus, though a single
activator can recognize several genes, different activators may activate the same gene at
different times.
A set of structural genes controlled by one sensor site is termed as a battery. Sometimes
when major changes are needed, it is necessary to activate several sets of genes. If one
sensor site is associated with several integrators, it may cause transcription of all
integrators simultaneously thus causing transcription of several producer genes through
receptor sites.
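The flow of control in the Britten Davidson model (sensor site, integrator genes, activators, receptor sites, producer genes) can be pictured as a small lookup network. The sketch below is purely illustrative: every identifier (S1, I1, a1, P1 and so on) and the wiring between them is hypothetical and chosen only to show how one sensor can switch on a battery of producer genes.

```python
# One sensor site controls one or more adjacent integrator genes.
SENSOR_TO_INTEGRATORS = {"S1": ["I1", "I2"]}
# Each integrator gene yields one activator (RNA or protein).
INTEGRATOR_TO_ACTIVATOR = {"I1": "a1", "I2": "a2"}
# Each producer gene lists the activators its receptor sites recognize.
PRODUCER_RECEPTORS = {
    "P1": {"a1"},
    "P2": {"a1"},        # repeated receptor: same activator, second gene
    "P3": {"a2"},
    "P4": {"a3"},        # not reachable from sensor S1
}


def battery(sensor):
    """Return the producer genes transcribed when the sensor is activated."""
    activators = {INTEGRATOR_TO_ACTIVATOR[gene]
                  for gene in SENSOR_TO_INTEGRATORS.get(sensor, [])}
    return sorted(producer for producer, receptors in PRODUCER_RECEPTORS.items()
                  if receptors & activators)


print(battery("S1"))
```

Because P1 and P2 share a receptor for the same activator, they are switched on together, while P4 stays silent until a different sensor supplies its activator.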
The repetition of integrator genes and receptor sites is consistent with reports that
sufficient repeated DNA occurs in eukaryotic cells. The most attractive
feature of the Britten and Davidson model is that it provides a plausible reason for the
observed pattern of interspersion of moderately repetitive DNA sequences and single
copy DNA sequences.
Direct evidence indicates that most structural genes are indeed single copy DNA
sequences. The adjacent moderately repetitive DNA sequences would contain the various
kinds of regulator genes (sensor, integrator and receptor genes).
The latest estimates are that a human cell, a eukaryotic cell, contains 20,000–25,000
genes.
• Some of these are expressed in all cells all the time. These so-called housekeeping
genes are responsible for the routine metabolic functions (e.g. respiration)
common to all cells.
• Some are expressed as a cell enters a particular pathway of differentiation.
• Some are expressed all the time in only those cells that have differentiated in a
particular way. For example, a plasma cell expresses continuously the genes for
the antibody it synthesizes.
• Some are expressed only as conditions around and in the cell change. For
example, the arrival of a hormone may turn on (or off) certain genes in that cell.
Figure 2.43 : Transposons and their relationship to other genes. Image from Purves et al., Life:
The Science of Biology, 4th Edition, by Sinauer Associates (http://www.sinauer.com/) and WH
Freeman (http://www.whfreeman.com/), used with permission.
How is gene expression regulated?
There are several methods used by eukaryotes.
• Altering the rate of transcription of the gene. This is the most important and
widely used strategy and the one we shall examine here.
• However, eukaryotes supplement transcriptional regulation with several other
methods:
o Altering the rate at which RNA transcripts are processed while still within
the nucleus.
o Altering the stability of mRNA molecules; that is, the rate at which they are
degraded.
o Altering the efficiency with which the ribosomes translate the mRNA into a
polypeptide.
Protein-coding genes have
• exons whose sequence encodes the polypeptide;
• introns that will be removed from the mRNA before it is translated;
• a transcription start site
• a promoter
o the basal or core promoter located within about 40 bp of the start site
o an "upstream" promoter, which may extend over as many as 200 bp farther
upstream
• enhancers
• silencers
Adjacent genes (RNA-coding as well as protein-coding) are often separated by an
insulator which helps them avoid cross-talk between each other's promoters and
enhancers (and/or silencers).
Transcription start site This is where a molecule of RNA polymerase II (pol II, also
known as RNAP II) binds. Pol II is a complex of 12 different proteins (shown in the
figure in yellow with small colored circles superimposed on it).
The start site is where transcription of the gene into RNA begins.
Figure 2.44 : Eukaryotic promoter with TFIID
The basal promoter The basal promoter (Figure 2.44) contains a sequence of 7
bases (TATAAAA) called the TATA box. It is bound by a large complex of some 50
different proteins, including
 Transcription Factor IID (TFIID) which is a complex of
o TATA-binding protein (TBP), which recognizes and binds to the TATA box
o 14 other protein factors which bind to TBP — and each other — but not to
the DNA.
 Transcription Factor IIB (TFIIB) which binds both the DNA and pol II.
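Locating a TATA box is, at its simplest, a search for the 7-base sequence within the short stretch just upstream of the start site. The sketch below is only an illustration: the function name and the 40-bp example sequence are hypothetical, and real promoters tolerate variations on the consensus that an exact-match search would miss.

```python
def find_tata_boxes(upstream: str, motif: str = "TATAAAA") -> list[int]:
    """Return the 0-based position of every exact occurrence of `motif`."""
    seq = upstream.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)  # keep looking after this hit
    return hits


# Hypothetical 40-bp stretch lying immediately upstream of a start site
# (the last base of the string is the base just before the start site).
upstream_40bp = "GCGCGGCTATAAAAGGCGCGCCGATCGGCTAGCTAGGCTA"

positions = find_tata_boxes(upstream_40bp)
# Distance, in bp, from the first base of each motif to the start site.
distances = [len(upstream_40bp) - p for p in positions]

print("TATA box positions:", positions)
print("bp upstream of the start site:", distances)
```

An exact string match keeps the example short; a position weight matrix would be the usual choice when near-matches to the consensus must also be found.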
Figure 2.45 : Eukaryotic promoter with Enhancer Binding Protein
The basal or core promoter is found in all protein-coding genes. This is in sharp contrast
to the upstream promoter whose structure and associated binding factors differ from gene
to gene.
Although the figure is drawn as a straight line, the binding of transcription factors to each
other probably draws the DNA of the promoter into a loop.
Many different genes and many different types of cells share the same transcription
factors - not only those that bind at the basal promoter but even some of those that bind
upstream (Figure 2.45). What turns on a particular gene in a particular cell is probably the
unique combination of promoter sites and the transcription factors that are chosen.
An Analogy The rows of lock boxes in a bank provide a useful analogy.
To open any particular box in the room requires two keys:
 your key, whose pattern of notches fits only the lock of the box assigned to you
(= the upstream promoter), but which cannot unlock the box without
 a key carried by a bank employee that can activate the unlocking mechanism of
any box (= the basal promoter) but cannot by itself open any box.
Note : Transcription factors represent only a small fraction of the proteins in a cell.
Hormones exert many of their effects by forming transcription factors - The
complexes of hormones with their receptor represent one class of transcription factor.
Hormone "response elements", to which the complex binds, are promoter sites.
Embryonic development requires the coordinated production and distribution of
transcription factors.
Enhancers
Some transcription factors ("Enhancer-binding protein") bind to regions of DNA that
are thousands of base pairs away from the gene they control (Figure 2.46). Binding
increases the rate of transcription of the gene.
Enhancers can be located upstream, downstream, or even within the gene they control.
How does the binding of a protein to an enhancer regulate the transcription of a gene
thousands of base pairs away? One possibility is that enhancer-binding proteins, in
addition to their DNA-binding site, have sites that bind to transcription factors ("TF")
assembled at the promoter of the gene. This would draw the DNA into a loop (as shown
in Figure 2.46).
Figure 2.46 : Some of the transcription factors that produce the segmented body plan in
Drosophila. E2 and Sp1 type of Binding Proteins.
Visual evidence Michael R. Botchan (who kindly
supplied these electron micrographs) and his colleagues have produced visual evidence of
this model of enhancer action. They created an artificial DNA molecule with
 several promoter sites for Sp1 about 300 bases from one end. Sp1 is a zinc-finger
transcription factor that binds to the sequence 5' GGGCGG 3' found in the
promoters of many genes, especially "housekeeping" genes.
 several enhancer sites about 800 bases from the other end. These are bound by an
enhancer-binding protein designated E2.
 1860 base pairs of DNA between the two.
When these DNA molecules were added to a mixture of Sp1 and E2, the electron
microscope showed that the DNA was drawn into loops with "tails" of approximately 300
and 800 base pairs.
At the neck of each loop were two distinguishable globs of material, one representing
Sp1 (red), the other E2 (blue) molecules. (The two micrographs are identical; the lower
one has been labeled to show the interpretation.)
Artificial DNA molecules lacking either the promoter sites or the enhancer sites, or with
mutated versions of them, failed to form loops when mixed with the two proteins.
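The geometry of this experiment can be checked with simple arithmetic. The sketch below is illustrative only: the class and function names are hypothetical, the width of the binding-site clusters is ignored, and looping is treated as all-or-nothing (a loop forms only when both kinds of site are present).

```python
from dataclasses import dataclass


@dataclass
class Construct:
    sp1_tail_bp: int = 300      # Sp1 sites to the nearer end of the DNA
    spacer_bp: int = 1860       # DNA between the Sp1 sites and the E2 sites
    e2_tail_bp: int = 800       # E2 sites to the other end of the DNA
    has_sp1_sites: bool = True
    has_e2_sites: bool = True


def predicted_shape(c: Construct) -> dict:
    """Predict what the electron microscope should show for a construct."""
    total = c.sp1_tail_bp + c.spacer_bp + c.e2_tail_bp
    if c.has_sp1_sites and c.has_e2_sites:
        # Sp1 and E2 meet at the neck: the spacer becomes the loop and the
        # DNA outside the two site clusters is left over as two tails.
        return {"shape": "loop", "loop_bp": c.spacer_bp,
                "tails_bp": (c.sp1_tail_bp, c.e2_tail_bp), "total_bp": total}
    return {"shape": "linear", "loop_bp": 0, "tails_bp": (), "total_bp": total}


with_both_sites = predicted_shape(Construct())
without_enhancer = predicted_shape(Construct(has_e2_sites=False))

print(with_both_sites)
print(without_enhancer)
```

Under these assumptions the predicted tails are 300 and 800 base pairs, which is what the micrographs showed, and removing either set of sites leaves the molecule linear.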
Silencers Silencers are control regions of DNA that, like enhancers, may be located
thousands of base pairs away from the gene they control. However, when transcription
factors bind to them, expression of the gene they control is repressed.
Insulators A problem: As you can see above, enhancers can turn on promoters of genes
located thousands of base pairs away. What is to prevent an enhancer from
inappropriately binding to and activating the promoter of some other gene in the same
region of the chromosome?
One answer: an insulator.
Insulators are
 stretches of DNA (as few as 42 base pairs may do the trick)
 located between the
o enhancer(s) and promoter or
o silencer(s) and promoter
of adjacent genes or clusters of adjacent genes.
Figure 2.47 : Chromosome 14 showing δ and α gene segments with promoter and enhancer.
Their function is to prevent a gene from being influenced by the activation (or repression)
of its neighbors.
Example: The enhancer for the promoter of the gene for the delta chain of the
gamma/delta T-cell receptor for antigen (TCR) is located close to the promoter for the
alpha chain of the alpha/beta TCR (on chromosome 14 in humans) (Figure 2.47). A T cell
must choose between one or the other. There is an insulator between the alpha gene
promoter and the delta gene promoter that ensures that activation of one does not spread
over to the other.
All insulators discovered so far in vertebrates work only when bound by a protein
designated CTCF ("CCCTC binding factor"; named for a nucleotide sequence found in
all insulators). CTCF has 11 zinc fingers.
Another example: In mammals (mice, humans, pigs), only the allele for insulin-like
growth factor-2 (IGF2) inherited from one's father is active; that inherited from the
mother is not — a phenomenon called imprinting.
The mechanism: the mother's allele has an insulator between the IGF2 promoter and
enhancer. So does the father's allele, but in his case, the insulator has been methylated.
CTCF can no longer bind to the insulator, and so the enhancer is now free to turn on the
father's IGF2 promoter.
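The imprinting mechanism just described is a short chain of yes/no conditions, which can be written out as a small truth function. This is a deliberately simplified sketch: the function name is hypothetical, and each step (methylation, CTCF binding, insulator activity) is treated as strictly on or off.

```python
def igf2_allele_active(insulator_methylated: bool) -> bool:
    """Return True if the enhancer can switch on this allele's IGF2 promoter."""
    # Methylation of the insulator prevents CTCF from binding to it.
    ctcf_bound = not insulator_methylated
    # The insulator only blocks the enhancer when CTCF is bound.
    enhancer_blocked = ctcf_bound
    # The promoter is turned on only if the enhancer can reach it.
    return not enhancer_blocked


maternal_active = igf2_allele_active(insulator_methylated=False)
paternal_active = igf2_allele_active(insulator_methylated=True)

print("maternal IGF2 allele active:", maternal_active)
print("paternal IGF2 allele active:", paternal_active)
```

The two calls reproduce the pattern in the text: the unmethylated maternal insulator keeps the allele silent, while the methylated paternal insulator leaves the allele active.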
Many of the commercially-important varieties of pigs have been bred to contain a gene
that increases the ratio of skeletal muscle to fat. This gene has been sequenced and turns
out to be an allele of IGF2, which contains a single point mutation in one of its introns.
Pigs with this mutation produce higher levels of IGF2 mRNA in their skeletal muscles
(but not in their liver).
This tells us that:
 Mutations need not be in the protein-coding portion of a gene in order to affect
the phenotype.
 Mutations in non-coding portions of a gene can affect how that gene is regulated
(here, a change in muscle but not in liver).
2.4 LET US SUM UP
To map phage genes, bacterial cells are infected with viruses that differ in two or
more genes. Recombinant plaques are counted, and rates of recombination are
used to determine the linear order of the genes on the chromosome and the
distance between them.
Self-splicing introns are of two types: group I introns and group II introns. These
introns have complex secondary structures that enable them to catalyze their
excision from RNA molecules without the aid of enzymes or other proteins.
Intron splicing of nuclear genes is a two-step process: (1) the 5' end of the intron
is cleaved and attached to the branch point to form a lariat and (2) the 3' end of
the intron is cleaved and the two ends of the exon are spliced together. These
reactions take place within the spliceosome.
Alternative splicing enables exons to be spliced together in different combinations
to yield mRNAs that encode different proteins. Alternative 3' cleavage sites allow
pre-mRNA to be cleaved at different sites to produce mRNAs of different lengths.
2.5 CHECK YOUR PROGRESS
NOTE: (1) Write your answer in the space given below.
(2) Compare your answer with the ones given at the end of the unit.
(1) Fill in the blanks :
(a) ….. and its close relative ……. are viruses that infect the bacterium E. coli.
(b) The strain B of E. coli can be infected by both ….. and …… strains of T2
bacteriophage.
(c) When the lambda virus enters a cell, a virus-encoded enzyme called
………………… is synthesized.
(d) Plasmids are best thought of as ………………………………...
(e) A zygote inherits nuclear genes from both parents, but typically all of its
cytoplasmic organelles, and thus all its cytoplasmic genes, come from
…………………. of the gametes, usually the …………….
(2) Write the answers to the following questions :
(a) What are the techniques for the study of the bacteriophage genome?
(b) Explain genetic transformation, conjugation and transduction in
bacteria.
2.6 CHECK YOUR PROGRESS: THE KEY
(1) (a) T2, T4
(b) h+, h
(c) lambda integrase
(d) mini-chromosomes
(e) only one of the gametes, egg
(2) (a) see section 2.2.2 (b) see section 2.2.4
2.7 ASSIGNMENT
Make a project explaining Gene Regulation in Filamentous Fungi.
2.8 REFERENCES
Our courteous thanks to the following two authors/publishers for preparing the
various sections of this chapter:
B. Alberts et al., ‘Molecular Biology of the Cell’: 4th Ed. (2002). Garland.
Benjamin A. Pierce, ‘Genetics : A Conceptual Approach’
Other helping resources are as follows:
Lewin. Genes VII. (2000). Oxford University Press.
C. R. Calladine and H. R. Drew. Understanding DNA: The Molecule and How It
Works. 2nd edn (1997). Academic Press. (3rd Ed. due in 2004).
Jeremy Dale and Simon F. Park. Molecular Genetics of Bacteria, 4th Edition
2004. John Wiley & Sons, Ltd
H. Lodish et al. Molecular Cell Biology, 4th Ed. (1995). W. H. Freeman. (5th Ed
due in 2003–2004).
L. Snyder and W. Champness (2003). Molecular Genetics of Bacteria, 2nd Ed.
American Society for Microbiology.
S. Baumberg (Ed.) (1999). Prokaryotic Gene Expression.
M. T. Madigan, J. M. Martinko and J. Parker (2000). Biology of Microorganisms
(better known as ‘Brock’), 9th Ed. Prentice Hall International.
J. W. Dale and M. von Schantz (2002). From Genes to Genomes. John Wiley &
Sons.
T. A. Brown (2001). Gene Cloning – An Introduction, 4th Ed. Blackwell Science.
S. B. Primrose, R. Twyman and R. W. Old (2001). Principles of Gene
Manipulation, 6th Ed. Blackwell Science.
B. R. Glick (2003). Molecular Biotechnology: Principles and Applications of
Recombinant DNA, 3rd Ed. American Society for Microbiology.
D. P. Snustad and M. J. Simmons (2000). Principles of Genetics, 2nd Ed. John
Wiley.
W. S. Klug and M. R. Cummings (2000). Concepts of Genetics, 6th Ed. Prentice
Hall.
L. H. Hartwell and others (2000). Genetics. McGraw-Hill.
P. J. Russell (2002). Genetics. Benjamin Cummings.
A. J. F. Griffiths, W. M. Gelbart, R. C. Lewontin and J. H. Miller (2002). Modern
Genetic Analysis, 2nd Ed. W. H. Freeman.
R. W. Hendrix et al (1983). Lambda II.
M. Wilson, R. McNab and B. Henderson (2002). Bacterial Disease Mechanisms.
Cambridge University Press.
W. Hayes (1968). The Genetics of Bacteria and their Viruses, 2nd Ed. Blackwell
Scientific Publications.
Websites which give more information on bacterial genome sequences and
access to genomic databases are as follows:
http://www.sanger.ac.uk/
http://www.tigr.org/
http://www.ncbi.nlm.nih.gov/
******
UNIT-3 GENETIC RECOMBINATION AND MAPPING
Structure
3.0 Introduction
3.1 Objectives
3.2 Recombination:
3.2.1 Independent Assortment and Crossing Over
3.2.2 Molecular Mechanism of Recombination
3.2.3 Role of RecA and RecBCD enzymes
3.2.4 Site Specific Recombination
3.2.5 Chromosome Mapping Linkage Groups and Genetic Markers
3.2.6 Construction of Molecular Maps
3.2.7 Correlation of Genetic and Physical Maps
3.2.8 Somatic Cell Genetics – An Alternative Approach to Gene Mapping
3.3 Mutations:
3.3.1 Spontaneous and Induced Mutations
3.3.2 Physical and Chemical Mutagens
3.3.3 Molecular Basis of Gene Mutations
3.3.4 Transposable Elements in Mutagenesis
3.3.5 DNA Damage and Repair Mechanism
3.3.6 Inherited Human Diseases and Defects in DNA repair
3.3.7 Initiation of Cancer at Cellular Level
3.3.8 Proto-oncogenes and Oncogenes
3.4 Let Us Sum Up
3.5 Check Your Progress
3.6 Check Your Progress: The Key
3.7 Assignment
3.8 References
3.0 INTRODUCTION
In the two preceding sections we discussed the mechanisms by which DNA sequences in
cells are maintained from generation to generation with very little change. Although such
genetic stability is crucial for the survival of individuals, in the longer term the survival
of organisms may depend on genetic variation, through which they can adapt to a
changing environment. Thus an important property of the DNA in cells is its ability to
undergo rearrangements that can vary the particular combination of genes present in any
individual genome, as well as the timing and the level of expression of these genes. These
DNA rearrangements are caused by genetic recombination. Two broad classes of genetic
recombination are commonly recognized : –
(a) General recombination and (b) Site-specific recombination.
In general recombination, genetic exchange takes place between any pair of
homologous DNA sequences, usually located on two copies of the same chromosome.
One of the most important examples is the exchange of sections of homologous
chromosomes (homologues) in the course of meiosis. This "crossing-over" occurs
between tightly apposed chromosomes early in the development of eggs and sperm and it
allows different versions (alleles) of the same gene to be tested in new combinations with
other genes, increasing the chance that at least some members of a mating population will
survive in a changing environment. Although meiosis occurs only in eukaryotes, the
advantage of this type of gene mixing is so great that mating and the reassortment of
genes by general recombination are also widespread in bacteria.
This process leads to offspring having different combinations of genes from their parents
and can produce new chimeric alleles. Enzymes called recombinases catalyze natural
recombination reactions. RecA, the recombinase found in E. coli, is responsible for the
repair of DNA double strand breaks (DSBs). In yeast and other eukaryotic organisms
there are two recombinases required for repairing DSBs. The RAD51 protein is required
for mitotic and meiotic recombination and the DMC1 protein is specific to meiotic
recombination.
Chromosomal crossover refers to recombination between the paired chromosomes
inherited from each of one's parents, generally occurring during meiosis. During
prophase-I the four available chromatids are in tight formation with one another. While in
this formation, homologous sites on two chromatids can mesh with one another, and may
exchange genetic information. Because recombination can occur with small probability at
any location along a chromosome, the frequency of recombination between two locations
depends on the distance between them. Therefore, for genes sufficiently distant on the same
chromosome the amount of crossover is high enough to destroy the correlation between
alleles. In gene conversion, a section of genetic material is copied from one chromosome
to another, but leaves the donating chromosome unchanged.
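The way recombination frequency rises with distance and then levels off can be made concrete with a mapping function. The sketch below uses Haldane's mapping function, r = 0.5 × (1 − e^(−2d)), with d in Morgans; this choice is an assumption made for illustration (it supposes that crossovers occur independently of one another) and is not a model named in the text. The function name is hypothetical.

```python
import math


def haldane_recombination_fraction(map_distance_morgans: float) -> float:
    """Expected recombination fraction for two loci d Morgans apart,
    assuming crossovers occur independently (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * map_distance_morgans))


for d in (0.01, 0.1, 0.5, 1.0, 3.0):
    r = haldane_recombination_fraction(d)
    print(f"distance {d:>4} Morgans -> recombination fraction {r:.4f}")
```

For closely linked loci the recombination fraction is almost equal to the map distance, while for distant loci it approaches but never exceeds 0.5, the value expected for unlinked genes. This is the sense in which enough crossing over destroys the correlation between alleles.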
Recombination can occur between DNA sequences that contain no sequence homology.
This is referred to as Nonhomologous recombination or Nonhomologous end joining.
DNA homology is not required in site-specific recombination. Instead, exchange occurs
at short, specific nucleotide sequences (on either one or both of the two participating
DNA molecules) that are recognized by a variety of site-specific recombination enzymes.
Site-specific recombination therefore alters the relative positions of nucleotide sequences
in genomes. In some cases these changes are scheduled and organized, as when an
integrated bacterial virus is induced to leave a chromosome of a bacterium under stress;
in others they are haphazard, as when the DNA sequence of a transposable element is
inserted at a randomly selected site in a chromosome.
As for DNA replication, most of what we know about the biochemistry of genetic
recombination has come from studies of prokaryotic organisms, especially of E. coli and
its viruses.
3.1 OBJECTIVE
This unit focuses on recombination and mutation, both of which are processes responsible
for the evolution of organisms. The following points will be covered:
Mendel's law of independent assortment is based on the mechanism of recombination,
by which new types of progeny, other than the wild type, are produced in the F2 generation.
The general recombination process is guided by base-pairing interactions between
complementary strands of two homologous DNA molecules, where nick formation occurs
in one strand.
RecA and RecBCD are important recombination enzymes that have been studied in E. coli.
Site-specific recombination enzymes move special DNA sequences into and out of
genomes.
Different types of inducers, such as physical and chemical mutagens, cause mutation.
A variety of DNA repair mechanisms are found in organisms to avoid mutation.
Study of some severe human diseases that develop due to disorders of genes.
Cellular imbalance leads to the development of cancer, in which proto-oncogenes and
oncogenes may also be involved.
Figure 3.1: The phenotypes of two independent traits show a 9:3:3:1 ratio in the F2
generation. In this example, coat color is indicated by B (brown, dominant) or b (white)
while tail length is indicated by S (short, dominant) or s (long). When parents are
homozygous for each trait (SSbb and ssBB), their children in the F1 generation are
heterozygous at both loci and only show the dominant phenotypes. If the children mate
with each other, in the F2 generation all combinations of coat color and tail length occur:
9 are brown/short (purple boxes), 3 are white/short (pink boxes), 3 are brown/long (blue
boxes) and 1 is white/long (green box).
3.2 RECOMBINATION
3.2.1 Independent Assortment and Crossing Over
Independent Assortment - The Law of Independent Assortment, also known as
"Inheritance Law", states that the inheritance pattern of one trait will not affect the
inheritance pattern of another. While Mendel's experiments with mixing one trait always
resulted in a 3:1 ratio between dominant and recessive phenotypes, his experiments with
mixing two traits (dihybrid cross) showed 9:3:3:1 ratios (Figure 3.1). But the 9:3:3:1
table shows that each of the two genes is independently inherited with a 3:1 ratio. Mendel
concluded that different traits are inherited independently of each other, so that there is
no relation, for example, between a cat's color and tail length. This is actually only true
for genes that are not linked to each other. Independent assortment occurs during meiosis
I in eukaryotic organisms, specifically anaphase I of meiosis, to produce a gamete with a
mixture of the organism's maternal and paternal chromosomes. Along with chromosomal
crossover, this process aids in increasing genetic diversity by producing novel genetic
combinations. Of the 46 chromosomes in a normal diploid human cell, half are
maternally-derived (from the mother's egg) and half are paternally-derived (from the
father's sperm).
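The 9:3:3:1 ratio can be reproduced by listing every gamete an F1 individual can make and pairing the gametes in all possible ways, which is what a Punnett square does. The sketch below uses the allele symbols of Figure 3.1 (S/s for tail length, B/b for coat color); the function names are hypothetical and the two genes are assumed to be unlinked.

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list[str]:
    """All gametes from a two-gene genotype such as 'SsBb' (unlinked genes)."""
    tail_alleles, coat_alleles = genotype[:2], genotype[2:]
    return [t + c for t, c in product(tail_alleles, coat_alleles)]


def phenotype(tail_alleles: str, coat_alleles: str) -> str:
    """One dominant allele is enough to show the dominant phenotype."""
    tail = "short" if "S" in tail_alleles else "long"
    coat = "brown" if "B" in coat_alleles else "white"
    return f"{coat}/{tail}"


def f2_phenotype_counts(f1_genotype: str = "SsBb") -> Counter:
    """Cross two F1 individuals and count phenotypes over the 16 squares."""
    counts = Counter()
    for g1, g2 in product(gametes(f1_genotype), repeat=2):
        counts[phenotype(g1[0] + g2[0], g1[1] + g2[1])] += 1
    return counts


counts = f2_phenotype_counts()
for name, n in counts.most_common():
    print(f"{name}: {n} of 16")
```

Adding up the counts for one trait alone (12 short to 4 long, or 12 brown to 4 white) gives back the 3:1 ratio of a single-gene cross, which is the independence that the law describes.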
This occurs as sexual reproduction involves the fusion of two haploid gametes (the egg
and sperm) to produce a new organism having the full complement of chromosomes.
During gametogenesis - the production of new gametes by an adult - the normal
complement of 46 chromosomes needs to be halved to 23 to ensure that the resulting
haploid gamete can join with another gamete to produce a diploid organism. An error in
the number of chromosomes, such as those caused by a diploid gamete joining with a
haploid gamete, is termed aneuploidy.
In independent assortment the chromosomes that end up in a newly-formed gamete are
randomly sorted from all possible combinations of maternal and paternal chromosomes.
Because gametes end up with a random mix instead of a pre-defined "set" from either
parent, gametes are therefore considered assorted independently. As such, the gamete can
end up with any combination of paternal or maternal chromosomes. Any of the possible
combinations of gametes formed from maternal and paternal chromosomes will occur
with equal frequency. For human gametes, with 23 pairs of chromosomes, the number of
possibilities is 2^23, or 8,388,608 possible combinations. The gametes will normally end up
with 23 chromosomes, but the origin of any particular one will be randomly selected
from paternal or maternal chromosomes. This contributes to the genetic variability of
progeny.
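The count follows from a simple rule: each chromosome pair contributes either its maternal (M) or its paternal (P) member, so n pairs give 2^n possible gametes. The sketch below lists the possibilities for a hypothetical organism with 3 pairs and then applies the same rule to 23 pairs; the function name is illustrative, and crossing over is ignored.

```python
from itertools import product


def gamete_combinations(n_pairs: int) -> int:
    """Number of maternal/paternal chromosome combinations in a gamete."""
    return 2 ** n_pairs


# Enumerate every combination for a small, hypothetical 3-pair genome.
small_genome = ["".join(choice) for choice in product("MP", repeat=3)]
print(small_genome)
print(len(small_genome), "combinations for 3 pairs")

# The same rule applied to the 23 human chromosome pairs.
human = gamete_combinations(23)
print(f"{human:,} combinations for 23 pairs")
```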
Crossing Over - Crossing over occurs between equivalent portions of two nonsister
chromatids (Figure 3.2). Each chromatid contains a single molecule of DNA. So the
problem of crossing over is really a problem of swapping portions of adjacent DNA
molecules. It must be done with great precision so that neither chromatid gains or loses
any genes. In fact, crossing over has to be sufficiently precise that not a single nucleotide
is lost or added at the crossover point if it occurs within a gene. Otherwise a frameshift
would result and the resulting gene would produce a defective product or, more likely, no
product at all.
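Why a single lost nucleotide matters can be seen by reading a sequence in triplets before and after the loss. The sketch below uses a short, made-up coding sequence; the function name is hypothetical, and the sequence is read from its first base with no attempt to translate it into amino acids.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into complete triplets, reading from the first base."""
    usable = len(seq) - len(seq) % 3   # ignore an incomplete final triplet
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Made-up coding sequence: ATG GCT GAA CGT TAA
gene = "ATGGCTGAACGTTAA"

# A precise crossover leaves the sequence unchanged; an imprecise one is
# modelled here by deleting the single nucleotide at index 7.
imprecise = gene[:7] + gene[8:]

print("precise:  ", codons(gene))
print("imprecise:", codons(imprecise))
```

Every triplet downstream of the deletion is read differently and the original final triplet is no longer read at all, which is why an imprecise exchange inside a gene would usually give a defective product or none.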
How do nonsister chromatids ensure that crossing over between them will occur without
the loss or gain of a single nucleotide? One plausible mechanism for which there is
considerable laboratory evidence postulates the following events.
Note that each recombinant DNA molecule includes a region where nucleotides from one
of the original molecules are paired with nucleotides from the other. But no matter, the
need for a smooth double helix guarantees that each exchange takes place without any
gain or loss of nucleotides. So long as the total number of nucleotides in each strand and
the complementarities (A-T, C-G) are preserved, this "heteroduplex" region (which may
extend for hundreds of base pairs) will only rarely have genetic consequences. And these
may, in fact, be helpful because the synthesis of a short stretch of DNA using the template
provided by the other chromatid also provides a mechanism for repairing any damage that
might have been present on the "invading" strand of DNA. If the cut in molecule 1 occurs
in the region of a mutation, the damaged or incorrect nucleotides can be digested away.
Refilling the resulting gap, using the undamaged molecule 2 as the template, repairs the
damage to molecule 1. Why should the cutting and ligation be limited to the strands
shown? They are not. Half the time the cutting and ligating rejoins the original parental
arms. In these cases, no crossover takes place. The only genetic change that might have
occurred is a transfer of some genetic information in the heteroduplex region. So crossing
over not only provides a mechanism for genetic recombination during meiosis but also
provides a means of repairing damage to the genome.
Figure 3.2: Mechanism of Crossing Over
3.2.2 Molecular mechanisms of Recombination
DNA replication with 100% fidelity is a nice feature to keep offspring in just the genetic
background of the species. But to get there, or to evolve further, requires genetic
changes, one of which results from recombination of (near) homologous parts of DNA.
The nature of the structural changes in DNA necessary to result in homologous genetic
recombination was laid out by R. HOLLIDAY in 1964, and in subsequent years the
crossover structures were visualized by electron microscopy (Figure 3.3). The actual
conformation of a DNA crossover was speculated to be a four-way junction with separate
DNA helices, or with stacked helices in either a parallel or an antiparallel orientation of
the helices. The models had to allow for branch migration, else no exchange of genetic
material would happen.
Figure 3.3: An ‘X’-form that has been prepared for the electron microscope in the presence of a
high concentration of formamide. Under these conditions the DNA double helix is stressed, and
those regions particularly rich in AT base pairs undergo a localized denaturation. This
sequence-specific denaturation allows the homologous arms in the molecule to be identified.
Furthermore, the covalent strand connections in the region of the crossover can be seen. In this
and 80 other open molecules, the homologous arms are in a trans configuration (Photo: H. POTTER
and D. DRESSLER, 1976).
Breakage - Fusion (Reunion) - Bridge Cycle, Control Elements And
Unstable Genes - It has been known since the beginning of the twentieth century that
unstable or variable gene loci occur in plants, although the drastically enhanced mutability
and the increased back-mutation rate could not be explained at first. The decisive
breakthrough was accomplished by B. McCLINTOCK with her studies on maize chromosomes
published in 1947 and 1951. These built on her earlier observations and analyses
(1938) of breakage-fusion (reunion)-bridge cycles, whose occurrence could be correlated
with the restructuring of chromosomes.
Bridges are formed during anaphase whenever two chromosomes fuse at their ends,
generating a fusion product with two centromeres. If these two are subsequently pulled to
different poles, a chromosome break inevitably occurs. During the following S-phase of
the interphase nucleus, a chromatid with a broken end is replicated in just the same way as
the other chromosomes, and the two resulting chromatids again fuse at their broken ends.
Consequently, a chromosome consisting of just one chromatid but two centromeres can be
found in the subsequent mitosis, instead of a chromosome made of two chromatids and one
centromere. The consequence is a second break during anaphase, where the second round
of the cycle starts.
B. McCLINTOCK recognized that the break cannot occur at just any site of the
chromosome but is restricted to certain sections that she called Ds (dissociation). These
were obviously DNA segments contributing to the formation of translocations, deletions,
inversions and to the generation of ring-shaped chromosomes. The first break causes
similar breaks in the mitotic cycles of following generations. They happen during
ontogenesis at different times and sites.
The segment Ds, a mutator gene, behaves like a multiple allele (or, even better, like a
pseudoallele) that can be located at different gene loci. It may also vary in structure. This
mutator can insert itself into other genes thus rendering them inactive. It is a control
element that changes its place within the chromosome, jumping or wandering around and
causing mutations wherever it inserts (the mutators are also called jumping genes).
It soon became clear that a further set of elements has to exist: the Ac (activation)
elements. A chromosome break or a translocation of a Ds element has to be supported
by an Ac element. An Ac element can also be regarded as a multiple allele. It may occur
at widely different sites in all chromosomes. To analyze its effect further, B. McCLINTOCK
concentrated on the study of genes that determine the colour of maize grains.
One of the most important is the C-locus, which in its dominant condition causes a dark red
staining of the aleurone layer and the pericarp of the maize grain. If a Ds-element jumps
into the gene, colour synthesis is interrupted and colourless (yellow) grains result. An Ac
activity within these grains causes a pattern of dark red areas on a light ground. This is
explained by a reestablishment of the old state, since the Ac element removes the Ds from
the C-locus. This happens in several cells during the development of the maize grain.
These cells are the origin of the aleurone layer and the pericarp, and the back mutation can
only be perceived in the clones that form out of the changed cells.
Today a number of gene loci are known that can be influenced by the Ds-Ac-system or
other control elements. The detection of the spm-system (suppressor-mutator) and the
elucidation of its function showed that the control elements not only act as switches (a
yes/no decision) but also modulate the degree of gene expression.
The genetic analyses of B. McCLINTOCK were not understood for years. Only when
insertion elements and transposons were found in bacterial DNA during the late sixties
did an analogy between them and the control elements show up. These genetic data fitted
neatly with molecular biological models (P. NEVERS and H. SAEDLER, 1977, H-P.
DÖRING and P. STARLINGER, 1984). BARBARA McCLINTOCK was awarded
the Nobel Prize in Physiology or Medicine for her pioneering achievements. P. NEVERS,
N. S. SHEPHERD and H. SAEDLER listed the 'unstable plant genes' described in the
literature at the beginning of 1986. Their list shows that such genes have been found in
more than 30 species. Many of the respective mutants, with names like variegate, marmorata,
maculata or variabilis, are on the market as ornamental plants due to their irregularly
spotted flowers or leaves.
Holliday junction: central intermediate of genetic recombination - As noted at the
start of this section, models of the four-way junction proposed by Holliday in 1964 had
to allow for branch migration.
During branch migration hydrogen bonds between paired bases have to be broken and
others reformed instead. On average the energy for breaking and reforming these bonds
will cancel each other - but in real existing DNA not all base pairs are created equal. This
calls for the action of enzymes to overcome the necessary activation energy. And
enzymes are needed anyway to resolve the four-way junctions into separate helices. In E.
coli, for example, there exists an enzyme system (RuvABC) the components of which hold
the Holliday junction (RuvA), swivel the DNA strands to enable branch migration (RuvB)
and finally cut the junction (RuvC). A DNA ligase restores intact double helices.
Homologous genetic recombination is a highly dynamic process, in contrast to X-ray
crystallography, which relies on static structures. So it took until the end of the previous
millennium to get an atomic-detail view of the relevant structures: a four-way Holliday
junction formed by homologous DNA strands, a RuvA tetramer complexed to a static
Holliday junction, the motor driving branch migration, and a Holliday-junction-resolving
enzyme.
The Ruv-System of E. coli is in itself a dynamic complex. During branch migration two
tetramers of RuvA hover on both sides of a cruciform DNA, with multimeric RuvB clamping
two of the DNA strands to wind them. This complex is not accessible for RuvC. In order
for the resolvase to act, one of the RuvA tetramers has to be dissociated so that one side of
the DNA junction is amenable to strand separation. In vitro the tetramer-octamer
equilibrium is subject to the salt concentration of the buffer. Conditions necessary for
crystallisation of the complex resulted in tetrameric RuvA complexed to the DNA.
Figure 3.4: RecA protein-dsDNA complex imaged by atomic force microscopy (AFM).
Research on the molecular mechanisms of genetic recombination has the long-term
objective of reconstituting in vitro systems that accurately reproduce the cellular
processes. This involves characterizing the biochemical properties of proteins essential to
homologous recombination in prokaryotes, eukaryotes, and Archaea.
In E. coli, the RecA, RecBCD, RecQ, RuvABC, and SSB proteins, and a specific DNA
sequence called Chi, are essential to homologous recombination. The RecA protein
possesses the unique ability to pair homologous DNA molecules (Figure 3.4) and to
promote the subsequent exchange of DNA strands. Since RecA protein is the prototypic
DNA strand exchange protein, we are interested in the biochemical mechanism of
protein-mediated recognition and exchange of homologous DNA strands. The RecBCD
enzyme is both a DNA helicase and a nuclease with the remarkable properties that its
nuclease activity, but not its helicase activity, is attenuated by interaction with the Chi
sequence, and that it will actively load RecA protein onto ssDNA. RecQ protein is a
helicase that can also effect recombination events. SSB protein is an ssDNA binding
protein that stimulates the activities of RecA, RecBCD, and RecQ proteins by virtue of its
ability to bind ssDNA. Recently, we reconstituted an in vitro pairing reaction that
requires the concerted action of each of these proteins; the role of each protein in this
reaction is under investigation.
We are also studying the biochemistry of homologous recombination in the yeast S.
cerevisiae and the archaeon S. solfataricus, in which Rad51 and RadA, respectively, are the
RecA protein homologues. In yeast, at least three ancillary proteins are needed for
Rad51 protein-mediated DNA strand exchange: the RP-A, Rad52, and
Rad54 proteins. We are studying the mechanism of these reconstituted reactions.
General Recombination Is Guided by Base-pairing Interactions Between
Complementary Strands of Two Homologous DNA Molecules - General
recombination involves DNA strand-exchange intermediates that require some effort to
understand. Although the exact pathway followed is likely to be different in different
organisms, detailed genetic analyses of viruses, bacteria, and fungi suggest that the major
outcome of general recombination is always the same.
Figure 3.5: General recombination. The breaking and rejoining of two homologous DNA
double helices creates two DNA molecules that have "crossed over."
Figure 3.6: A heteroduplex joint. This structure unites two DNA molecules where they have
crossed over. Such a joint is often thousands of nucleotides long.
(1) Two homologous DNA molecules "cross over"; that is, their double helices break and
the two broken ends join to their opposite partners to re-form two intact double
helices, each composed of parts of the two initial DNA molecules (Figure 3.5).
(2) The site of exchange (that is, where a red double helix is joined to a green
double helix in Figure 3.5) can occur anywhere in the homologous nucleotide
sequences of the two participating DNA molecules.
(3) At the site of exchange, a strand of one DNA molecule becomes base-paired to a
strand of the second DNA molecule to create a staggered joint (usually called a
heteroduplex joint) between the two double helices (Figure 3.6). The heteroduplex
region can be thousands of base pairs long; we shall explain later how it forms.
(4) No nucleotide sequences are altered at the site of exchange; the cleavage and
rejoining events occur so precisely that not a single nucleotide is lost or gained.
Despite this precision, general recombination creates DNA molecules of novel sequence:
the heteroduplex joint can contain a small number of mismatched base pairs, and, more
important, the two DNAs that cross over are usually not exactly the same on either side
of the joint.
Figure 3.7: One way to start a recombination event. The RecBCD protein is an enzyme
required for general genetic recombination in E. coli. The protein enters the DNA from
one end of the double helix and then uses energy derived from the hydrolysis of bound
ATP molecules to propel itself in one direction along the DNA at a rate of about 300
nucleotides per second. A special recognition site (a DNA sequence of eight nucleotides
scattered throughout the E. coli chromosome) is cut in the traveling loop of DNA created
by the RecBCD protein, and thereafter a single-stranded whisker is displaced from the
helix, as shown. This whisker is thought to initiate genetic recombination by pairing
with a homologous helix, as in Figure 3.8.
The mechanism of general recombination ensures that two regions of DNA double helix
undergo an exchange reaction only if they have extensive sequence homology. The
formation of a heteroduplex joint requires that such homology be present because it
involves a long region of complementary base-pairing between a strand from one of the
two original double helices and a complementary strand from the other. But how does this
heteroduplex joint arise, and how do the two homologous regions of DNA at the site of
crossing-over recognize each other? As we shall see, recognition takes place by means of
a direct base-pairing interaction. The
formation of base pairs between complementary strands from the two DNA molecules
then guides the general recombination process, allowing it to occur only between long
regions of matching DNA sequence.
General Recombination Can Be Initiated at a Nick in One Strand of a
DNA Double Helix - Each of the two strands in a DNA molecule is helically wound
around the other. As a result, extensive base-pair interactions can occur between two
homologous DNA double helices only if a nick is first made in a strand of one of them,
freeing that strand for the unwinding and rewinding events required to form a
heteroduplex with another DNA molecule. For the same reason, any exchange of strands
between two DNA double helices requires at least two nicks, one in a strand of each
interacting double helix. Finally, to produce the heteroduplex joint illustrated in Figure
3.6, each of the four strands present must be cut to allow each to be joined to a different
partner.
In general recombination, these nicking and resealing events are coordinated so that they
occur only when two DNA helices share an extensive region of matching DNA sequence.
There is evidence from a number of sources that a single nick in only one strand of a
DNA molecule is sufficient to initiate general recombination. Chemical agents or types of
irradiation that introduce single-strand nicks, for example, will trigger a genetic
recombination event. Moreover, one of the special proteins required for general
recombination in E. coli, the RecBCD protein, has been shown to make single-strand nicks
in DNA molecules. The RecBCD protein is also a DNA helicase, hydrolyzing ATP and
traveling along a DNA helix, transiently exposing its strands. By combining its nuclease
and helicase activities, the RecBCD protein will create a single-stranded "whisker" on the
DNA double helix (Figure 3.7). Figure 3.8 shows how such a whisker could initiate a
base-pairing interaction between two complementary stretches of DNA double helix.
Figure 3.8: The initial strand exchange in general recombination. A nick in a single DNA
strand frees the strand, which then invades a homologous DNA double helix to form a short
pairing region with one of the strands in the second helix. Only two DNA molecules that are
complementary in nucleotide sequence can base-pair in this way and thereby initiate a
general recombination event. All of the steps shown here can be catalyzed by known
enzymes (see Figures 3.7 and 3.11).
Figure 3.9: DNA hybridization. DNA double helices re-form from their separated strands in a
reaction that depends on the random collision of two complementary strands. Most such
collisions are not productive, as shown at the left, but a few result in a short region where
complementary base pairs have formed (helix nucleation). A rapid zippering then leads to the
formation of a complete double helix. A DNA strand can use this trial-and-error process to find
its complementary partner in the midst of millions of non-matching DNA strands.
Trial-and-error recognition of a complementary partner DNA sequence appears to initiate
all general recombination events.
DNA Hybridization Reactions Provide a Simple Model for the Base-pairing
Step in General Recombination - In its simplest form, the type of base-pairing
interaction central to general recombination can be mimicked in a test tube by
allowing a DNA double helix to re-form from its separated single strands. This process,
called DNA renaturation or hybridization, occurs when a rare random collision
juxtaposes complementary nucleotide sequences on two matching DNA single strands,
allowing the formation of a short stretch of double helix between them. This relatively
slow helix nucleation step is followed by a very rapid "zippering" step as the region of
double helix is extended to maximize the number of base-pairing interactions (Figure
3.9). Formation of a new double helix in this way requires that the annealing strands be in
an open, unfolded conformation. For this reason in vitro hybridization reactions are
carried out at high temperature or in the presence of an organic solvent such as
formamide; these conditions "melt out" the short hairpin helices formed where
base-pairing interactions occur within a single strand that folds back on itself. Bacterial cells
could not survive such harsh conditions and instead use a single-strand binding protein,
the SSB protein, to open their helices. This protein is essential for DNA replication as
well as for general recombination in E. coli; it binds tightly and cooperatively to the
sugar-phosphate backbone of all single-stranded regions of DNA, holding them in an
extended conformation with their bases exposed. In this extended conformation a DNA
single strand can base-pair efficiently with either a nucleoside triphosphate molecule (in
DNA replication) or a complementary section of another DNA single strand (in genetic
recombination). When hybridization reactions are carried out in vitro under conditions
that mimic those inside a cell, the SSB protein speeds up the rate of DNA helix
nucleation and thereby the overall rate of strand annealing by a factor of more than 1000.
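The trial-and-error search for a complementary partner described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`reverse_complement`, `find_partner`) and the toy sequences are hypothetical, and the sketch ignores the kinetics of nucleation and zippering, as well as the role of the SSB protein.

```python
from typing import List, Optional

# Watson-Crick pairing rules for the four DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the strand (read 5'->3') that base-pairs with `strand`."""
    return "".join(PAIR[base] for base in reversed(strand))

def find_partner(probe: str, pool: List[str]) -> Optional[int]:
    """Test each strand in the pool in turn (one 'collision' per strand) and
    return the index of the first fully complementary strand, or None."""
    target = reverse_complement(probe)
    for index, candidate in enumerate(pool):
        if candidate == target:
            return index  # productive collision: a full helix can form
    return None           # every collision was unproductive

# Toy example (hypothetical sequences).
probe = "ATGCGTAC"
pool = ["TTTTTTTT", "GTACGCAT", "ATGCGTAC"]
print(find_partner(probe, pool))  # -> 1
```

Only the second strand in the pool is complementary to the probe; the identical strand at index 2 cannot pair with it, which is why matching is done against the reverse complement rather than the sequence itself.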
3.2.3 RecA & RecBCD
RecA is a 38 kilodalton Escherichia coli protein essential for the repair and
maintenance of DNA. RecA has a structural and functional homolog in every species in
which it has been seriously sought and serves as an archetype for this class of
homologous DNA repair proteins. The homologous protein in Homo sapiens is called RAD51.
RecA has multiple activities, all related to DNA repair. In the bacterial SOS response,
it has a co-protease function in the autocatalytic cleavage of the LexA repressor and
the λ repressor. Its most studied role is in facilitating DNA recombination for the
repair of double-strand DNA breaks and the exchange of genetic information through
sexual reproduction.
Figure 3.10: The structure of the RecA protein. A string of three RecA monomers is shown,
with the position of each ATP in red. The white spheres show the putative position of the
single-strand DNA in the filament, with three nucleotides (each shown as a sphere) bound
per monomer. (From R.M. Story, I.T. Weber, and T.A. Steitz, Nature 355:318-325, 1992. ©
1992 Macmillan Magazines Ltd.)
E. coli RecA protein also has a major role in homologous recombination. RecA protein
binds strongly and in long clusters to ssDNA to form a nucleoprotein filament. Because it
has more than one DNA-binding site, RecA can hold a single strand and a double strand
together. This feature makes it possible to catalyze a DNA synapsis reaction between a
DNA double helix and a homologous region of single-stranded DNA. The reaction initiates
the exchange of strands between two recombining DNA double helices. After the synapsis
event, a process called branch migration begins in the heteroduplex region. In branch
migration an unpaired region of one of the single strands displaces a paired region of
the other single strand, moving the branch point without changing the total number of
base pairs. Spontaneous branch migration can occur, but because it generally proceeds
equally in both directions it is unlikely to complete recombination efficiently. The
RecA protein catalyzes unidirectional branch migration and by doing so makes it possible
to complete recombination, producing a region of heteroduplex DNA that is thousands of
base pairs long.
RecA protein is a DNA-dependent ATPase; it contains an additional site for binding and
hydrolyzing ATP. RecA associates more tightly with DNA when it has ATP bound than
when it has ADP bound.
Figure 3.11: DNA synapsis catalyzed by the RecA protein. In vitro experiments show that
several types of complexes are formed between a DNA single strand covered with RecA
protein (red) and a DNA double helix (green). First a non-base-paired complex is formed,
which is converted to a three-stranded structure as soon as a region of homologous sequence
is found. This complex is presumably unstable because it involves an unusual form of DNA,
and it spins out a DNA heteroduplex (one strand green and the other strand red) plus a
displaced single strand from the original helix (green); thus the structure shown in this diagram
migrates to the left, reeling in the "input DNAs" while producing the "output DNAs." The net
result is a DNA strand exchange identical to that diagrammed earlier in Figure 3.8. (Adapted
from S.C. West, Annu. Rev. Biochem. 61:603-640, 1992. © Annual Reviews Inc.)
Escherichia coli strains deficient in RecA are useful for cloning procedures in molecular
biology laboratories. E. coli strains are often genetically modified to contain a mutant
recA locus to ensure the stability of exogenous plasmids, circular dsDNA molecules that
bacteria replicate along with their genome during normal cell growth. Plasmid DNA is taken up
by the bacteria under a variety of conditions. Bacteria containing exogenous plasmids are
called "transformants". Transformants retain the plasmid throughout cell divisions, so
that it can be recovered and used in other applications. Without functional RecA protein
the exogenous plasmid DNA is left unaltered by the bacteria. Purification of this plasmid
from bacterial cultures then results in high-fidelity amplification of the original plasmid
sequence.
The RecA Protein Enables a DNA Single Strand to Pair with a
Homologous Region of DNA Double Helix in E. coli
General recombination is more complex than the simple hybridization reactions just
described. In the course of general recombination, a single DNA strand from one DNA
double helix must invade another double helix (see Figure 3.8). In E. coli this requires the
RecA protein, produced by the recA gene, which was identified in 1965 as being essential
for recombination between chromosomes. Long sought by biochemists, this elusive gene
product was finally purified to homogeneity in 1976, a feat that allowed its detailed
characterization (Figure 3.10). Like a single-strand binding (SSB) protein, the RecA
protein binds tightly and in large cooperative clusters to single-stranded DNA to form a
nucleoprotein filament. This filament has several distinctive properties. The RecA protein
has more than one DNA-binding site, for example, and it can therefore hold a single
strand and a double helix together. These sites allow the RecA protein to catalyze a
multistep reaction (called synapsis) between a DNA double helix and a homologous
region of single-stranded DNA. The crucial step in synapsis occurs when a region of
homology is identified by an initial base-pairing between complementary nucleotide
sequences. The nucleation step in this case appears to involve a three-stranded structure,
in which the DNA single strand forms nonconventional base pairs in the major groove of
the DNA double helix (Figure 3.11). This begins the pairing shown previously in Figure
3.8 and so initiates the exchange of strands between two recombining DNA double
helices. Studies in vitro suggest that the E. coli SSB protein cooperates with the RecA
protein to facilitate these reactions.
Once synapsis has occurred, a short heteroduplex region where the strands from two
different DNA molecules have begun to pair is enlarged through protein-directed branch
migration, which can also be catalyzed by the RecA protein. Branch migration can take
place at any point where two single DNA strands with the same sequence are attempting
to pair with the same complementary strand; an unpaired region of one of the single
strands will displace a paired region of the other single strand, moving the branch
point without changing the total number of DNA base pairs. Spontaneous branch migration
proceeds equally in both directions, and so it makes little progress and is unlikely to
complete recombination efficiently (Figure 3.12A). Because the RecA protein catalyzes
unidirectional branch migration, it readily produces a region of heteroduplex that is
thousands of base pairs long (Figure 3.12B).
Figure 3.12: Two types of DNA branch migration observed in experiments in vitro. (A)
Spontaneous branch migration is a back-and-forth, random-walk type of process, and it
therefore makes little progress over long distances. (B) RecA-protein-directed branch
migration proceeds at a uniform rate in one direction, and it may be driven by the
polarized assembly of the RecA protein filament on a DNA single strand, which occurs in
the direction indicated. In addition, special DNA helicases that catalyze
protein-directed branch migration even more efficiently are involved in recombination.
The catalysis of branch migration depends on a further property of the RecA protein. In
addition to having two
DNA-binding sites, the RecA protein is a DNA-dependent ATPase, with an additional
site for binding and hydrolyzing ATP. The protein associates much more tightly with
DNA when it has ATP bound than when it has ADP bound. Moreover, new RecA
molecules with ATP bound are preferentially added at one end of the RecA protein
filament, and the ATP is then hydrolyzed to ADP. The RecA protein filaments that form
on DNA may therefore share many of the dynamic assembly properties displayed by the
cytoskeletal filaments formed from actin or tubulin; an ability of the protein to
"treadmill" unidirectionally along a DNA strand, for example, could drive the branch
migration reaction shown in Figure 3.12B.
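The contrast between spontaneous and RecA-directed branch migration (Figure 3.12A and B) can be illustrated with a small simulation. The sketch below is a toy model, not a description of the real enzymology: it assumes that each step moves the branch point by exactly one base pair, that spontaneous steps are equally likely in either direction, and that directed steps always go the same way. The function names and the step count are hypothetical.

```python
import random

def spontaneous_migration(steps: int, rng: random.Random) -> int:
    """Unbiased random walk: each step moves the branch point by +1 or -1
    base pair with equal probability. Returns the final position."""
    position = 0
    for _ in range(steps):
        position += rng.choice((-1, 1))
    return position

def directed_migration(steps: int) -> int:
    """Unidirectional migration: every step moves the branch point by +1."""
    return steps

steps = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable
walk = spontaneous_migration(steps, rng)
drive = directed_migration(steps)
print("spontaneous displacement:", abs(walk), "bp")
print("directed displacement:   ", drive, "bp")
```

For an unbiased walk the typical displacement grows only with the square root of the number of steps (on the order of 100 bp for 10,000 steps), whereas directed migration covers the full 10,000 bp. This is the sense in which spontaneous migration "makes little progress".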
RecBCD
RecBCD, also known as ‘Exonuclease V’, is a protein of the E. coli bacterium that
initiates recombinational repair from DNA double-strand breaks, which are a common
result of ionizing radiation, replication errors, endonucleases, oxidative damage and a
host of other factors. It is both a helicase that unwinds, or separates, the strands of
DNA and a nuclease that makes single-stranded nicks in DNA.
RecBCD (Figure 3.13) is composed of three different subunits, encoded by the recB, recC,
and recD genes. Both the RecB and RecD subunits are helicases, i.e. energy-dependent
molecular motors that unwind DNA or RNA.
Figure 3.13: RecBCD Crystal Structure
RecBCD is unusual amongst helicases in that it recognizes a specific sequence in DNA,
5'-GCTGGTGG-3' that is known as ‘Chi’. After it initiates unwinding, RecBCD makes
nicks on the strand that contains the unwound 3' end. When RecBCD encounters a Chi
site on this strand as it is unwinding DNA, it makes a final nick and pauses. It has been
proposed that this pause is a consequence of a conformational rearrangement in the
protein that changes its properties. When RecBCD resumes unwinding, it now nicks the
opposite strand (i.e. that containing the 5' unwound end). As a consequence, the 3' strand
remains intact downstream of Chi. This is important because the strand exchange protein,
RecA, that is responsible for the next step of recombinational repair needs a single-strand
molecule with a 3' end.
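Because Chi is a fixed eight-nucleotide sequence, locating candidate Chi sites in a DNA sequence is a simple string search. The sketch below scans one strand for 5'-GCTGGTGG-3' and for its reverse complement (which marks a Chi site on the opposite strand). The function name `find_chi_sites`, the strand labels "top" and "bottom", and the example sequence are illustrative assumptions. The sketch says nothing about whether RecBCD approaches a given site from the correct direction to recognize it.

```python
CHI = "GCTGGTGG"  # 5'-GCTGGTGG-3', as given in the text
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, read 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))

def find_chi_sites(top_strand):
    """Return a list of (strand, position) pairs, where position is the
    0-based index on the top strand of the leftmost base of the site."""
    chi_on_bottom = reverse_complement(CHI)  # how a bottom-strand Chi reads on top
    sites = []
    for i in range(len(top_strand) - len(CHI) + 1):
        window = top_strand[i:i + len(CHI)]
        if window == CHI:
            sites.append(("top", i))
        elif window == chi_on_bottom:
            sites.append(("bottom", i))
    return sites

# Hypothetical test sequence with one Chi site on each strand.
sequence = "AAGCTGGTGGTTCCACCAGCAA"
print(find_chi_sites(sequence))  # -> [('top', 2), ('bottom', 12)]
```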
RecBCD is also a model enzyme for the use of single molecule fluorescence as an
experimental technique used to better understand the function of protein-DNA
interactions.
General Genetic Recombination Usually Involves a Cross-Strand
Exchange - Exchanging a single strand between two double helices is presumed to be
the slow and difficult step in a general recombination event (see Figure 3.8). After this
initial exchange, extending the region of pairing and establishing further strand
exchanges between the two closely apposed helices is thought to proceed rapidly. During
these events a limited amount of nucleotide excision and local DNA resynthesis often
occurs, resembling some of the events in DNA repair.
Because of the large number of possibilities, different organisms
are likely to follow different pathways at this stage. In most
cases, however, an important intermediate structure, the cross-strand exchange, will be
formed by the two participating DNA
helices. One of the simplest ways in which this structure can form
is shown in Figure 3.14.
In the cross-strand exchange (also called a Holliday junction) the
two homologous DNA helices that initially paired are held
together by mutual exchange of two of the four strands present,
one originating from each of the helices. No disruption of base-pairing is necessary to
maintain this structure, which has two
important properties: (1) the point of exchange between the two
homologous DNA double helices (where the two strands cross in
Figure 3.14) can migrate rapidly back and forth along the helices
by a double branch migration; (2) the cross-strand exchange
contains two pairs of strands: one pair of crossing strands and one
pair of non-crossing strands. The structure can isomerize,
however, by undergoing a series of rotational movements, so that
the two original non-crossing strands become crossing strands
and vice versa (Figure 3.15). In order to regenerate two separate
DNA helices and thus terminate the pairing process, the two
crossing strands must be cut. If the crossing strands are cut before
isomerization, the two original DNA helices separate from each
other nearly unaltered, with only a very short piece of single
stranded DNA exchanged.
If the crossing strands are cut after isomerization, however, one
section of each original DNA helix is joined to a section of the
other DNA helix; in other words, the two DNA helices have
crossed over (see Figure 3.15).
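The two outcomes of cutting a cross-strand exchange can be summarised by tracking only the flanking markers on each side of the junction. The sketch below is a deliberately minimal bookkeeping model: the function name `resolve_junction` and the marker labels (A/B on one helix, a/b on the other) are hypothetical, and the short heteroduplex patch at the junction itself is not represented.

```python
def resolve_junction(helix1, helix2, isomerized):
    """Each helix is a (left_marker, right_marker) pair flanking the junction.
    Returns the flanking markers of the two helices after the crossing
    strands are cut."""
    (left1, right1), (left2, right2) = helix1, helix2
    if isomerized:
        # Cut after isomerization: the arms are swapped, i.e. a crossover.
        return (left1, right2), (left2, right1)
    # Cut before isomerization: flanking markers stay in the parental
    # arrangement; only a short single-strand patch was exchanged.
    return (left1, right1), (left2, right2)

print(resolve_junction(("A", "B"), ("a", "b"), isomerized=False))
# -> (('A', 'B'), ('a', 'b'))
print(resolve_junction(("A", "B"), ("a", "b"), isomerized=True))
# -> (('A', 'b'), ('a', 'B'))
```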
The isomerization of the cross-strand exchange should occur
spontaneously at some rate, but it may also be enzymatically
driven or otherwise regulated by cells. Some kind of control
probably operates during meiosis, when the two DNA double
helices that pair are constrained in an elaborate structure called
the synaptonemal complex.
Gene Conversion Results from Combining General
Recombination and Limited DNA Synthesis - It is a
fundamental law of genetics that each parent makes an equal
genetic contribution to the offspring, one complete set of genes
being inherited from the father and one from the mother. Thus,
when a diploid cell undergoes meiosis to produce four haploid
cells, exactly half of the genes in these cells should be maternal
(genes that the diploid cell inherited from its mother) and the
other half paternal (genes that the diploid cell inherited from its
father).
Figure 3.14: The formation of a cross-strand exchange. There are many possible
pathways that can lead from a single-strand exchange to a cross-strand exchange,
but only one is shown.
Figure 3.15: The isomerization of a cross-strand exchange. Without
isomerization, cutting the two crossing strands would terminate the
exchange and crossing over would not occur. With isomerization (steps
B and C), cutting the two crossing strands creates two DNA molecules
that have crossed over (bottom). Isomerization is therefore thought to
be required for the breaking and rejoining of two homologous DNA
double helices that result from general genetic recombination. Step A
was illustrated previously (see Figure 3.14)
Figure 3.16: One general recombination pathway that can cause
gene conversion. The process begins when a nick is formed in one of
the strands in the red DNA helix. In step 1 DNA polymerase begins
the synthesis of an extra copy of a strand in the red helix, displacing
the original copy as a single strand. This single strand then pairs with
the homologous region of the green helix. In step 2 the short region of
unpaired green strand produced in step 1 is degraded, completing the
transfer of nucleotide sequences. The result is normally seen in the
next cell cycle, after DNA replication has separated the two
nonmatching strands (step 3). As described in the text, the repair of
mismatched base pairs in a heteroduplex joint also causes gene
conversion.
In a complex animal, such as a human, it is not possible to check this prediction directly.
But in other organisms, such as fungi, where it is possible to recover and analyze all four
of the daughter cells produced from a single cell by meiosis, one finds cases in which the
standard genetic rules have apparently been violated. Occasionally, for example, meiosis
yields three copies of the maternal version of a gene (allele) and only one copy of the
paternal allele, indicating that one of the two copies of the paternal allele has been
changed to a copy of the maternal allele. This phenomenon is known as gene conversion.
It often occurs in association with general genetic recombination events, and it is thought
to be important in the evolution of certain genes. Gene conversion is believed to be a
straightforward consequence of the mechanisms of general recombination and DNA
repair.
During meiosis heteroduplex joints are formed at the sites of crossing-over between
homologous maternal and paternal chromosomes. If the maternal and paternal DNA
sequences are slightly different, the heteroduplex joint may include some mismatched
base pairs. The resulting mismatch in the double helix may then be corrected by the DNA
repair machinery, which either can erase nucleotides on the paternal strand and replace
them with nucleotides that match the maternal strand or vice versa. The consequence of
this mismatch repair will be a gene conversion. Gene conversion can also take place by a
number of other mechanisms, but they all require some type of general recombination
event that brings two copies of a closely related DNA sequence together. Because an
extra copy of one of the two DNA sequences is generated, a limited amount of DNA
synthesis must also be involved. Genetic studies show that usually only small sections of
DNA undergo gene conversion, and in many cases only part of a gene is changed.
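How mismatch repair of a heteroduplex turns the expected 2:2 ratio into 3:1 can be followed with a small bookkeeping sketch. It makes several simplifying assumptions that are not in the text: each of the four chromatids is represented as a pair of strands carrying either the maternal (M) or paternal (P) allele, only one paternal chromatid carries a heteroduplex, and repair converts both strands to a chosen template allele. All names are hypothetical.

```python
from collections import Counter

def mismatch_repair(chromatid, template):
    """If the two strands differ (heteroduplex), rewrite both strands to
    the template allele; otherwise return the chromatid unchanged."""
    if chromatid[0] == chromatid[1]:
        return chromatid
    if template not in chromatid:
        raise ValueError("template allele must be present in the heteroduplex")
    return (template, template)

def spore_alleles(template):
    """Count the alleles in the four haploid products after repair."""
    # Two maternal chromatids, one paternal chromatid with a heteroduplex
    # (one strand P, one strand M), and one unchanged paternal chromatid.
    tetrad = [("M", "M"), ("M", "M"), ("P", "M"), ("P", "P")]
    repaired = [mismatch_repair(chromatid, template) for chromatid in tetrad]
    return Counter(chromatid[0] for chromatid in repaired)

print(spore_alleles("M"))  # repair toward maternal: 3 M, 1 P (gene conversion)
print(spore_alleles("P"))  # repair toward paternal: 2 M, 2 P (restoration)
```

In this model the direction of repair decides the outcome: copying the maternal strand gives the 3:1 ratio described above, while copying the paternal strand restores the normal 2:2 ratio.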
Gene conversion can also occur in mitotic cells, but it does so more rarely. As in meiotic
cells, some gene conversions in mitotic cells probably result from a mismatch repair
process operating on heteroduplex DNA. Another likely mechanism in both meiotic and
mitotic cells is illustrated in Figure 3.16.
Mismatch Proofreading Can Prevent Promiscuous Genetic Recombination
Between Two Poorly Matched DNA Sequences - As previously discussed,
general recombination is triggered whenever two DNA strands of complementary
sequence pair to form a heteroduplex joint between two double helices (see Figure 3.14).
Experiments carried out in vitro with purified RecA protein show that pairing can occur
efficiently even when the sequences of the two DNA strands do not match well - when,
for example, only four out of every five nucleotides on average can form base pairs.
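The figure of "four out of every five nucleotides" corresponds to 80% sequence identity, which can be computed for any pair of aligned sequences. The sketch below assumes two already-aligned sequences of equal length written in the same orientation; the function name and the toy sequences are hypothetical.

```python
def fraction_identical(seq1, seq2):
    """Fraction of aligned positions at which two equal-length sequences
    carry the same base."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return matches / len(seq1)

# Toy pair differing at 2 of 10 positions (positions 4 and 9).
print(fraction_identical("ATGCATGCAT", "ATGCTTGCAA"))  # -> 0.8
```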
How, then, do vertebrate cells avoid promiscuous general recombination between the
many thousands of copies of closely related DNA sequences that are repeated in their
genomes? Although the answer is not known, studies with bacteria and yeasts
demonstrate that the same mismatch proofreading system that removes replication errors
has the additional role of interrupting genetic recombination events between imperfectly
matched DNA sequences. It has long been known, for example, that homologous genes in
two closely related bacteria, Escherichia coli and Salmonella typhimurium, generally will
not recombine, even though their nucleotide sequences are 80% identical; when the
mismatch proofreading system is inactivated by mutation, however, there is a 1000-fold
increase in the frequency of such interspecies recombination events. It is thought, then,
that the mismatch proofreading system normally recognizes the mispaired bases in an
initial strand exchange and prevents the subsequent steps required to break and rejoin
the two paired DNA helices. This mechanism protects the bacterial genome from the
sequence changes that would otherwise be caused by recombination with foreign DNA
molecules that occasionally enter the cell.
In vertebrate cells, which contain many closely related DNA sequences, the same type of
proofreading is thought to help prevent promiscuous recombination events that would
otherwise scramble the genome (Figure 3.17).
Figure 3.17: Proofreading prevents general recombination from destabilizing genomes that
contain repeated sequences. Studies with bacterial and yeast cells suggest that the
mismatch proofreading system has the additional function shown here.
3.2.4 Site Specific Recombination
Site-specific Recombination Enzymes Move Special DNA Sequences into
and out of Genomes - Site-specific genetic recombination, unlike general
recombination, is guided by a recombination enzyme that recognizes specific nucleotide
sequences present on one or both of the recombining DNA molecules. Base-pairing
between the recombining DNA molecules need not be involved, and even when it is, the
heteroduplex joint that is formed is only a few base pairs long. By separating and joining
double-stranded DNA molecules at specific sites, this type of recombination enables
various types of mobile DNA sequences to move about within and between
chromosomes.
Site-specific recombination was first discovered as the means by which a bacterial virus,
bacteriophage lambda, moves its genome into and out of the E. coli chromosome. In its
integrated state the virus is hidden in the bacterial chromosome and replicated as part of
the host's DNA. When the virus enters a cell, a virus-encoded enzyme called lambda
integrase is synthesized. This enzyme catalyzes a recombination process that begins when
several molecules of the integrase protein bind tightly to a specific DNA sequence on the
circular bacteriophage chromosome. The
resulting DNA-protein complex can now
bind to a related but different specific
DNA sequence on the bacterial
chromosome, bringing the bacterial and
bacteriophage
chromosomes
close
together. The integrase then catalyzes
the required DNA cutting and resealing
reactions, using a short region of
sequence homology to form a tiny
heteroduplex joint at the point of union
(Figure 3.18). The integrase resembles a
DNA topoisomerase in that it forms a
reversible covalent linkage to DNA
wherever it breaks a DNA chain.
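The integration reaction just described can be mimicked with plain strings: a circular molecule is opened at a short shared sequence and spliced into a linear molecule at the matching site. This is a sketch under loud assumptions: the recognition sequence `SITE` is a made-up placeholder and not the real lambda attachment-site core, the lowercase letters merely mark the inserted DNA, and strand polarity and the protein machinery are ignored.

```python
SITE = "GATTACA"  # hypothetical shared core sequence, NOT the real lambda site

def integrate(circular_insert, linear_target, site=SITE):
    """Splice a circular molecule (given as a string read once around the
    circle) into a linear molecule by recombining at one shared site."""
    i = circular_insert.find(site)
    j = linear_target.find(site)
    if i < 0 or j < 0:
        raise ValueError("both molecules must carry the recognition site")
    # Open the circle at the site: read from just after the site around
    # the circle and back to just before it.
    opened = circular_insert[i + len(site):] + circular_insert[:i]
    # The inserted DNA ends up flanked by two copies of the shared site.
    return linear_target[:j] + site + opened + site + linear_target[j + len(site):]

phage = "xxGATTACAyy"       # toy circular genome
host = "AAAGATTACATTT"      # toy stretch of host chromosome
product = integrate(phage, host)
print(product)  # -> AAAGATTACAyyxxGATTACATTT
```

The product length equals the sum of the two input lengths, so no nucleotide is lost or gained. The inserted DNA is bounded by two copies of the shared site, which is what makes the reverse reaction possible.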
The same type of site-specific
recombination mechanism can also be
carried out in reverse by the lambda
bacteriophage, enabling it to exit from
its integration site in the E. coli
chromosome in order to multiply rapidly
within the bacterial cell. This excision
reaction is catalyzed by a complex of the
integrase enzyme with a second
bacteriophage
protein,
which
is
produced by the virus only when its host
cell is stressed. If the sites recognized by
such a recombination enzyme are
flipped, the DNA between them will be inverted rather than excised.
Figure 3.18: The insertion of bacteriophage lambda DNA into the bacterial chromosome. In
this example of site-specific recombination, the lambda integrase enzyme binds to a
specific "attachment site" DNA sequence on each chromosome, where it makes cuts that
bracket a short homologous DNA sequence; the integrase thereby switches the partner
strands and reseals them so as to form a heteroduplex joint 7 base pairs long. Each of
the four strand-breaking and strand-joining reactions required resembles that made by a
DNA topoisomerase, inasmuch as the energy of a cleaved phosphodiester bond is stored in a
transient covalent linkage between the DNA and the enzyme (see Figure 3.14).
Many other enzymes that catalyze site-specific recombination resemble lambda integrase
in requiring a short region of identical DNA sequence on the two regions of DNA helix to
be joined. Because of this requirement, each enzyme in this class is fastidious with
respect to the DNA sequences that it recombines, and it can be expected to catalyze one
particular DNA joining event that is useful to the virus, plasmid, transposable element,
or cell that contains it. These enzymes can be
exploited as tools in transgenic animals to study the influence of specific genes on cell
behavior, as illustrated in Figure 3.19. Site-specific recombination enzymes that break
and rejoin two DNA double helices at specific sequences on each DNA molecule often
[74]
Figure 3.20: Transpositional site-specific recombination. (A) Outline of the strand-breaking and
- rejoining events that lead to integration of the linear double-stranded DNA of a retrovirus
(red) into an animal cell chromosome (blue). In an initial endonuclease step the integrase
enzyme makes a cut in one strand at each end of the viral DNA sequence, exposing a
protruding 3'-OH group. Each of these 3'-OH ends then directly attacks a phosphodiester bond
on opposite strands of a randomly selected site on a target chromosome. This inserts the viral
DNA sequence into the target chromosome, leaving short gaps on each side that are filled in
by DNA repair processes. Because of the gap filling, this type of mechanism leaves short
repeats of target DNA sequence [3 to 12 nucleotides in length (black), depending on the
integrase enzyme] on either side of the integrated DNA segment. (B) An atomic-level view of
the attack by one DNA chain end in (A) on a phosphodiester bond of the target DNA (blue).
This mechanism resembles that used in RNA splicing, and is distinctly different from the
topoisomerase-like activity of lambda integrase.
(Adapted from K. Mizuuchi, J. Biol. Chem. 267:21273-21276, 1992.)
do so in a reversible way: as for lambda bacteriophage, the same enzyme system that
joins two DNA molecules can take them apart again, precisely restoring the sequences of
the two original DNA molecules. This type of recombination is therefore called
conservative site-specific recombination to distinguish it from the mechanistically
distinct transpositional site-specific recombination that we discuss next.
Transpositional Recombination Can Insert a Mobile Genetic Element into
Any DNA Sequence - Many mobile DNA sequences, including many viruses and
transposable elements, encode integrases that insert their DNA into a chromosome by a
mechanism that is different from that used by bacteriophage lambda. Like the lambda
integrase, each of these enzymes recognizes a specific DNA sequence in the particular
mobile genetic element whose recombination it catalyzes. Unlike the lambda enzyme,
however, these integrases do not require a specific DNA sequence in the "target"
chromosome and they do not form a heteroduplex joint. Instead, they introduce cuts into
both ends of the linear DNA sequence of the mobile genetic element and then catalyze a
direct attack by these DNA ends on the target DNA molecule, breaking two closely
spaced phosphodiester bonds in the target molecule. Because of the way that these breaks
are made, two short single-stranded gaps are left in the recombinant DNA molecule, one
at each end of the mobile element; these are filled in by DNA polymerase to complete the
recombination process. As illustrated in Figure 3.20, this mechanism creates a short
duplication of the adjacent target DNA sequence; such flanking duplications are the
hallmark of a transpositional site-specific recombination event. An integrase enzyme of
this type was first purified in active form from bacteriophage Mu. Like the bacteriophage
lambda integrase, it carries out all of its cutting and rejoining reactions without requiring
an energy source (such as ATP). Very similar enzymes are present in organisms as
diverse as bacteria, fruit flies, and humans, all of which contain mobile genetic elements.
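The origin of the flanking duplication can be made concrete with a short Python sketch. It
is only an illustration under simplifying assumptions: a single strand stands in for the
duplex, the element is written in lowercase, and the function name, the cut position and the
stagger length are hypothetical choices, not values taken from any particular integrase.

```python
def transpositional_insert(target: str, element: str, pos: int, stagger: int) -> str:
    """Insert `element` into `target` at a staggered cut.

    The two target strands are assumed to be cut `stagger` bases apart,
    starting at `pos`. After joining and gap filling, the `stagger` bases
    that lay between the two cuts are present on both sides of the element.
    """
    if not 0 <= pos <= len(target) - stagger:
        raise ValueError("cut site must lie inside the target sequence")
    duplicated = target[pos:pos + stagger]
    left = target[:pos] + duplicated   # left flank ends with the repeat
    right = target[pos:]               # right flank starts with the same repeat
    return left + element + right


target = "AAAACGTACTTTT"   # hypothetical target sequence
element = "ggggcccc"       # hypothetical mobile element (lowercase)
product = transpositional_insert(target, element, pos=4, stagger=5)
print(product)  # AAAACGTACggggccccCGTACTTTT
```

The five bases CGTAC appear once in the target and twice in the product, one copy on each
side of the element, which is the short duplication of target sequence described above.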
3.2.5 Chromosome Mapping, Linkage Groups And Genetic Markers
Definitions Used in Genetic Mapping - Why do geneticists indulge in mapping?
The answer to that question depends largely on the sorts of mapping that are employed
because each level or type of mapping can answer certain questions. Because of this, we
will treat the rationale behind mapping in two ways, one for gross mapping and one for
"fine structure" mapping. In either case, DNA is transferred into a recipient cell under
conditions where there is a selection for the stable inheritance of the incoming DNA.
Typically this involves a selection for recombination of the incoming DNA with a
replicon in the recipient.
One performs gross mapping if one's intentions are to either place the marker of interest
somewhere on a chromosomal map, or to find out any other relevant or irrelevant
markers that happen to be genetically linked. This sort of mapping is often reported in the
literature but, in general, it does not really tell you very much. Arguably, it just sets up
the system for future strain constructions, allows preliminary genetic analysis of other
mutations, helps in the construction of either R-primes or F-primes for complementation
analysis, and allows some sort of comparison to genetically similar systems. For
example, if you knew you had three loci (with a particular phenotype) and showed that
they were each linked to a different selectable marker and unlinked to each other, then
you have answered nearly all the interesting questions that can be addressed with this
level of genetic analysis.
It is now becoming possible to do gross mapping physically. This has required the
identification of restriction enzymes that cut very rarely (<20 times per genome) and the
development of an electrophoresis system, orthogonal field electrophoresis, capable of
resolving very large DNA fragments. The localization of a gene to a given fragment,
using physical or genetic methods, provides gross, physical mapping information.
The goal of fine structure mapping is to order mutations, which are known to map in a
given small region, into a one-dimensional array. This array actually says little about
physical distance between the mutations, but a comparison of the order of mutations with
the phenotypes that they cause allows strong statements to be made about the
organization of the genetic system. Physical mapping can also order mutations and
provide that ordering with actual physical distances. Properly, this array should be
ordered with respect to other external markers. This ordering will allow you to make
sense of your complementation data (you can then tell polarity from allelism); it allows
the "clustering" of phenotypes that, in conjunction with complementation, helps define
genes and gene functions; when performed in conjunction with "reversion analysis", it
helps confirm that the mutation you are dealing with is a single and not a double
mutation. Increasingly, the fine structure analysis of DNA is the only form of mapping of
interest to molecular biologists, and deletion mapping is the best way to genetically
perform such mapping. As sequencing methodology has become ever more rapid, it is
becoming reasonable to map by sequencing, thus providing a physical reality to the
mutation order. On the other hand, while mapping itself is becoming less relevant, the
concept of linkage remains important and will be the focus of this section of the text.
It is important to understand the difference between mapping and complementation.
Complementation is a test of function. It asks whether two separate regions of
DNA can supply all required functions for an apparently wild-type phenotype. Mapping
is a test of sequence. It asks if, and with what frequency, two non-identical versions of
the same genetic region are capable of recombining to generate a wild-type sequence.
Complementation is therefore best analyzed in the absence of recombination while
mapping typically demands recombinational events. Your mapping analyses have to be
so devised that you can select for a phenotype that requires one or more recombinational
events.
A term that is used with great frequency in discussions of mapping is linkage. Linkage is
defined as the frequency with which two sites (a site can either be the site of a mutation
or the site of the wild-type version of the mutation) on a piece of DNA are co-inherited
using a particular gene transfer system. As such, it is a function of two variables: (1) The
frequency with which the two sites are brought into the same cell by that particular gene
transfer system (termed "end effects" in some of the following sections, in reference to
the "ends" of the transferred DNA) and (2) the frequency with which they are both
recombined into the chromosome. Another statement of the latter point is that, for linkage
to be observed, the recombination events occur "outside" each site and not between them.
Ignoring end effects, linkage is inversely proportional to the likelihood of a
recombinational event occurring between two sites and therefore (since recombination events
are random and their likelihood increases with the increasing size of homologous regions
available for recombination) to the distance between the sites:
linkage ∝ 1 / (distance between the two sites)
The product strain (the genotypically altered recipient) of a recombinational event is
often referred to as a recombinant.
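As a minimal sketch of how linkage is scored in practice, the Python fragment below counts,
among recombinants selected for one donor marker, the fraction that also inherited a second,
unselected marker. The marker names and the counts are hypothetical and serve only to
illustrate the definition given above.

```python
def linkage(recombinants, selected, unselected):
    """Fraction of recombinants carrying `selected` that also carry `unselected`."""
    chosen = [markers for markers in recombinants if selected in markers]
    if not chosen:
        raise ValueError("no recombinant carries the selected marker")
    both = sum(1 for markers in chosen if unselected in markers)
    return both / len(chosen)


# Hypothetical cross: ten recombinants selected for donor marker a+.
# Each set lists the donor markers found in one recombinant.
recombinants = [{"a+", "b+"}] * 6 + [{"a+"}] * 3 + [{"a+", "c+"}]

print(linkage(recombinants, "a+", "b+"))  # b+ co-inherited in 6 of 10 recombinants
print(linkage(recombinants, "a+", "c+"))  # c+ co-inherited in 1 of 10 recombinants
```

In this invented data set b+ is co-inherited with a+ far more often than c+ is, so b would
be placed closer to a than c, provided end effects are comparable for the two pairs.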
Genetic mapping also makes the assumption that there is only one piece of DNA
exchanged between the two organisms. Thus it is assumed that if two markers enter a
recipient cell, they must be on the same piece of DNA and they therefore must be
"linked" in that gene transfer system. If one utilizes a gene transfer system where more
than one distinct piece of DNA can enter the same cell, one of the assumptions used in
mapping is violated and problems in interpretation can occur since the apparent linkage
would reflect the frequency of the two markers entering the same cell separately and not
the genetic distance between them. This latter case can occur in either transformation or
in generalized transduction with the highly efficient transducing phage P22HT, since
these two systems are so efficient at moving DNA into a recipient that it is quite possible
to get more than one piece of DNA into a given recipient. Such a phenomenon is known
as congression.
3.2.6 Correlation Between Genetic Mapping And Physical Mapping
Mapping By "Prime Complementation" - If one had in hand a set of primes that
carried the entire bacterial chromosome, one could mate them, one at a time, into a
recipient with a particular mutation and look for the correction of that mutation by
complementation or recombination. Presumably, only the primes carrying the region
mutated in the recipient would be capable of correcting the mutant phenotype. If one
thinks about this a bit, it is clear that this form of mapping is a version of deletion
mapping where the majority of the chromosome is deleted. It can also be performed with
smaller cloned regions on any replicating plasmid. This system works because most
mutations cause loss of function and are therefore recessive to wild-type. This approach
would fail in an attempt to map trans-dominant mutations.
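The logic of prime complementation can be sketched in a few lines of Python. The prime
names, the map intervals and the position of the mutation are all hypothetical, and the
chromosome is treated as a simple linear scale from 0 to 100 so that intervals do not wrap
around the origin. The first function stands in for the mating experiment itself; the second
narrows the location to the region shared by every prime that corrected the mutation.

```python
def complementing_primes(primes, position):
    """Names of the primes whose carried interval includes `position`."""
    return sorted(name for name, (start, end) in primes.items()
                  if start <= position <= end)


def shared_interval(primes, names):
    """Map interval common to all primes in `names`."""
    start = max(primes[name][0] for name in names)
    end = min(primes[name][1] for name in names)
    if start > end:
        raise ValueError("the listed primes share no common region")
    return start, end


# Hypothetical set of overlapping primes that together cover the whole map.
primes = {
    "prime_A": (0, 30),
    "prime_B": (20, 55),
    "prime_C": (50, 80),
    "prime_D": (75, 100),
}

hits = complementing_primes(primes, position=25)  # simulated mating result
print(hits, shared_interval(primes, hits))
```

Because two overlapping primes both correct the mutant, the mutation is assigned to their
overlap (20 to 30 on this invented scale) rather than to the full length of either prime.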
Physical Mapping - As noted at the start, it is becoming possible to cut an entire
bacterial chromosome into a relatively few pieces (typically with restriction enzymes
with unusually large, and therefore very infrequent, target sites) and then to identify the
fragment that hybridizes to any cloned piece of DNA. Since the "marker" used is a
hybridization probe, this allows mapping of regions of hybridization rather than of
mutations, in contrast to genetic mapping. The physical mapping of a transposon
insertion does both, however, because the hybridizing region is the mutation. When this
approach has been performed on organisms with a preexisting body of genetic
information available (e.g. E. coli), a very powerful genetic/physical composite map is
generated. On the other hand, it is unclear to this observer, at least, how such information
is of particular use in understanding organisms that lack such a history of genetic
characterization, since it simply locates the cloned region on a vast featureless piece of
DNA. This approach will certainly become easier as more "rarely-cutting" restriction
enzymes become available and as tools are developed to introduce unique restriction sites
into genomes.
Final Notes on Mapping - The problem of "signal-to-noise ratio", alluded to in the
section on deletion mapping, is an important point. It should be remembered that most
point mutations will revert at a reasonable frequency and for many mapping systems,
these revertants will confuse the results and lower the potential resolution of the mapping
system.
It should be reemphasized that genetic mapping, and particularly deletion mapping,
establishes genetic, and not physical, distances (though rough estimates are possible
through use of certain numerical analyses). Just because one finds more mutations in a
particular region of a gene, as defined by deletion mapping, one should not assume that
region is large. It could simply be that region of the protein is critical, so that a
disproportionate fraction of point mutations are detected there. Consequently, this allows
you to separate the region into more deletion intervals because the end points of more
deletions are separable.
3.3 MUTATIONS
In the living cell, DNA undergoes frequent chemical change, especially when it is being
replicated (in S phase of the eukaryotic cell cycle). Most of these changes are quickly
repaired. Those that are not result in a mutation. Thus, mutation is a failure of DNA
repair.
3.3.1 Spontaneous & Induced Mutations
Spontaneous Replication Errors - Replication is remarkably accurate: fewer than one error
is made per billion nucleotides incorporated in the course of DNA synthesis. However, spontaneous
replication errors do occasionally occur. The primary cause of spontaneous replication
errors was formerly thought to be tautomeric shifts, in which the positions of protons in
the DNA bases change. Purine and pyrimidine bases exist in different chemical forms
called tautomers (Figure 3.21). The two tautomeric forms of each base are in dynamic
equilibrium, although one form is more common than the other. The standard Watson and
Crick base pairings—adenine with thymine, and cytosine with guanine—are between the
common forms of the bases, but, if the bases are in their rare tautomeric forms, other base
pairings are possible (Figure 3.22). Watson and Crick proposed that tautomeric shifts
might produce mutations, and for many years their proposal was the accepted model for
spontaneous replication errors, but there has never been convincing evidence that the rare
tautomers are the cause of spontaneous mutations. Furthermore, research now shows little
evidence of these structures in DNA. Mispairing can also occur through wobble, in which
normal, protonated, and other forms of the bases are able to pair because of flexibility in
the DNA helical structure (Figure 3.22). These structures have been detected in DNA
molecules and are now thought to be responsible for many of the mispairings in
replication. When a mismatched base has been incorporated into a newly synthesized
nucleotide chain, an incorporated error is said to have occurred. Suppose that, in
replication, thymine (which normally pairs with adenine) mispairs with guanine through
wobble (Figure 3.23). In the next round of replication, the two mismatched bases
separate, and each serves as template for the synthesis of a new nucleotide strand. This
time, thymine pairs with adenine, producing another copy of the original DNA sequence.
On the other strand, however, the incorrectly incorporated guanine serves as the template
and pairs with cytosine, producing a new DNA molecule that has an error: a C·G pair in place
of the original T·A pair (a T·A → C·G base substitution). The original incorporated error
leads to a replication error, which creates a permanent mutation, because all the base
pairings are correct and there is no mechanism for repair systems to
detect the error. Mutations due to small insertions and deletions also may arise
spontaneously in replication and crossing over. Strand slippage may occur when one
nucleotide strand forms a small loop (Figure 3.24).
Figure 3.21: Purine and pyrimidine bases exist in different forms called tautomers. (a) A
tautomeric shift occurs when a proton changes its position, resulting in a rare tautomeric
form. (b) Standard and anomalous base-pairing arrangements occur if bases are in the rare
tautomeric forms. Base mispairings due to tautomeric shifts were originally thought to be a
major source of errors in replication, but such structures have not been detected in DNA,
and most evidence now suggests that other types of anomalous pairings are responsible for
replication errors.
Figure 3.22: Nonstandard base pairings can occur as a result of the flexibility in DNA
structure. Thymine and guanine can pair through wobble between normal bases. Cytosine and
adenine can pair through wobble when adenine is protonated (has an extra hydrogen).
Figure 3.23: Wobble base pairing leads to a replicated error.
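The two rounds of replication that fix a wobble mispairing as a permanent mutation can be
followed in a small Python sketch. It is a simplified illustration: strands are written
left to right without regard to polarity, and the four-base sequence and the position of
the mispairing are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def replicate(template, mispair=None):
    """Return the strand synthesized on `template`.

    `mispair` maps a template position to the base wrongly incorporated
    opposite it (for example G opposite T through wobble).
    """
    mispair = mispair or {}
    return "".join(mispair.get(i, PAIR[base]) for i, base in enumerate(template))


top = "ACTG"                                # hypothetical template strand
first_round = replicate(top, {2: "G"})      # G incorporated opposite T: incorporated error
from_template = replicate(top)              # second round on the original strand
from_new_strand = replicate(first_round)    # second round on the strand carrying G

print(first_round)      # TGGC
print(from_template)    # TGAC
print(from_new_strand)  # ACCG
```

The duplex made on the original strand still has T paired with A at the third position,
whereas the duplex made on the strand carrying the misincorporated G now has C paired with
G there, and every base in it is correctly paired.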
If the looped-out nucleotides are on the newly synthesized strand, an insertion results. At
the next round of replication, the insertion will be incorporated into both strands of the
DNA molecule. If the looped-out nucleotides are on the template strand, then there is a
deletion on the newly replicated strand, and this deletion will be perpetuated in
subsequent rounds of replication. During normal crossing over, the homologous
sequences of the two DNA molecules align, and crossing over produces no net change in
the number of nucleotides in either molecule. Misaligned pairing may cause unequal
crossing over, which results in one DNA molecule with an insertion and the other with a
deletion (Figure 3.25).
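The two outcomes of strand slippage can be sketched as follows. This is a deliberately
simplified model: the sequence is treated as a single string, the looped-out segment is
assumed to be exactly one repeat unit, and the function name and example sequence are
hypothetical.

```python
def slip(sequence, unit, looped_strand):
    """Sequence inherited after one repeat `unit` loops out during replication.

    looped_strand == "new":      the unit is copied twice, giving an insertion.
    looped_strand == "template": the unit is skipped, giving a deletion.
    """
    i = sequence.find(unit)
    if i < 0:
        raise ValueError("repeat unit not found in sequence")
    if looped_strand == "new":
        return sequence[:i] + unit + sequence[i:]
    if looped_strand == "template":
        return sequence[:i] + sequence[i + len(unit):]
    raise ValueError("looped_strand must be 'new' or 'template'")


original = "GT" + "CAG" * 4 + "TC"   # hypothetical tract with four CAG repeats
print(slip(original, "CAG", "new"))       # five repeats: insertion
print(slip(original, "CAG", "template"))  # three repeats: deletion
```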
Some DNA sequences are more likely than others to undergo strand slippage or unequal
crossing over. Stretches of repeated sequences, such as trinucleotide repeats or
homopolymeric repeats (more than five repeats of the same base in a row), are prone to
strand slippage. Stretches with more repeats are more likely to undergo strand slippage.
Duplicated or repetitive sequences may misalign during pairing, leading to unequal
crossing over. Both strand slippage and unequal crossing over produce duplicated copies
of sequences, which in turn promote further strand slippage and unequal crossing over.
This chain of events may explain the phenomenon of anticipation often observed for
expanding trinucleotide repeats.
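Unequal crossing over between misaligned repeats can be illustrated with a minimal Python
sketch in which each chromosome is a list of blocks. The block names and crossover points
are hypothetical; the only assumption is that the exchange joins the left part of one
chromosome to the right part of the other.

```python
def crossover(chrom_a, chrom_b, point_a, point_b):
    """Exchange the segments lying to the right of `point_a` and `point_b`."""
    recombinant_1 = chrom_a[:point_a] + chrom_b[point_b:]
    recombinant_2 = chrom_b[:point_b] + chrom_a[point_a:]
    return recombinant_1, recombinant_2


chromosome = ["left", "repeat", "repeat", "repeat", "right"]

# Correct alignment: the exchange happens at the same repeat in both molecules.
equal = crossover(chromosome, chromosome, 2, 2)
# Misalignment: repeat 2 of one molecule has paired with repeat 1 of the other.
unequal = crossover(chromosome, chromosome, 3, 2)

print([c.count("repeat") for c in equal])    # [3, 3]
print([c.count("repeat") for c in unequal])  # [4, 2]
```

The misaligned exchange yields one product with an extra repeat and one with a repeat
missing, and the longer tract offers still more opportunities for misalignment.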
Spontaneous Chemical Changes - In addition to spontaneous mutations that arise in
replication, mutations also result from spontaneous chemical changes in DNA. One such change
is depurination, the loss of a purine base from a nucleotide. Depurination results when the
covalent bond connecting the purine to the 1′-carbon atom of the deoxyribose sugar breaks
(Figure 3.26), producing an apurinic site - a nucleotide that lacks its purine base. An
apurinic site cannot act as a template for a complementary base in replication.
Figure 3.24: Insertions and deletions may result from strand slippage.
Figure 3.25: Unequal crossing over produces insertions and deletions.
Figure 3.26: Depurination, loss of a purine base from the nucleotide, produces an apurinic site.
Figure 3.27: Deamination alters DNA bases.
In the absence of base-pairing constraints, an incorrect nucleotide (most often adenine) is
incorporated into the newly synthesized DNA strand opposite the apurinic site (Figure
3.26b), frequently leading to an incorporated error. The incorporated error is then
transformed into a replication error at the next round of replication. Depurination is a
common cause of spontaneous mutation; a mammalian cell in culture loses
approximately 10,000 purines every day. Another spontaneously occurring chemical
change that takes place in DNA is deamination, the loss of an amino group (NH2) from a
base. Deamination may occur spontaneously or be induced by mutagenic chemicals.
A brief history of Herman Muller - The first discovery of a chemical mutagen was
made by Charlotte Auerbach, who was born in Germany to a Jewish family in 1899.
After attending university in Berlin and doing research, she spent several years teaching
at various schools in Berlin. Faced with increasing anti-Semitism in Nazi Germany,
Auerbach emigrated to Britain, where she conducted research on the development of
mutants in Drosophila. There she met Herman Muller, who had shown that radiation
induces mutations; he suggested that Auerbach try to obtain mutants by treating
Drosophila with chemicals. Her initial attempts met with little success. Other scientists
were conducting top-secret research on mustard gas (used as a chemical weapon in World
War I) and noticed that it produced many of the same effects as radiation. Auerbach was
asked to determine whether mustard gas was mutagenic. Collaborating with
pharmacologist J. M. Robson, Auerbach studied the effects of mustard gas on Drosophila
melanogaster. The experimental conditions were crude.
They heated liquid mustard gas over a Bunsen burner on the roof of the pharmacology
building, and the flies were exposed to the gas in a large chamber. After developing
serious burns on her hands from the gas, Auerbach let others carry out the exposures, and
she analyzed the flies. Auerbach and Robson showed that mustard gas is indeed a
powerful mutagen, reducing the viability of gametes and increasing the numbers of
mutations seen in the offspring of exposed flies. Because the research was part of the
secret war effort, publication of their findings was delayed until 1947.
Figure 3.28: 5-Bromouracil (a base analog) resembles thymine, except that it has a bromine
atom in place of a methyl group on the 5-carbon atom. Because of the similarity in their
structures, 5-bromouracil may be incorporated into DNA in place of thymine. Like thymine,
5-bromouracil normally pairs with adenine but, when ionized, it may pair with guanine
through wobble.
Figure 3.29: Chemicals may alter DNA bases. (a) The alkylating agent ethylmethanesulfonate
(EMS) adds an ethyl group to guanine, producing 6-ethylguanine, which pairs with thymine,
producing a C·G → T·A transition mutation. (b) Nitrous acid deaminates cytosine to produce
uracil, which pairs with adenine, producing a C·G → T·A transition mutation. (c)
Hydroxylamine converts cytosine into hydroxylaminocytosine, which frequently pairs with
adenine, leading to a C·G → T·A transition mutation.
Chemically Induced Mutations - Although many mutations arise spontaneously, a
number of environmental agents are capable of damaging DNA, including certain
chemicals and radiation. Any environmental agent that significantly increases the rate of
mutation above the spontaneous rate is called a mutagen.
(1) Base analogs - One class of chemical mutagens consists of base analogs, chemicals
with structures similar to that of any of the four standard bases of DNA. DNA
polymerases cannot distinguish these analogs from the standard bases; so, if base analogs
are present during replication, they may be incorporated into newly synthesized DNA
molecules. For example, 5-bromouracil (5BU) is an analog of thymine; it has the same
structure as that of thymine except that it has a bromine (Br) atom on the 5-carbon atom
instead of a methyl group (Figure 3.28a).
Normally, 5-bromouracil pairs with adenine just as thymine does, but it occasionally
mispairs with guanine (Figure 3.28b), leading to a transition (T·A → 5BU·A → 5BU·G → C·G),
as shown in Figure 3.31. Through mispairing, 5-bromouracil may also be incorporated
into a newly synthesized DNA strand opposite guanine. In the next round of replication,
5-bromouracil may pair with adenine, leading to another transition
(G·C → G·5BU → A·5BU → A·T).
Figure 3.31: 5-Bromouracil can lead to a replicated error.
(2) 2-aminopurine (2AP) - Another mutagenic chemical is 2-aminopurine, which is a base
analog of adenine. Normally, 2-aminopurine base pairs with thymine, but it may mispair with
cytosine, causing a transition mutation (T·A → T·2AP → C·2AP → C·G). Alternatively,
2-aminopurine may be incorporated through mispairing into the newly synthesized DNA
opposite cytosine and later pair with thymine, leading to a C·G → C·2AP → T·2AP → T·A
transition.
Thus, both 5-bromouracil and 2-aminopurine can produce transition mutations. In the
laboratory, mutations by base analogs can be reversed by treatment with the same analog
or by treatment with a different analog.
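A transition replaces a purine with a purine or a pyrimidine with a pyrimidine, whereas a
transversion exchanges a purine for a pyrimidine or the reverse. The short Python sketch
below applies this rule to a single-base change; the function name is a hypothetical choice
made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Classify a single-base substitution as a transition or a transversion."""
    bases = PURINES | PYRIMIDINES
    if old not in bases or new not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if old == new:
        raise ValueError("no substitution: the two bases are identical")
    same_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("T", "C"))  # transition (pyrimidine to pyrimidine)
print(classify_substitution("A", "G"))  # transition (purine to purine)
print(classify_substitution("G", "T"))  # transversion (purine to pyrimidine)
```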
(3) Alkylating agents - Alkylating agents are chemicals that donate alkyl groups, such as
methyl (CH3) and ethyl (CH3–CH2) groups, to nucleotide bases. For example,
ethylmethanesulfonate (EMS) adds an ethyl group to guanine, producing 6-ethylguanine,
which pairs with thymine (see Figure 3.29a). Thus, EMS produces C·G → T·A transitions.
EMS is also capable of adding an ethyl group to thymine, producing 4-ethylthymine, which
then pairs with guanine, leading to a T·A → C·G transition. Because EMS produces both
C·G → T·A and T·A → C·G transitions, mutations produced by EMS can be reversed by
additional treatment with EMS. Mustard gas is another alkylating agent.
Figure 3.30: Oxidative radicals convert guanine into 8-oxy-7,8-dihydrodeoxyguanine, which
frequently mispairs with adenine instead of cytosine, producing a G·C → T·A transversion.
(4) Deamination - In addition to its spontaneous occurrence (see Figure 3.27),
deamination can be induced by some chemicals. For instance, nitrous acid deaminates
cytosine, creating uracil, which in the next round of replication pairs with adenine (see
Figure 3.29b), producing a C·G → T·A transition mutation. Nitrous acid changes adenine into
hypoxanthine, which pairs with cytosine, leading to a T·A → C·G transition. Nitrous acid
also deaminates guanine, producing xanthine, which pairs with cytosine just as guanine does;
however, xanthine may also pair with thymine, leading to a C·G → T·A transition. Nitrous
acid produces exclusively transition mutations and, because both C·G → T·A and T·A → C·G
transitions are produced, these mutations can be reversed with nitrous acid.
Hydroxylamine - Hydroxylamine is a very specific base-modifying mutagen that adds a
hydroxyl group to cytosine, converting it into hydroxylaminocytosine. This conversion
increases the frequency of a rare tautomer that pairs with adenine instead of guanine and
leads to C·G → T·A transitions. Because hydroxylamine acts only on cytosine, it will not
generate T·A → C·G transitions; thus, hydroxylamine will not reverse the mutations that it
produces.
Figure 3.32: Intercalating agents such as proflavin and acridine orange insert themselves
between adjacent bases in DNA, distorting the three-dimensional structure of the helix and
causing single-nucleotide insertions and deletions in replication.
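Whether a mutagen can reverse its own mutations follows directly from the directions of
change that it induces. The Python sketch below records the transitions attributed to each
mutagen in the text above and checks whether both directions are present. The plain-text
labels "CG>TA" and "TA>CG" are a notation chosen here for the C·G to T·A and T·A to C·G
transitions, and the table is a summary for illustration, not an exhaustive list of each
agent's effects.

```python
# Transitions induced by each mutagen, as described in the text.
# "CG>TA" stands for a C.G to T.A transition, "TA>CG" for the opposite change.
INDUCED = {
    "5-bromouracil": {"TA>CG", "CG>TA"},
    "2-aminopurine": {"TA>CG", "CG>TA"},
    "EMS": {"CG>TA", "TA>CG"},
    "nitrous acid": {"CG>TA", "TA>CG"},
    "hydroxylamine": {"CG>TA"},
}

OPPOSITE = {"CG>TA": "TA>CG", "TA>CG": "CG>TA"}


def can_reverse_own_mutations(mutagen):
    """True if every change the mutagen induces can also be undone by it."""
    changes = INDUCED[mutagen]
    return all(OPPOSITE[change] in changes for change in changes)


for name in INDUCED:
    print(name, can_reverse_own_mutations(name))
```

Only hydroxylamine fails the check, because it produces transitions in one direction only.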
(5) Oxidative reactions - Reactive forms of oxygen (including superoxide radicals,
hydrogen peroxide, and hydroxyl radicals) are produced in the course of normal aerobic
metabolism, as well as by radiation, ozone, peroxides, and certain drugs. These reactive
forms of oxygen damage DNA and induce mutations by bringing about chemical changes
to DNA. For example, oxidation converts guanine into 8-oxy-7,8-dihydrodeoxyguanine
(Figure 3.30), which frequently mispairs with adenine instead of cytosine, causing a
G·C → T·A transversion mutation.
(6) Intercalating agents - Intercalating agents, such as proflavin, acridine orange,
ethidium bromide, and dioxin, are about the same size as a nucleotide (Figure 3.32a).
They produce mutations by sandwiching themselves (intercalating) between adjacent
bases in DNA, distorting the three-dimensional structure of the helix and causing
single-nucleotide insertions and deletions in replication (Figure 3.32b).
These insertions and deletions frequently produce frameshift mutations (which change all
amino acids downstream of the mutation), and so the mutagenic effects of intercalating
agents are often severe. Because intercalating agents generate both additions and
deletions, they can reverse the effects of their own mutations.
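The effect of a single-nucleotide insertion or deletion on the reading frame can be seen by
translating a short coding sequence before and after the change. The sketch below uses the
standard genetic code; the 15-base sequence and the site of the change are hypothetical, and
"*" marks a stop codon.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for each codon position.
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON[dna[i:i + 3]] for i in range(0, usable, 3))


coding = "ATGGCTGAAAAGCTT"                 # hypothetical coding sequence
inserted = coding[:3] + "C" + coding[3:]   # one base added after the first codon
deleted = coding[:3] + coding[4:]          # one base removed after the first codon

print(translate(coding))    # MAEKL
print(translate(inserted))  # MR*KA
print(translate(deleted))   # MLKS
```

In both altered sequences every codon after the first is read in a new frame, and in this
example the insertion also creates a premature stop codon.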
Radiation - In 1927, Herman Muller demonstrated that mutations in fruit flies could be
induced by X-rays. The results of subsequent studies showed that X-rays greatly increase
mutation rates in all organisms. Because of their high energies, X-rays, gamma rays, and
cosmic rays (Figure 3.33) are all capable of penetrating tissues and damaging DNA. These
forms of radiation, called ionizing radiation, dislodge electrons from the atoms that they
encounter, changing stable molecules into free radicals and reactive ions, which then alter
the structures of bases and break phosphodiester bonds in DNA. Ionizing radiation also
frequently results in double-strand breaks in DNA. Attempts to repair these breaks can
produce chromosome mutations.
Figure 3.33: In the electromagnetic spectrum, as wavelength decreases, energy increases.
Ultraviolet light has less energy than that of ionizing radiation and does not eject
electrons and cause ionization but is nevertheless highly mutagenic. Purine and pyrimidine
bases readily absorb UV light, resulting in the formation of chemical bonds between adjacent
pyrimidine molecules on the same strand of DNA and in the creation of structures called
pyrimidine dimers (Figure 3.34a). Pyrimidine dimers consisting of two thymine bases (called
thymine dimers) are most frequent, but cytosine dimers and thymine–cytosine dimers also can
form. These dimers distort the configuration of DNA (Figure 3.34b) and often block
replication. Most pyrimidine
dimers are immediately repaired by mechanisms discussed later in this chapter, but some
escape repair and inhibit replication and transcription. When pyrimidine dimers block
replication, cell division is inhibited and the cell usually dies; for this reason, UV light
kills bacteria and is an effective sterilizing agent. For a mutation (a hereditary error in
the genetic instructions) to occur, the replication block must be overcome. How do
bacteria and other organisms replicate despite the presence of thymine dimers? Bacteria
can circumvent replication blocks produced by pyrimidine dimers and other types of
DNA damage by means of the SOS system. This system allows replication blocks to be
overcome, but in the process makes numerous mistakes and greatly increases the rate of
mutation. Indeed, the very reason that replication can proceed in the presence of a block
is that the enzymes in the SOS system do not strictly adhere to the base-pairing rules. The
trade-off is that replication may continue and the cell survives, but only by sacrificing the
normal accuracy of DNA synthesis. The SOS system is complex, including the products
of at least 25 genes. A protein called RecA binds to the damaged DNA at the blocked
replication fork and becomes activated.
This activation promotes the binding of a protein called LexA, which is a repressor of the
SOS system. The activated RecA complex induces LexA to undergo self-cleavage,
destroying its repressive activity. This inactivation enables other SOS genes to be
expressed, and the products of these genes allow replication of the damaged DNA to
proceed. The SOS system allows bases to be inserted into a new DNA strand in the
absence of bases on the template strand, but these insertions result in numerous errors in
the base sequence.
Figure 3.34: Pyrimidine dimers result from ultraviolet light. (a) Formation of thymine
dimer. (b) Distorted DNA.
Eukaryotic cells have a specialized DNA polymerase called polymerase η (eta) that
bypasses pyrimidine dimers.
Polymerase η preferentially inserts AA opposite a pyrimidine dimer. This strategy seems
to be reasonable because about two-thirds of pyrimidine dimers are thymine dimers.
However, the insertion of AA opposite a CT dimer results in a C·G → A·T transversion.
Polymerase η is therefore said to be an error-prone polymerase.
Transposons and Mutations - Transposons are mutagens. They can cause mutations
in several ways:
• If a transposon inserts itself into a functional gene, it will probably damage it.
Insertion into exons, introns, and even into DNA flanking the genes (which may
contain promoters and enhancers) can destroy or alter the gene's activity.
• Faulty repair of the gap left at the old site (in cut and paste transposition) can lead
to mutation there.
• The presence of a string of identical repeated sequences presents a problem for
precise pairing during meiosis. How is the third, say, of a string of five Alu
sequences on the "invading strand" of one chromatid going to ensure that it pairs
with the third sequence in the other strand? If it accidentally pairs with one of the
other Alu sequences, the result will be an unequal crossover - one of the
commonest causes of duplications (Figure 3.35).
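The mispairing described in the last point can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: each chromatid is a list of labelled segments, the function name `unequal_crossover` and the segment labels are hypothetical, and the crossover is modelled as a simple exchange of everything to the right of the paired Alu copies.

```python
def unequal_crossover(chrom_a, chrom_b, repeat_a, repeat_b):
    """Exchange the segments lying to the right of the paired Alu copies.

    chrom_a, chrom_b: lists of segment labels, e.g. ["left", "Alu", ..., "right"].
    repeat_a, repeat_b: 1-based number of the Alu copy on each chromatid
    at which pairing (and the crossover) takes place.
    """
    def cut_point(chrom, nth):
        # Return the list position just after the nth "Alu" segment.
        seen = 0
        for i, segment in enumerate(chrom):
            if segment == "Alu":
                seen += 1
                if seen == nth:
                    return i + 1
        raise ValueError("chromatid has fewer Alu copies than requested")

    cut_a = cut_point(chrom_a, repeat_a)
    cut_b = cut_point(chrom_b, repeat_b)
    recombinant_1 = chrom_a[:cut_a] + chrom_b[cut_b:]
    recombinant_2 = chrom_b[:cut_b] + chrom_a[cut_a:]
    return recombinant_1, recombinant_2


chromatid = ["left"] + ["Alu"] * 5 + ["right"]

# Correct pairing: third Alu copy with third Alu copy.
equal = unequal_crossover(chromatid, chromatid, 3, 3)
# Mispairing: third Alu copy with the second copy on the other chromatid.
unequal = unequal_crossover(chromatid, chromatid, 3, 2)

for label, products in (("equal", equal), ("unequal", unequal)):
    print(label, [p.count("Alu") for p in products])
```

With correct pairing both products keep five Alu copies; with the mispairing one product gains a copy (a duplication) and the other loses one (a deletion).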
SINEs (mostly Alu sequences) and LINEs cause only a small percentage of human
mutations. (There may even be a mechanism by which they avoid inserting themselves
into functional genes.) However, they have been found to be the cause of the mutations
responsible for some cases of human genetic diseases, including:
• Hemophilia A [Factor VIII gene] and Hemophilia B [Factor IX gene]
• X-linked severe combined immunodeficiency (SCID) [gene for part of the IL-2
receptor]
• porphyria
• predisposition to colon polyps and cancer [APC gene]
• Duchenne muscular dystrophy [dystrophin gene]
Deletion generation: Both deletion and inversion events next to the transposon are
frequent. The end points of both the deletions and the inversions seem to be non-random
and in the case of inversions, there is typically a second copy of the transposon at either
end of the inverted region.
Element deletion: Tn3 does not appear to be deleted precisely at a detectable frequency.
Even in those few cases where revertants to a wild-type phenotype occur, subsequent
analysis has shown that the wild-type genotype has not been restored. Such events that
might restore a wild-type phenotype without the wild-type genotype will be discussed in
the section on suppressors near the end of the text. For another transposon of this class,
Tn101, reversion to a wild-type genotype has been shown to occur at a frequency of
10^-11, which is essentially undetectable.
Figure 3.35: Transposons showing Normal and Alternative Splicing.
A transposable Bacteriophage: Mu - We will briefly treat a number of the same
properties discussed above for bacteriophage Mu (named for its "mutator" effects). For
all intents and purposes it belongs in the general category of a Class 2 transposon since it
is not flanked by separately transposable insertion elements. Its physical size is 38 kb and
it generates 5 base pair duplications upon insertions. It produces an 11 base pair inverted
repeat at either end. Its site preference is remarkably random and the argument has been
made that its specificity can be for no more than one or two base pairs. However, in at
least one particular region, it has been found that a disproportionate number of insertions
fall within one very small region of the gene suggesting that there can be some site
preference. Mu is rather strongly polar in both orientations, but it is clear that there is an
exceedingly low level of transcription out of one end of the prophage. The transposition
of Mu is known to generate deletions as roughly 10% of the Mu prophages have adjacent
deletions. These deletions tend to start at one end or the other of the prophage and extend
into the adjoining DNA though there also seem to be cases where the deletions are
unlinked to the prophage. Finally, precise deletion of Mu is rather rare, occurring at
approximately 10^-9, and seems to be dependent upon at least some Mu factors. The
advantages of the use of Mu are: it is not normally found in the bacterial genome and
therefore there are few problems with homology to existing sequences in the
chromosome; in contrast to most other transposons, Mu does not need a separate vector
system since it is itself a vector, being a bacteriophage; Mu prophage (at least the cts
versions, where c encodes the repressor) are inducible. The disadvantage of Mu is that it
is a bacteriophage and therefore can kill the host cell. A wide variety of useful mutants of
Mu have been generated.
What good are transposons? - We don't know. They have been called "junk" DNA
(because there is no obvious benefit to their host) and "selfish" DNA (because their only
function seems to be to make more copies of themselves). Because of the sequence
similarities of all the LINEs and SINEs, they also make up a large portion of the
"repetitive DNA" of the cell. Retrotransposons cannot be so selfish that they reduce the
survival of their host. Perhaps they even confer some benefit.
Some possibilities:
• Retrotransposons often carry some additional sequences at
their 3' end as they insert into a new location. Perhaps these occasionally create
new combinations of exons, promoters, and enhancers that benefit the host.
Example:
o Thousands of our Alu elements occur in the introns of structural genes.
o Some of these contain sequences that when transcribed into the primary
transcript are recognized by the spliceosome.
o These can then be spliced into the mature mRNA creating a
new exon, which will be translated into a new protein product.
o Alternative splicing can provide not only the new mRNA (and thus
protein) but also the old.
o In this way, nature can try out new proteins without the risk of abandoning
the tried-and-true old one.
• L1 elements inserted into the introns of functional genes reduce the transcription
of those genes without harming the gene product; the longer the L1 element,
the lower the level of gene expression. Some 79% of our genes contain L1
elements, and perhaps they are a mechanism for establishing the baseline level of
gene activity.
• Telomerase, the enzyme essential for maintaining chromosome length, is closely
related to the reverse transcriptase of LINEs and may have evolved from it.
• RAG-1 and RAG-2. The proteins encoded by these genes are needed to assemble
the repertoire of antibodies and T-cell receptors (TCRs) used by the adaptive
immune system. The mechanism resembles that of the cut and paste method of
Class II transposons, and the RAG genes may have evolved from them.
If so, the event occurred some 450 million years ago when the jawed vertebrates evolved
from jawless ancestors. Only jawed vertebrates have an adaptive immune system and the
RAG-1 and RAG-2 genes that make it possible.
• In Drosophila, the insertion of transposons into genes has been linked to the
development of resistance to DDT and organophosphate insecticides.
Transposons and the C-value Paradox
• The genome of Arabidopsis thaliana contains 1.2 x 10^8 base pairs (bp) of DNA.
About 14% of this consists of transposons; the rest functional genes (about 25,000
of them).
• The maize (corn) genome contains 20 times more DNA (2.4 x 10^9 bp) but surely
has no need for 20 times as many genes. In fact, 60% of the corn genome is made
up of transposons. (The figure for humans is 44%.)
• Most of the 2.5 x 10^11 bp of DNA in the genome of Psilotum nudum is
presumably "junk" DNA.
So it seems likely that the lack of an association between size of genome and number of
functional genes - the C-value paradox - is caused by the amount of transposon DNA
accumulated in the genome.
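The arithmetic behind this argument can be checked directly from the figures quoted above. The sketch below is a rough illustration only: the dictionary layout and the helper name `non_transposon_bp` are assumptions made for the example, and it treats everything that is not transposon DNA as one pool, which is a simplification.

```python
# Genome sizes (bp) and transposon fractions as quoted in the text.
genomes = {
    "Arabidopsis thaliana": {"bp": 1.2e8, "transposon_fraction": 0.14},
    "maize": {"bp": 2.4e9, "transposon_fraction": 0.60},
}


def non_transposon_bp(genome):
    """Base pairs left after subtracting transposon DNA (hypothetical helper)."""
    return genome["bp"] * (1 - genome["transposon_fraction"])


arabidopsis = genomes["Arabidopsis thaliana"]
maize = genomes["maize"]

# Ratio of total genome sizes, and the same ratio with transposons removed.
size_ratio = maize["bp"] / arabidopsis["bp"]
remaining_ratio = non_transposon_bp(maize) / non_transposon_bp(arabidopsis)

print(f"total DNA, maize vs Arabidopsis: {size_ratio:.1f}-fold")
print(f"non-transposon DNA, maize vs Arabidopsis: {remaining_ratio:.1f}-fold")
```

On these numbers the 20-fold difference in total DNA shrinks to roughly 9-fold once transposon DNA is subtracted, so transposons account for a large part of the gap, though not all of it.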
Insertion Sequences (IS) - These are simple transposons found in bacteria that only
encode insertion functions. More complex transposons may also carry genes coding for
other functions. In bacterial plasmids, transposons often include genes encoding antibiotic
resistance. Thus, both transfer of the plasmid from cell to cell and the
movement of transposons from one piece of DNA to another can spread the resistance
genes, particularly when a strong genetic selection is present (for instance, when animals
or humans ingest antibiotics).
Figure 3.36: Using a Transposon to mutagenize bacteria.
When transposons are used in mutagenesis, as shown in Figure 3.36, the antibiotic
resistance genes are highly useful as selectable markers in tracking the transposon. A
transposon carrying an antibiotic resistance gene is introduced into the cell on a vector,
such as the delivery vector, that has been engineered so that it cannot persist in the cell;
therefore, any resulting antibiotic-resistant colonies have the transposon integrated
somewhere in their chromosome. One can select for
antibiotic resistance and also screen or select for the
mutation desired. Because a large amount of DNA is being
inserted into a gene, mutagenesis with an insertion element
will usually disrupt gene function.
Bacteriophage Mu is both a phage and a transposon, named
Mu because of the high frequency of mutations associated
with its growth in cells. The mutations are a direct
consequence of part of its life cycle: integrating randomly in
the bacterial DNA as part of both its lysogenic and lytic
replication cycles.
The frequency of transposition of most transposons is low;
there are usually hundreds rather than millions of events,
all over the chromosome. Identifying the one that gives the
desired phenotype may be difficult. In addition,
transposons do not always integrate randomly into the
genome and the degree of specificity varies with the
transposon.
Mutations in the transposase can increase the frequency of
transposition and help randomize insertions (Kleckner,
Bender, and Gottesman, 1991). Furthermore, transposition
can also be carried out in vitro at high efficiency into a
target piece of DNA, and the mutagenized DNA then
inserted back into the cell (Hamer et al., 2001).
Thus far, we have discussed the various kinds of ways to
disrupt gene activity, using mutagens and transposons.
Once we have made mutants, how do we detect them?
3.3.5 DNA Damage, Repair Mechanism &
Defect On Repair Mechanism
Importance - DNA in the living cell is subject to
many chemical alterations (a fact often forgotten in the
excitement of being able to do DNA sequencing on
dried and/or frozen specimens). If the genetic
information encoded in the DNA is to remain
uncorrupted, any chemical changes must be corrected
(Figure 3.37).
Figure 3.37: DNA repair. The three
steps common to most types of repair
are excision (step 1), resynthesis (step
2), and ligation (step 3). In step 1 the
damage is excised; in steps 2 and 3
the original DNA sequence is restored.
DNA polymerase fills in the gap
created by the excision events, and
DNA ligase seals the nick left in the
repaired strand. Nick sealing consists
of the reformation of a broken
phosphodiester bond.
A failure to repair DNA produces a mutation - The recent publication of the
human genome has already revealed 130 genes whose products participate in DNA
repair. More will probably be identified soon.
Agents that Damage DNA -
• Certain wavelengths of radiation:
o ionizing radiation such as gamma rays and x-rays;
o ultraviolet rays, especially the UV-C rays (~260 nm) that are absorbed strongly by
DNA, but also the longer-wavelength UV-B that penetrates the ozone shield.
• Highly reactive oxygen radicals produced during normal cellular respiration as well as by
other biochemical pathways.
• Chemicals in the environment:
o many hydrocarbons, including some found in cigarette smoke;
o some plant and microbial products, e.g. the aflatoxins produced in moldy peanuts.
• Chemicals used in chemotherapy, especially chemotherapy of cancers.
Types of DNA Damage -
• All four of the bases in DNA (A, T, C and G) can be covalently modified at various
positions. One of the most frequent is the loss of an amino group ("deamination"),
resulting, for example, in a C being converted to a U.
• Mismatches of the normal bases because of a failure of proofreading during DNA
replication. Common example: incorporation of the pyrimidine U (normally found only
in RNA) instead of T.
• Breaks in the backbone. These can be limited to one of the two strands (a single-stranded
break, SSB) or occur on both strands (a double-stranded break, DSB). Ionizing radiation
is a frequent cause, but some chemicals produce breaks as well.
• Crosslinks. Covalent linkages can be formed between bases on the same DNA strand
("intrastrand") or on the opposite strand ("interstrand"). Several chemotherapeutic drugs
used against cancers crosslink DNA.
Repairing Damaged Bases
Damaged or inappropriate bases can be repaired by several mechanisms:
• Direct chemical reversal of the damage
• Excision Repair, in which the damaged base or bases are removed and then
replaced with the correct ones in a localized burst of DNA synthesis. There are
three modes of excision repair, each of which employs specialized sets of
enzymes.
o Base Excision Repair (BER)
o Nucleotide Excision Repair (NER)
o Mismatch Repair (MMR)
Direct Reversal of Base Damage - Perhaps the most frequent cause of point
mutations in humans is the spontaneous addition of a methyl group (CH3-) (an example
of alkylation) to Cs followed by deamination to a T. Fortunately, most of these changes
are repaired by enzymes, called glycosylases, that remove the mismatched T restoring the
correct C. This is done without the need to break the DNA backbone (in contrast to the
mechanisms of excision repair described below).
Some of the drugs used in cancer chemotherapy ("chemo") also damage DNA by
alkylation. Some of the methyl groups can be removed by a protein encoded by our
MGMT gene. However, the protein can only do it once, so the removal of each methyl
group requires another molecule of protein.
This illustrates a problem with direct reversal mechanisms of DNA repair: they are quite
wasteful. Each of the myriad types of chemical alterations to bases requires its own
mechanism to correct. What the cell needs are more general mechanisms capable of
correcting all sorts of chemical damage with a limited toolbox. This requirement is met
by the mechanisms of excision repair.
Base Excision Repair (BER) - The steps (Figure 3.38A) and some key players:
(1) Removal of the damaged base (estimated to occur some 20,000 times a day in each
cell in our body!) by a DNA glycosylase. We have at least 8 genes encoding different
DNA glycosylases, each enzyme responsible for identifying and removing a specific kind
of base damage.
(2) Removal of its deoxyribose phosphate in the backbone, producing a gap. We have
two genes encoding enzymes with this function.
(3) Replacement with the correct nucleotide. This relies on DNA polymerase beta, one of
at least 11 DNA polymerases encoded by our genes.
(4) Ligation of the break in the strand. Two enzymes are known that can do this; both
require ATP to provide the needed energy.
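The sequence of BER steps can be sketched as a toy string operation. This is a minimal illustration under stated assumptions, not a model of the enzymes: the function name `base_excision_repair` is hypothetical, the only damage recognized is a U in DNA (the deaminated cytosine of Figure 3.38A), and the two strands are written so that position i of one pairs with position i of the other, ignoring 5'/3' polarity.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_excision_repair(damaged, partner):
    """Replace each U in `damaged` using the base-paired `partner` strand.

    Returns the repaired strand and a log of the steps applied.
    """
    strand = list(damaged)
    log = []
    for i, base in enumerate(strand):
        if base != "U":
            continue
        # Steps 1 and 2: the glycosylase removes the base and the
        # sugar-phosphate is cut out, leaving a one-nucleotide gap.
        strand[i] = None
        log.append(f"position {i}: U removed, one-nucleotide gap created")
        # Steps 3 and 4: the polymerase reads the partner strand to fill
        # the gap; ligation is implicit in this string representation.
        strand[i] = COMPLEMENT[partner[i]]
        log.append(f"position {i}: {strand[i]} inserted, nick sealed")
    return "".join(strand), log


repaired, steps = base_excision_repair("ATGUGA", "TACGCT")
print(repaired)
for step in steps:
    print(step)
```

The point of the sketch is that the correct base is never "remembered" by the damaged strand; it is recovered from the undamaged partner strand, which is why only a single nucleotide needs to be replaced.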
Nucleotide Excision Repair (NER) - NER differs from BER in several ways
(Figure 3.38B). It uses different enzymes. Even though there may be only a single "bad"
base to correct, its nucleotide is removed along with many other adjacent nucleotides;
that is, NER removes a large "patch" around the damage. The steps and some key
players:
(1) The damage is recognized by one or more protein factors that assemble at the
location.
(2) The DNA is unwound, producing a "bubble". The enzyme system that does this is
Transcription Factor IIH, TFIIH (which also functions in normal transcription).
(3) Cuts are made on both the 3' side and the 5' side of the damaged area so the tract
containing the damage can be removed.
(4) A fresh burst of DNA synthesis, using the intact (opposite) strand as a template, fills in
the correct nucleotides. The DNA polymerases responsible are designated polymerase
delta and epsilon.
(5) A DNA ligase covalently binds the fresh piece into the backbone.
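The difference from BER, removal of a whole patch rather than a single nucleotide, can be shown with a small sketch. It rests on several assumptions made only for illustration: the function names are hypothetical, a thymine dimer is marked as lowercase "tt", the default patch of 12 nucleotides is borrowed from the bacterial gap size given in the caption of Figure 3.38, and strand polarity is ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Position-by-position complement (polarity ignored in this toy model)."""
    return "".join(COMPLEMENT[base] for base in strand)


def nucleotide_excision_repair(damaged, partner, lesion_start, lesion_len=2, patch=12):
    """Cut out a patch around the lesion and resynthesize it from `partner`.

    Returns the repaired strand and the (start, end) positions of the two cuts.
    """
    # Place the cuts so that the lesion sits roughly in the middle of the patch.
    flank = max(0, patch - lesion_len) // 2
    start = max(0, lesion_start - flank)
    end = min(len(damaged), start + max(patch, lesion_len))
    # Resynthesis: copy the excised tract from the intact partner strand.
    new_tract = complement(partner[start:end])
    return damaged[:start] + new_tract + damaged[end:], (start, end)


intact = "GATTACAGGCTTAAGCTTGGCATG"
partner = complement(intact)
# Mark the TT at positions 10-11 as a dimer (lowercase is an illustrative convention).
damaged = intact[:10] + "tt" + intact[12:]

repaired, cuts = nucleotide_excision_repair(damaged, partner, lesion_start=10)
print(damaged)
print(repaired, cuts)
```

Although only two bases are damaged, twelve are replaced; the undamaged neighbours are discarded and rebuilt along with the lesion.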
Xeroderma Pigmentosum (XP) - XP is a rare inherited disease of humans which,
among other things, predisposes the patient to pigmented lesions on areas of the skin
exposed to the sun and an elevated incidence of skin cancer. It turns out that XP can be
caused by mutations in any one of several genes - all of which have roles to play in NER.
Some of them:
• XPA, which encodes a protein that binds the damaged site and helps assemble the
other proteins needed for NER.
• XPB and XPD, which are part of TFIIH. Some mutations in XPB and XPD also
produce signs of premature aging.
• XPF, which cuts the backbone on the 5' side of the damage.
• XPG, which cuts the backbone on the 3' side.
Figure 3.38: Comparison of two major DNA repair pathways. (A) Base excision repair. This pathway
starts with a DNA glycosylase. Here the enzyme uracil DNA glycosylase removes an accidentally
deaminated cytosine in DNA. After the action of this glycosylase (or another DNA glycosylase that
recognizes a different kind of damage) the sugar phosphate with the missing base is cut out by the
sequential action of AP endonuclease and a phosphodiesterase, the same enzymes that initiate the
repair of depurinated sites. The gap of a single nucleotide is then filled by DNA polymerase and DNA
ligase. The net result is that the U that was created by accidental deamination is restored to a C. The AP
endonuclease derives its name from the fact that it recognizes any site in the DNA helix that contains a
deoxyribose sugar with a missing base; such sites can arise either by the loss of a purine (apurinic sites)
or by the loss of a pyrimidine (apyrimidinic sites). (B) Nucleotide excision repair. After a multienzyme
complex recognizes a bulky lesion such as a pyrimidine dimer, one cut is made on each side of the
lesion, and an associated DNA helicase then removes the entire portion of the damaged strand. The
multienzyme complex in bacteria leaves the gap of 12 nucleotides shown; the gap produced in human
DNA is more than twice this size.
Transcription-Coupled NER - Nucleotide-excision repair proceeds most rapidly in
cells whose genes are being actively transcribed, on the DNA strand that is serving as the
template for transcription. This enhancement of NER involves XPB, XPD, and several
other gene products. The genes for two of them are designated CSA and CSB (mutations
in them cause an inherited disorder called Cockayne's syndrome). The CSB product
associates in the nucleus with RNA polymerase II, the enzyme responsible for synthesizing
messenger RNA (mRNA), providing a molecular link between transcription and repair.
One plausible scenario: If RNA polymerase II, tracking along the template (antisense)
strand, encounters a damaged base, it can recruit other proteins, e.g., the CSA and CSB
proteins, to make a quick fix before it moves on to complete transcription of the gene.
Mismatch Repair (MMR) - Mismatch repair deals with correcting mismatches of the
normal bases; that is, failures to maintain normal Watson-Crick base pairing (A•T, C•G).
It can enlist the aid of enzymes involved in both base-excision repair (BER) and
nucleotide-excision repair (NER) as well as using enzymes specialized for this function.
Figure 3.39: The reaction catalyzed by DNA ligase. This enzyme seals a broken
phosphodiester bond. As shown, DNA ligase uses a molecule of ATP to activate the
5' end at the nick (step 1) before forming the new bond (step 2). In this way the
energetically unfavorable nick-sealing reaction is driven by being coupled to the
energetically favorable process of ATP hydrolysis. In Bloom's syndrome, an inherited
human disease, individuals are partially defective in DNA ligation and consequently are
deficient in DNA repair; as a consequence, they have a dramatically increased incidence of
cancer.
Recognition of a mismatch requires several different proteins
including one encoded by MSH2. Cutting the mismatch out also requires several proteins,
including one encoded by MLH1. Mutations in either of these genes predispose the
person to an inherited form of colon cancer. So these genes qualify as Tumor Suppressor
Genes. Synthesis of the repair patch is done by the same enzymes used in NER: DNA
polymerase delta and epsilon. Cells also use the MMR system to enhance the fidelity of
recombination; i.e., assure that only homologous regions of two DNA molecules pair up
to crossover and recombine segments (e.g., in meiosis).
Repairing Strand Breaks - Ionizing radiation and certain chemicals can produce
both single-strand breaks (SSBs) and double-strand breaks (DSBs) in the DNA backbone.
Single-Strand Breaks (SSBs) - Breaks in a single strand of the DNA molecule are
repaired using the same enzyme systems that are used in Base-Excision Repair (BER).
Double-Strand Breaks (DSBs) - There are two mechanisms by which the cell
attempts to repair a complete break in a DNA molecule:
• Direct joining of the broken ends. This requires proteins that recognize and bind to the
exposed ends and bring them together for ligating. They would prefer to see some
complementary nucleotides but can proceed without them, so this type of joining is also
called Non-Homologous End-Joining (NHEJ). Errors in direct joining may be a cause of
the various translocations that are associated with cancers. Examples: Burkitt's lymphoma,
the Philadelphia chromosome in Chronic Myelogenous Leukemia (CML), and B-cell
leukemia.
• Homologous Recombination. Here the broken ends are repaired using the information
on the intact sister chromatid (available in G2 after chromosome duplication), or on the
homologous chromosome (in G1; that is, before each chromosome has been duplicated).
This requires searching around in the nucleus for the homolog - a task sufficiently
uncertain that G1 cells usually prefer to mend their DSBs by NHEJ - or on the same
chromosome if there are duplicate copies of the gene on the chromosome oriented in
opposite directions (head-to-head or back-to-back). Two of the proteins used in
homologous recombination are encoded by the genes BRCA1 and BRCA2. Inherited
mutations in these genes predispose women to breast and ovarian cancers.
Meiosis also involves DSBs - Recombination between homologous chromosomes in
meiosis I also involves the formation of DSBs and their repair. So it is not surprising that
this process uses the same enzymes. Meiosis I with the alignment of homologous
sequences provides a mechanism for repairing damaged DNA; that is, mutations. In fact,
many biologists feel that the main function of sex is to provide this mechanism for
maintaining the integrity of the genome.
However, most of the genes on the human Y chromosome have no counterpart on the X
chromosome, and thus cannot benefit from this repair mechanism. They seem to solve
this problem by having multiple copies of the same gene - oriented in opposite directions.
Looping the intervening DNA brings the duplicates together and allows repair by
homologous recombination.
Gene Conversion - If the sequence used as a template for repairing a gene by
homologous recombination differs slightly from the gene needing repair; that is, is an
allele, the repaired gene will acquire the donor sequence. This nonreciprocal transfer of
genetic information is called gene conversion. The donor of the new gene sequence may
be:
• the homologous chromosome (during meiosis)
• the sister chromatid (also during meiosis)
• a duplicate of the gene on the same chromosome (during mitosis)
Gene conversion during meiosis alters the normal mendelian ratios. Normally, meiosis in
a heterozygous (A,a) parent will produce gametes or spores in a 1:1 ratio; e.g., 50% A;
50% a. However, if gene conversion has occurred, other ratios will appear. If, for
example, an A allele donates its sequence as it repairs a damaged a allele, the repaired
gene will become A, and the ratio will be 75% A; 25% a.
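The shift from 1:1 to 3:1 follows from counting the four chromatids present in meiosis. The sketch below is a minimal bookkeeping illustration; the function name `meiotic_products` and the tuple convention for the conversion event are assumptions made for the example.

```python
from collections import Counter
from fractions import Fraction


def meiotic_products(conversion=None):
    """Allele fractions among the four products of one meiosis in an A/a parent.

    conversion: optional (donor, recipient) pair, e.g. ("A", "a"), meaning one
    recipient chromatid is repaired using the donor sequence.
    """
    # After DNA replication a heterozygote carries two A and two a chromatids.
    chromatids = ["A", "A", "a", "a"]
    if conversion is not None:
        donor, recipient = conversion
        # Exactly one recipient chromatid acquires the donor sequence.
        chromatids[chromatids.index(recipient)] = donor
    counts = Counter(chromatids)
    return {allele: Fraction(n, len(chromatids)) for allele, n in counts.items()}


normal = meiotic_products()
converted = meiotic_products(conversion=("A", "a"))
print("no conversion:", normal)
print("a repaired from A:", converted)
```

Because the transfer is nonreciprocal, the donor keeps its own sequence while the recipient changes, which is why the products of that meiosis depart from the 2:2 (1:1) expectation.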
3.3.6 Inherited Human Disease Related with the Gene
Many human diseases are caused by defective genes. A few common examples:
Disease                                    Genetic defect
hemophilia A                               absence of clotting factor VIII
cystic fibrosis                            defective chloride channel protein
muscular dystrophy                         defective muscle protein (dystrophin)
sickle-cell disease                        defective beta globin
severe combined immunodeficiency (SCID)    any one of several genes fail to make a protein
                                           essential for T and B cell function
All of these diseases are caused by a defect at a single gene locus. (The inheritance is
recessive so both the maternal and paternal copies of the gene must be defective.)
Hemophilia A - Hemophilia A is a hereditary blood disorder, primarily affecting males,
characterized by a deficiency of the blood clotting protein known as Factor VIII that
results in abnormal bleeding. Babylonian Jews first described hemophilia more than 1700
years ago; the disease first drew widespread public attention when Queen Victoria
transmitted it to several European royal families. Mutation of the HEMA gene on the X
chromosome causes Hemophilia A. Normally, females have two X chromosomes,
whereas males have one X and one Y chromosome. Since males have only a single copy
of any gene located on the X chromosome, they cannot offset damage to that gene with
an additional copy as can females. Consequently, X-linked disorders such as Hemophilia
A are far more common in males. The HEMA gene codes for Factor VIII, which is
synthesized mainly in the liver, and is one of many factors involved in blood coagulation;
its loss alone is enough to cause Hemophilia A even if all the other coagulation factors
are still present. Treatment of Hemophilia A has progressed rapidly since the middle of
the last century when patients were infused with plasma or processed plasma products to
replace Factor VIII. HIV contamination of human blood supplies and the consequent HIV
infection of most hemophiliacs in the mid-1980s forced the development of alternate
Factor VIII sources for replacement therapy, including monoclonal antibody purified
Factor VIII and recombinant Factor VIII, both of which are used in replacement therapies
today.
Development of a gene replacement therapy for Hemophilia A has reached the clinical
trial stage, and results so far have been encouraging.
Investigators are still evaluating the long-term safety of these
therapies, and it is hoped that a genetic cure for hemophilia
will be generally available in the future.
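The reason X-linked disorders appear mainly in males can be seen by enumerating the offspring of a carrier mother and an unaffected father. The sketch below is an illustration under stated assumptions: the allele labels "XH" (normal) and "Xh" (defective) and the function name `x_linked_cross` are hypothetical, and the defective allele is treated as fully recessive in females.

```python
from collections import Counter
from itertools import product


def x_linked_cross(mother, father):
    """Fractions of offspring classes for an X-linked recessive allele "Xh".

    mother: tuple of two X alleles, e.g. ("XH", "Xh").
    father: tuple of one X allele and "Y", e.g. ("XH", "Y").
    """
    outcomes = Counter()
    for egg, sperm in product(mother, father):
        if sperm == "Y":
            # Sons have a single X, so one defective copy is enough.
            sex = "son"
            status = "affected" if egg == "Xh" else "unaffected"
        else:
            # Daughters need two defective copies to be affected.
            sex = "daughter"
            n_defective = (egg, sperm).count("Xh")
            status = ("unaffected", "carrier", "affected")[n_defective]
        outcomes[(sex, status)] += 1
    total = sum(outcomes.values())
    return {key: count / total for key, count in outcomes.items()}


offspring = x_linked_cross(mother=("XH", "Xh"), father=("XH", "Y"))
for (sex, status), fraction in sorted(offspring.items()):
    print(f"{sex:8s} {status:10s} {fraction:.2f}")
```

In this cross half of the sons are affected while no daughter is; half of the daughters are carriers, as their mother is.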
Cystic fibrosis - Cystic fibrosis (CF) (Figure 3.40) is the
most common fatal genetic disease in the United States today.
It causes the body to produce a thick, sticky mucus that clogs
the lungs, leading to infection, and blocks the pancreas,
stopping digestive enzymes from reaching the intestines where
they are required to digest food.
CF is caused by a defective gene, which codes for a chloride
transporter found on the surface of the epithelial cells that line
the lungs and other organs. Several hundred mutations have
been found in this gene, all of which result in defective
transport of chloride, and secondarily sodium, by epithelial
cells. As a result, the amount of sodium chloride (salt) is increased in bodily secretions.
The severity of the disease symptoms of CF is directly related to the characteristic
effects of the particular mutation(s) that have been inherited by the sufferer. CF research
has accelerated sharply since the discovery of CFTR in 1989. In 1990, scientists
successfully cloned the normal gene and added it to CF cells in the laboratory, which
corrected the defective chloride transport mechanism. This technique - gene therapy -
was then tried on a limited number of CF patients. However, this treatment may not be as
successful as originally hoped. Further research will be required before gene therapy, and
other experimental treatments, prove useful in combating CF.
Figure 3.40: Building mouse models of human disease. Expression of a human cystic
fibrosis (CFTR) gene in the gut of a mouse. A human antisense probe was used to show
human CFTR expressed in the mouse duodenum.
Figure 3.41: Dystrophin and utrophin are a similar size and have comparable modular
architecture. This similarity means that utrophin can sometimes substitute for dystrophin,
so providing a potential route for therapy for muscular dystrophy sufferers.
Duchenne muscular dystrophy - Duchenne muscular dystrophy (DMD) is one of a
group of muscular dystrophies characterized by the enlargement of muscles (Figure
3.41). DMD is one of the most prevalent types of muscular dystrophy and is
characterized by rapid progression of muscle degeneration that occurs early in life. All
are X-linked and affect mainly males - an estimated 1 in 3500 boys worldwide.
The gene for DMD, found on the X chromosome, encodes a
large protein - dystrophin. Dystrophin is required inside
muscle cells for structural support; it is thought to strengthen
muscle cells by anchoring elements of the internal
cytoskeleton to the surface membrane. Without it, the cell
membrane becomes permeable, so that extracellular
components enter the cell, increasing the internal pressure
until the muscle cell "explodes" and dies. The subsequent
immune response can add to the damage. A mouse model for
DMD exists and is proving useful for furthering our
understanding on both the normal function of dystrophin and
the pathology of the disease. In particular, initial experiments
that increase the production of utrophin, a dystrophin
relative, in order to compensate for the loss of dystrophin in
the mouse are promising and may lead to the development of
Figure 3.42: (A) Haemoglobin is
effective therapies for this devastating disease.
Sickle Cell Anemia - Sickle cell anemia (SCA; Figure 3.42) is the most common inherited
blood disorder in the United States, affecting about 72,000 Americans or 1 in 500 African
Americans. SCA is characterized by episodes of pain, chronic hemolytic anemia and severe
infections, usually beginning in early childhood. SCA is an autosomal recessive disease
caused by a point mutation in the hemoglobin beta gene (HBB) found on chromosome
11p15.4. The carrier frequency of the HBB mutation varies significantly around the world,
with high rates associated with zones of high malaria incidence, since carriers are
somewhat protected against malaria. About 8% of the African American population are
carriers. A mutation in HBB results in the production of a structurally abnormal
hemoglobin (Hb), called HbS. Hb is an oxygen-carrying protein that gives red blood cells
(RBCs) their characteristic color. Under certain conditions, like low oxygen levels or high
hemoglobin concentrations, in individuals who are homozygous for HbS, the abnormal HbS
clusters together, distorting the RBCs into sickled shapes. These deformed and rigid RBCs
become trapped within small blood vessels and block them, producing pain and eventually
damaging organs. Though, as yet, there is no cure for SCA, a combination of fluids,
painkillers, antibiotics and transfusions is used to treat symptoms and complications.
Hydroxyurea, an antitumor drug, has been shown to be effective in preventing painful
crises. Hydroxyurea induces the formation of fetal Hb (HbF) - a Hb normally found in the
fetus or newborn - which, when present in individuals with SCA, prevents sickling. A
mouse model of SCA has been developed and is being used to evaluate the effectiveness of
potential new therapies for SCA.
Figure 3.42: (A) Hemoglobin is made up of 4 chains: 2 α and 2 β. In SCA a point mutation
causes the amino acid glutamic acid (Glu) to be replaced by valine (Val) in the β chains of
HbA, resulting in the abnormal HbS. (B) Under certain conditions, such as low oxygen
levels, RBCs with HbS distort into sickle shapes. (C) These sickled cells can block small
vessels, producing microvascular occlusions which may cause necrosis (death) of the tissue.
Figure 3.43: Gene therapy has been attempted to treat severe combined immunodeficiency
caused by a missing enzyme, adenosine deaminase.
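The carrier frequency and the disease frequency quoted above can be checked against each other. The sketch below assumes Hardy-Weinberg equilibrium (random mating, no selection), which the text does not state and which holds only roughly for HBB because carriers are favoured where malaria is common. The function names are illustrative.

```python
import math


def allele_freq_from_carriers(carrier_freq):
    """Solve 2q(1 - q) = carrier_freq for the rarer allele frequency q."""
    return (1 - math.sqrt(1 - 2 * carrier_freq)) / 2


def expected_homozygote_freq(carrier_freq):
    """Expected frequency of homozygous (affected) individuals, q squared."""
    q = allele_freq_from_carriers(carrier_freq)
    return q * q


carrier_freq = 0.08  # about 8% carriers, as quoted in the text
q = allele_freq_from_carriers(carrier_freq)
affected = expected_homozygote_freq(carrier_freq)
print(f"allele frequency q = {q:.4f}")
print(f"expected affected  = 1 in {1 / affected:.0f}")
```

Under this assumption an 8% carrier rate predicts an affected frequency of a little under 1 in 600, the same order of magnitude as the 1 in 500 quoted for African Americans.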
Severe combined immunodeficiency - Severe combined immunodeficiency (SCID) represents
a group of rare, sometimes fatal, congenital disorders characterized by little or no immune
response (Figure 3.43). The defining feature of SCID, commonly known as "bubble boy"
disease, is a defect in the specialized white blood cells (B- and T-lymphocytes) that defend
us from infection by viruses, bacteria and fungi. Without a functional immune system, SCID
patients are susceptible to recurrent infections such as pneumonia, meningitis and chicken
pox, and can die within the first year of life. Though invasive, new treatments such as bone
marrow and stem-cell transplantation save as many as 80% of SCID patients. All forms of
SCID are inherited, with as many as half of SCID cases linked to the X chromosome, passed
on by the mother. X-linked SCID results from a mutation in the interleukin 2 receptor
gamma (IL2RG) gene, which produces the common gamma chain subunit, a component of
several IL receptors. IL2RG activates an important signalling molecule, JAK3. A mutation in
JAK3, located on chromosome 19, can also result in SCID. Defective IL receptors and IL
receptor pathways prevent the proper development of T-lymphocytes that play a key role in
identifying invading agents as well as activating and regulating other cells of the immune
system.
In another form of SCID, there is a lack of the enzyme adenosine deaminase (ADA), coded
for by a gene on chromosome 20. This means that the substrates for this enzyme
accumulate in cells. Immature lymphoid cells of the immune system are particularly
sensitive to the toxic effects of these unused substrates, so fail to reach maturity. As a
result, the immune system of the afflicted individual is severely compromised or completely
lacking.
Figure 3.44: Adult hemoglobin (HbA) contains two alpha chains and two beta chains. In
thalassemia, there is deficient synthesis of either the alpha chains or the beta chains.
Symptoms are a result of not only low levels of HbA, but also the relatively high levels of
the chain that is synthesized.
Some of the most promising developments in the search for new therapies for SCID center
on 'SCID mice', which can be bred deficient in various genes including ADA, JAK3, and
IL2RG. It is now possible to reconstitute the impaired mouse immune system by using
human components, so these animals provide a very useful model for studying both normal
and pathological immune systems in biomedical research.
SCID is a disease in which the patient neither has cell-mediated immune responses nor is
able to make antibodies. It is a disease of young children because, until recently, the
absence of an immune system left them prey to infections that ultimately killed them.
About 25% of the cases of SCID are the result of the child being homozygous for defective
genes encoding the enzyme adenosine deaminase (ADA). The normal catabolism of purines
is deficient, and this is particularly toxic for T cells and B cells.
Figure 3.45: Mitochondrial localization of human frataxin in live mammalian cells.
Thalassemia - Thalassemia is an inherited disease of faulty synthesis of hemoglobin
(Figure 3.44). The name is derived from the Greek word "thalassa" meaning "the sea"
because the condition was first described in populations living near the Mediterranean Sea;
however, the disease is also prevalent in Africa, the Middle East, and Asia. Thalassemia
consists of a group of disorders that may range from a barely detectable abnormality of
blood, to severe or fatal anemia. Adult hemoglobin is composed of two alpha (α) and two
beta (β) polypeptide chains. There are two copies of the hemoglobin alpha gene (HBA1 and
HBA2), which each encode an α-chain, and both genes are located on chromosome 16. The
hemoglobin beta gene (HBB) encodes the β-chain and is located on chromosome 11.
In α-thalassemia, there is deficient synthesis of α-chains. The resulting excess of β-chains
binds oxygen poorly, leading to a low concentration of oxygen in tissues (hypoxemia).
Similarly, in β-thalassemia there is a lack of β-chains. However, the excess α-chains can
form insoluble aggregates inside red blood cells. These aggregates cause the death of red
blood cells and their precursors, causing a very severe anemia. The spleen becomes
enlarged as it removes damaged red blood cells from the circulation. Deletions of HBA1
and/or HBA2 tend to underlie most cases of α-thalassemia. The severity of symptoms
depends on how many of these genes are lost. Loss of one or two genes is usually
asymptomatic, whereas deletion of all four genes is fatal to the unborn child.
Figure 3.46: The human HPRT1 gene contains 9 exons. A wide variety of HPRT1 mutations
can cause LNS; these include deletions, insertions, single-base substitutions and
frame-shift mutations.
In contrast, over 100 types of mutations affect HBB, and deletion mutations are rare.
Splice mutations and mutations that occur in the HBB gene promoter region tend to cause
a reduction, rather than a complete absence, of β-globin chains and so result in milder
disease. Nonsense mutations and frameshift mutations tend to not produce any β-globin
chains leading to severe disease.
Currently, severe thalassemia is treated by blood transfusions, and a minority of patients
are cured by bone marrow transplantation. Mouse models are proving to be useful in
assessing the potential of gene therapy.
Friedreich's ataxia - Friedreich's ataxia (FRDA) is a rare inherited disease
characterized by the progressive loss of voluntary muscular coordination (ataxia) and
heart enlargement. It is named after the German doctor, Nikolaus Friedreich, who first
described the disease in 1863. FRDA is generally diagnosed in childhood and affects both
males and females. FRDA is an autosomal recessive disease caused by a mutation of a
gene called frataxin, which is located on chromosome 9 (Figure 3.45). This mutation
means that there are many extra copies of a DNA segment, the trinucleotide GAA. A
normal individual has 8 to 30 copies of this trinucleotide, while FRDA patients have as
many as 1000. The larger the number of GAA copies, the earlier the onset of the disease
and the quicker the decline of the patient. Although we know that frataxin is found in the
mitochondria of humans, we do not yet know its function. However, there is a very
similar protein in yeast, YFH1, which we know more about. YFH1 is involved in
controlling iron levels and respiratory function. Since frataxin and YFH1 are so similar,
studying YFH1 may help us understand the role of frataxin in FRDA.
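The repeat counts described above can be made concrete with a small sketch that finds the longest uninterrupted GAA tract in a DNA string. The sequences are toy strings, not real frataxin sequence, and the cut-off of 30 simply reuses the upper end of the normal range quoted in the text; the text does not say where the disease threshold lies, so that cut-off is an assumption.

```python
import re


def longest_gaa_run(seq):
    """Return the repeat count of the longest uninterrupted GAA tract."""
    runs = re.findall(r"(?:GAA)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify(repeats, normal_max=30):
    """Label a repeat count; normal_max=30 is taken from the range in the text."""
    return "within normal range" if repeats <= normal_max else "expanded"


# Toy sequences (hypothetical flanks, not real frataxin sequence).
toy_normal = "CCT" + "GAA" * 12 + "TTC"
toy_expanded = "CCT" + "GAA" * 200 + "TTC"

for name, seq in [("normal", toy_normal), ("expanded", toy_expanded)]:
    n = longest_gaa_run(seq)
    print(name, n, classify(n))
```

Real repeat sizing is done on laboratory data rather than by simple string matching, so this only illustrates what "number of GAA copies" means.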
Lesch-Nyhan syndrome - Lesch-Nyhan syndrome (LNS) is a rare inherited disease
that disrupts the metabolism of the raw material of genes. These raw materials are called
purines, and they are an essential part of DNA and RNA. The body can either make
purines (de novo synthesis) or recycle them (the salvage pathway). Many enzymes are
involved in these pathways. When one of these enzymes is missing, a wide range of
problems can occur.
In LNS, there is a mutation in the HPRT1 gene located on the X chromosome. The
product of the normal gene is the enzyme hypoxanthine-guanine phosphoribosyltransferase, which speeds up the recycling of purines from broken down DNA and RNA. Many
different types of mutations affect this gene, and the result is a very low level of the
enzyme. The mutation is inherited in an X-linked fashion. Females who inherit one copy
of the mutation are not affected because they have two copies of the X chromosome
(XX). Males are severely affected because they only have one X chromosome (XY), and
therefore their only copy of the HPRT1 gene is mutated (Figure 3.46). Mutations of the
HPRT1 gene cause three main problems. First is the accumulation of uric acid that
normally would have been recycled into purines. Excess uric acid forms painful deposits
in the skin (gout) and in the kidney and bladder (urate stones). The second problem is
self-mutilation. Affected individuals have to be restrained from biting their fingers and
tongues. Finally, there is intellectual disability and severe muscle weakness.
In the year 2000 it was shown that the genetic deficiency in LNS could be corrected in
vitro. A virus was used to insert a normal copy of the HPRT1 gene into deficient human
cells. Such techniques used in gene therapy may one day provide a cure for this disease.
For now, medications are used to decrease the levels of uric acid.
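The X-linked inheritance pattern described for LNS can be sketched by enumerating the four equally likely combinations of one allele from each parent. The allele symbols are assumptions made for illustration: "X" is a normal X chromosome, "x" is an X carrying the HPRT1 mutation, and "Y" is the Y chromosome.

```python
from collections import Counter
from itertools import product


def offspring(mother, father):
    """Return the probability of each (sex, status) outcome for one cross."""
    counts = Counter()
    for from_mother, from_father in product(mother, father):
        sex = "male" if from_father == "Y" else "female"
        mutant_copies = [from_mother, from_father].count("x")
        if sex == "male":
            # A male has a single X, so one mutant copy is enough.
            status = "affected" if mutant_copies == 1 else "unaffected"
        else:
            status = {0: "unaffected", 1: "carrier", 2: "affected"}[mutant_copies]
        counts[(sex, status)] += 1
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


# Carrier mother and unaffected father.
result = offspring(mother=("X", "x"), father=("X", "Y"))
for outcome, prob in sorted(result.items()):
    print(outcome, prob)
```

For this cross, half of the sons are expected to be affected and half of the daughters to be carriers, which is why the disease is seen almost only in males.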
3.3.7 Cancer
A cancer is an uncontrolled proliferation of cells. In some the rate is fast; in
others, slow; but in all cancers the cells never stop dividing. This distinguishes
cancers - malignant tumors or malignancies (Figure 3.47) - from benign growths
like moles, whose cells eventually stop dividing. Cancers are clones. No
matter how many trillions of cells are present in the cancer, they are all descended
from a single ancestral cell. Evidence: Although the normal tissues of a woman are a
mosaic of cells in which one X chromosome or the other has been inactivated, all
her tumor cells - even if from multiple sites - have the same X chromosome
inactivated.
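The strength of the X-inactivation evidence can be illustrated with a short calculation. The sketch assumes that each founder cell independently has a 50% chance of having inactivated the maternal X; both the independence and the 50% figure are simplifying assumptions, not statements from the text.

```python
def prob_same_x_inactivated(n_founders, p_maternal=0.5):
    """Chance that n independent founder cells all inactivated the same X."""
    return p_maternal ** n_founders + (1 - p_maternal) ** n_founders


for n in (1, 2, 5, 10):
    print(n, prob_same_x_inactivated(n))
```

A tumor founded by a single cell always shows one inactivated X, whereas a tumor founded by several unrelated cells would rarely do so by chance. Finding the same X inactivated in tumor after tumor therefore points to a single ancestral cell.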
Cancers begin as a primary tumor. At some point, however, cells break away from the
primary tumor and - traveling in blood and lymph - establish metastases in other locations
of the body. Metastasis is what usually kills the patient. Cancer cells are usually less
differentiated than the normal cells of the tissue where they arose. Many people feel that
this reflects a process of dedifferentiation, but this is doubtful. Rather, evidence is
accumulating that cancers arise in precursor cells - stem cells or "progenitor cells" - of the
tissue: cells that are dividing by mitosis producing daughter cells that are not yet fully
differentiated.
Figure 3.47: Development of tumorous tissue.
3.3.8 Initiation of Cancer at Cellular Level/Progression to Cancer
Figure 3.48: The activities and cellular locations of the products of the main classes of
known proto-oncogenes. Some representative proto-oncogenes in each class are indicated
in brackets.
Figure 3.49: Three ways in which a proto-oncogene can be converted into an oncogene. A fourth
mechanism (not shown) involves recombination between retroviral DNA and a proto-oncogene. This has
effects similar to those of chromosome rearrangement, bringing the proto-oncogene under the control of
a viral enhancer and/or fusing it to a viral gene that is actively transcribed.
Figure 3.50: The translocation between chromosomes 9 and 22 responsible for chronic
myelogenous leukemia. The smaller of the two resulting abnormal chromosomes is called
the Philadelphia chromosome, after the city where the abnormality was first recorded.
What probably happens is: A single cell - perhaps an adult stem cell or progenitor cell in a
tissue - suffers a mutation (red line) in a gene involved in the cell cycle, e.g., an oncogene
or tumor suppressor gene. This results in giving that cell a slight growth advantage over
other dividing cells in the tissue. As that cell develops into a clone, some of its descendants
suffer another mutation (red line) in another cell-cycle gene. This further deregulates the
cell cycle of that cell and its descendants. As the rate of mitosis in that clone increases, the
chances of further DNA damage increase. Eventually, so many mutations have occurred
that the growth of that clone becomes completely unregulated. The result: full-blown
cancer.
Stem cells are cells that divide by mitosis to form either two stem cells, thus increasing the
size of the stem cell "pool", or one daughter that goes on to differentiate, and one daughter
that retains its stem-cell properties. There is growing evidence that most of the cells in
leukemias, breast, brain, and colon cancers are not able to proliferate out-of-control (and to
metastasize). Only those members of the clone that retain their stem-cell-like properties
(~2.5% of the cells in a tumor of the colon) can do so.
Figure 3.51: Evidence from X-inactivation mosaics demonstrates the monoclonal origin of
cancers. As a result of a random process that occurs in the early embryo, practically every
normal tissue in a woman's body is a mixture of cells with different X chromosomes
heritably inactivated (indicated here by the mixture of red cells and gray cells in the normal
tissue). When the cells of a cancer are tested for their expression of an X-linked marker
gene, however, they are usually all found to have the same X chromosome inactivated. This
implies that they are all derived from a single cancerous founder cell.
Figure 3.52: How replication of damaged DNA can lead to chromosome abnormalities and
gene amplification. The diagram shows one of several possible mechanisms. The process
begins with accidental DNA damage in a cell that lacks functional p53 protein. Instead of
halting at the p53-dependent checkpoint in the G1 phase of the division cycle, where a
normal cell with damaged DNA would halt until the damage was repaired, the p53-defective
cell enters S phase, with the consequences shown. Once a chromosome carrying a
duplication and lacking a telomere has been generated, repeated rounds of replication,
chromatid fusion, and unequal breakage can increase the number of copies of the
duplicated region still further. Selection in favor of cells with increased numbers of copies
of a gene in the affected chromosomal region will thus lead to mutants in which the gene is
amplified to a high copy number. The multiple copies may eventually become visible as a
homogeneously staining region in the chromosome, or they may - either through a
recombination event or through unrepaired DNA strand breakage - become excised from
their original locus and so appear as independent double minute chromosomes.
There is a certain logic to this. Most terminally differentiated cells have limited potential to
divide by mitosis and, seldom passing through S phase of the cell cycle, are limited in
their ability to accumulate the new mutations that predispose to becoming cancerous.
Furthermore, they often have short life spans - being eliminated by apoptosis (e.g.,
lymphocytes) or being shed from the tissue (e.g., epithelial cells of the colon). The adult
stem cell pool, in contrast, is long-lived, and its members have many opportunities to
acquire new mutations as they produce differentiating daughters as well as daughters that
maintain the stem cell pool.
Different types of genetic accident that can convert a proto-oncogene into an oncogene
are summarized in Figure 3.49. The gene may be altered by a point mutation, by a
deletion, through a chromosomal translocation, or by insertion of a mobile genetic
element such as retroviral DNA.
3.3.9 Proto-oncogenes & Oncogenes
Proto-oncogenes
A Retrovirus Can Transform a Host Cell by Inserting Its DNA Next to a
Proto-oncogene of the Host - There are two ways in which a proto-oncogene can
be converted into an oncogene upon incorporation into a retrovirus: the gene sequence
may be altered or truncated so that it codes for a protein with abnormal activity, or the
gene may be brought under the control of powerful promoters and enhancers in the viral
genome that cause its product to be made in excess or in inappropriate circumstances.
Retroviruses can also exert similar oncogenic effects in a different way: DNA copies of
the viral RNA may simply be inserted into the host cell genome at sites close to, or even
within, proto-oncogenes. The resulting genetic disruption is called an insertional
mutation, and the altered genome is inherited by all the progeny of the original host cell.
More or less random insertion of DNA copies of the viral RNA into the host DNA occurs
as part of the normal retroviral life cycle, and in at least one well-documented case,
insertion anywhere within about 10,000 nucleotide pairs from a proto-oncogene can
cause abnormal activation of that gene. Insertional mutagenesis provides an important
means of identifying proto-oncogenes, which can be tracked down by their proximity to
the inserted viral DNA. Proto-oncogenes identified in this way often turn out to be the
same as those discovered in the other way, as counterparts to oncogenes that retroviruses
carry from cell to cell, but some new ones have been discovered as well (Table 3.1). An
example is the Wnt-1 gene, activated by insertional mutagenesis in breast cancers in mice
infected with the mouse mammary tumor virus (Figure 3.48). This gene turns out to be
closely homologous to the Drosophila gene wingless, which is involved in cell-cell
communications that regulate details of the body pattern of the fly.
Different Searches for the Genetic Basis of Cancer Converge on
Disturbances in the Same Proto-oncogenes - While some researchers pursued
the line of investigation leading from retroviruses to oncogenes, others took a more direct
approach and searched for DNA sequences in human cancer cells that would provoke
uncontrolled proliferation when introduced into noncancerous cells. The assay was again
done in cell culture, using an established line of mouse-derived fibroblast cells - NIH 3T3
cells - as the noncancerous hosts and transfecting them with DNA taken from human
tumor cells.
The findings were dramatic. Oncogenes were detected in many lines of human cancer
cells, and in several cases these oncogenes turned out to be mutant alleles of some of the
same proto-oncogenes that had been identified by the retroviral approach or of genes very
closely related to them. About one in four human tumors, for example, was found to
contain a mutated member of the ras gene family, first discovered as oncogenes carried
by retroviruses that cause sarcomas in rats. Thus two independent lines of inquiry
converged on the same genes. Yet another approach that led to some of the same
proto-oncogenes was based on the karyotyping of tumor cells. As mentioned earlier, in almost
all patients with chronic myelogenous leukemia, the leukemic cells show the same
chromosomal translocation, between chromosomes 9 and 22; likewise, in Burkitt's
lymphoma there is regularly a translocation between chromosome 8 and one of the three
chromosomes containing the genes that encode antibody molecules. In both these types
of cancer the translocation breakpoint, where part of one chromosome is joined to
another, was found to coincide exactly with the location of a proto-oncogene already
known from retroviral studies - abl in chronic myelogenous leukemia, myc in Burkitt's
lymphoma. Analogous chromosome translocations are similarly associated with some
other types of cancer. From DNA sequencing studies it seems that in some cases the
translocation turns a proto-oncogene into an oncogene by fusing the proto-oncogene to
another gene in such a way that an altered protein is produced (Figure 3.49); in other
cases the translocation moves a proto-oncogene into an inappropriate chromosomal
environment that activates its transcription so that the normal protein is produced in
excess.
A Proto-oncogene Can Be Made Oncogenic in Many Ways - So far, about 60
proto-oncogenes have been discovered (Tables 3.1 and 3.2 show a small selection); each
of these can be converted into an oncogene that plays a dominant part in cancers of one
sort or another. Most such genes have been encountered repeatedly, in a variety of mutant
forms and in several kinds of cancer, suggesting that the majority of mammalian
proto-oncogenes may already have been identified.
But what functions do these genes have in a normal healthy cell, that mutations in them
should be so dangerous? Most proto-oncogenes code for components of the mechanisms
that regulate the social behavior of cells in the body - in particular, the mechanisms by
which signals from a cell's neighbors can impel it to divide, differentiate, or die. In fact,
many of the components of cell signaling pathways were first identified through searches
for oncogenes, and a full list of proto-oncogene products includes examples of practically
every type of molecule involved in cell signaling - secreted proteins, transmembrane
receptors, GTP-binding proteins, protein kinases, gene regulatory proteins, and so on, as
summarized in Figure 3.48. All these molecules normally serve in complex relay chains
to deliver signals for the production of more cells when more cells are needed. But
mutations can alter them so that they deliver the signals even when more cells are not
needed. The proto-oncogene erbB, for example, codes for the receptor for epidermal
growth factor (EGF); when EGF binds to the receptor's extracellular domain, the
intracellular domain generates a stimulatory signal inside the cell. A mutation in c-erbB
can turn it into an oncogene by deleting the extracellular EGF-binding domain in such a
way that the intracellular stimulatory signal is produced constantly, even if no EGF is
present. In a similar way a point mutation at an appropriate site in a ras gene can create a
Ras protein that fails to hydrolyze its bound GTP and so persists abnormally in its active
state, transmitting an intracellular signal for cell proliferation even when it should not.
Innumerable other examples can be given.
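The effect of a Ras mutation that blocks GTP hydrolysis can be illustrated with a minimal two-state sketch. Ras is treated as switching between an inactive and an active (GTP-bound) state, and at steady state the active fraction is the activation rate divided by the sum of the activation and hydrolysis rates. The rate values are hypothetical numbers chosen only to show the trend, not measurements.

```python
def active_fraction(k_activation, k_hydrolysis):
    """Steady-state fraction of Ras in the active, GTP-bound state.

    Two-state model: inactive -> active at k_activation,
    active -> inactive (GTP hydrolysis) at k_hydrolysis.
    """
    total = k_activation + k_hydrolysis
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    return k_activation / total


# Hypothetical rates in arbitrary units, with no growth signal present.
normal = active_fraction(k_activation=0.1, k_hydrolysis=1.0)
mutant = active_fraction(k_activation=0.1, k_hydrolysis=0.001)
print(f"normal Ras active fraction: {normal:.2f}")
print(f"mutant Ras active fraction: {mutant:.2f}")
```

With the same weak activation, the normal protein spends most of its time switched off, while the hydrolysis-defective mutant stays almost entirely in the active state.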
The basic types of genetic accident that can convert a proto-oncogene into an oncogene
are summarized in Figure 3.49. The gene may be altered by a point mutation, by a
deletion, through a chromosomal translocation, or by insertion of a mobile genetic
element such as retroviral DNA. The change can occur in the protein-coding region so as
to yield a hyperactive product, or it can occur in adjacent control regions so that the gene
is simply overexpressed. Alternatively, the gene may be overexpressed because it has
been amplified to a high copy number through errors in the process of chromosome
replication.
Specific types of abnormality are characteristic of particular genes and of the responses to
particular carcinogens. For example, 90% of the skin tumors evoked in mice by the tumor
initiator dimethylbenz[a]anthracene (DMBA) have an A-to-T alteration at exactly the
same site in a mutant ras gene; presumably, of the mutations caused by DMBA, it is
only the ones at this site that efficiently activate skin cells to form a tumor. Members of
the Myc gene family, on the other hand, are frequently overexpressed or amplified. The
Myc protein normally acts in the nucleus as a signal for cell proliferation, and excessive
quantities of Myc cause the cell to embark on the cell-division cycle in circumstances
where a normal cell would halt.
Table 3.1. Some Oncogenes Originally Identified Through Their Presence in
Transforming Retroviruses
Table 3.2. Some Oncogenes Originally Identified by Means Other Than Their Presence
in Transforming Retroviruses
S.N.  Means of Detection     Oncogenes
1     Insertional mutation   Wnt-1 (int-1), fgf-3 (int-2), Notch-1 (int-3), lck
2     Amplification          L-myc, N-myc
3     Transfection           neu, N-ras, trk, ret
4     Translocation          bcl-2, RARα
Oncogenes
An oncogene is a gene that, when mutated or expressed at abnormally high levels,
contributes to converting a normal cell into a cancer cell. Cancer cells are cells that are
engaged in uncontrolled mitosis.
The signals for normal mitosis - Normal cells growing in culture will not divide unless
they are stimulated by one or more growth factors present in the culture medium.
Example: PDGF (platelet-derived growth factor), which is encoded by the gene PDGFB
(also known as SIS).
The molecules of growth factor bind to molecules of its receptor, an integral membrane
protein embedded in the plasma membrane with its ligand-binding site exposed at the
surface of the cell. Example: the gene ERBB2 encodes a receptor for epidermal growth
factor (EGF). (In humans, ERBB2 is also known as HER2.)
Binding of a growth factor to its receptor triggers a cascade of signaling events within the
cytosol. Many of these involve kinases - enzymes that attach phosphate groups to other
proteins. Examples: the proteins encoded by SRC, RAF, ABL, and the fusion protein
encoded by BCR/ABL found in chronic myelogenous leukemia (CML).
Others involve molecules that turn on kinases. Example: RAS. RAS molecules reside on the
inner surface of the plasma membrane where they serve to link receptor activation to
"downstream" kinases like RAF.
In most cases, phosphorylation activates the protein and eventually transfers the signal
into the nucleus. Here phosphorylation activates transcription factors that bind to
promoters and enhancers in DNA, turning on their associated genes. Example: AP-1, a
heterodimer of the proteins encoded by jun and fos.
Some of the genes turned on by these transcription factors encode other transcription
factors. Example: myc.
Some of the genes turned on by these downstream transcription factors encode cyclins
that prepare the cell to undergo mitosis.
Genes that participate in any one of the steps above can become oncogenes if:
- they become mutated so that their product becomes constitutively active (that is,
active all the time even in the absence of a positive signal), or
- they produce their product in excess. Possible causes:
Their promoter and/or enhancer have become mutated. Example: the oncomouse, a
transgenic mouse that has both copies of its myc gene under the influence of
extra-powerful promoters.
Loss (e.g., by a translocation) of the 3' UTR of their mRNA so that a microRNA
(miRNA) that normally represses translation can no longer do so.
All these oncogenes act as dominants; if the cell has one normal gene (sometimes called a
proto-oncogene) at a locus and one mutated gene (the oncogene), the abnormal product
takes control.
No single oncogene can, by itself, cause cancer. It can, however, increase the rate of
mitosis of the cell in which it finds itself. Dividing cells are at increased risk of acquiring
mutations, so a clone of actively dividing cells can yield subclones of cells with a second,
third, etc. oncogene. When a clone loses all control over its mitosis it is well on its way to
developing into a cancer.
Other types of potential cancer-promoting genes
Genes that inhibit apoptosis - The suicide of damaged cells (apoptosis) provides an
important mechanism for ridding the body of cells that could go on to form a cancer. It is
not surprising, then, that inhibiting apoptosis can promote the formation of a cancer.
Example: Bcl-2. The product of this gene inhibits apoptosis. Overexpression of the gene is
a hallmark of B-cell cancers.
Genes involved in repairing DNA or stopping mitosis if repair fails - Mutations arise from
an unrepaired error in DNA. So any gene whose product participates in DNA repair
probably can also behave as an oncogene when mutated. Example: ATM. ATM ("ataxia
telangiectasia mutated") gets its name from a human disease of that name, whose
patients - among other things - are at increased risk of cancer. The ATM protein is also
involved in detecting DNA damage and interrupting the cell cycle when damage is found.
It is estimated that fully 1% of the 25,000 or so genes in the human genome are
proto-oncogenes.
This graph (based on the work of E. Sinn et al., Cell 49:465, 1987) shows the synergistic
effect of two oncogenes. The fraction (%) of transgenic mice without tumors is shown as a
function of age. Three groups are shown: those mice transgenic for a hyperactive myc
alone (blue), those transgenic for ras alone (green), and those transgenic for both myc and
ras (red).
Tumor-Suppressor Genes
The products of some genes inhibit mitosis. These genes are called tumor suppressor
genes.
3.4 LET US SUM UP
Genetic recombination mechanisms allow large sections of DNA double helix to
move from one chromosome to another.
In general recombination the initial reactions rely on extensive base-pairing
interactions between strands of the two DNA double helices that will recombine.
It does not normally change the arrangement of the genes in a chromosome.
Site-specific recombination, on the other hand, alters the relative positions of
nucleotide sequences in chromosomes because the pairing reactions depend on a
protein mediated recognition of the two DNA sequences that will recombine, and
extensive sequence homology is not required.
Two site-specific recombination mechanisms are common: (1) conservative site-specific
recombination, which produces a very short heteroduplex and therefore
requires some DNA sequence that is the same on the two DNA molecules, and (2)
transpositional site-specific recombination, which produces no heteroduplex and
usually does not require a specific sequence on the target DNA.
A protein called Ku is essential for non-homologous end joining (NHEJ). Ku is a
heterodimer of the subunits Ku70 and Ku80. In the 9 August 2001 issue of Nature,
Walker, J. R., et al. report the three-dimensional structure of Ku attached to DNA. Their
structure shows how the protein aligns the broken ends of DNA for rejoining.
Some of the same enzymes used to repair double-strand breaks (DSBs) by direct joining
are also used to break and reassemble the gene segments used to make (1) antibody
variable regions, that is, to accomplish V(D)J joining (mice whose Ku80 genes have been
knocked out cannot do this), and (2) different antibody classes, that is, to accomplish
class switching.
Concept - How does the mismatch repair (MMR) system know which is the incorrect
nucleotide? In E. coli, certain adenines become methylated shortly after the new strand of
DNA has been synthesized. The MMR system works more rapidly than this methylation,
and if it detects a mismatch, it assumes that the nucleotide on the already-methylated
(parental) strand is the correct one and removes the nucleotide on the freshly synthesized
daughter strand. How such recognition occurs in mammals is not yet known.
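The strand-discrimination rule described for E. coli can be sketched as follows. The methylated parental strand is treated as the trusted template, and any base on the daughter strand that does not pair with it is replaced. This is a toy model under stated assumptions: both strands are written in aligned positions, the function name and sequences are hypothetical, and real MMR removes and resynthesizes a stretch of the daughter strand rather than a single base.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def repair_daughter(parental, daughter):
    """Correct the daughter strand against the methylated parental strand.

    Position i of `daughter` is expected to pair with position i of `parental`.
    Returns the repaired daughter strand and the list of corrected positions.
    """
    if len(parental) != len(daughter):
        raise ValueError("strands must be the same length")
    repaired = []
    corrected = []
    for i, (p_base, d_base) in enumerate(zip(parental, daughter)):
        expected = COMPLEMENT[p_base]
        if d_base != expected:
            corrected.append(i)  # mismatch: trust the parental strand
        repaired.append(expected)
    return "".join(repaired), corrected


parental = "GATCGA"   # methylated, assumed correct
daughter = "CTAGTT"   # newly made, with one wrong base
fixed, positions = repair_daughter(parental, daughter)
print(fixed, positions)
```

The key design point is that the function never edits the parental strand: whichever strand carries the methyl mark decides what counts as the error.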
Some mutations arise from spontaneous alterations to DNA structure, such as
depurination and deamination, which may alter the pairing properties of the bases
and cause errors in subsequent rounds of replication.
Ionizing radiation such as X-rays and gamma rays damage DNA by dislodging
electrons from atoms; these electrons then break phosphodiester bonds and alter
the structure of bases. Ultraviolet light causes mutations primarily by producing
pyrimidine dimers that disrupt replication and transcription. The SOS system
enables bacteria to overcome replication blocks but introduces mistakes in
replication.
Chemicals can produce mutations by a number of mechanisms. Base analogs are
inserted into DNA and frequently pair with the wrong base. Alkylating agents,
deaminating chemicals, hydroxylamine, and oxidative radicals change the
structure of DNA bases, thereby altering their pairing properties. Intercalating
agents wedge between the bases and cause single-base insertions and deletions in
replication.
3.5 CHECK YOUR PROGRESS
NOTE: (1) Write your answer in the space given below.
(2) Compare your answer with the one given at the end of this unit.
(1) Fill in the blanks :
(a) Mendel's experiments with mixing one trait always resulted in a …….. ratio
between dominant and recessive phenotypes, his experiments with mixing two
traits (dihybrid cross) showed ……………. ratios.
(b) E. coli ………. protein also has a major role in homologous recombination.
(c) When the virus enters a cell, a virus-encoded enzyme called …………………
is synthesized.
(d) RecBCD, also known as ………………………………...
(e) The first discovery of a chemical mutagen was made by ………………….
(2) Write the answers to the following questions :
(a) How are RecA and RecBCD important for recombination?
(b) Explain genetic mapping and physical mapping.
(c) How are transposons associated with mutation? Explain.
(d) Name the diseases that develop due to disorders in gene function and explain their
related features.
3.6 CHECK YOUR PROGRESS : THE KEY
(1) (a) 3:1, 9:3:3:1
(b) RecA
(d) Exonuclease V
(e) Charlotte Auerbach
(2) (a) see section 3.2.3
(b) see section 3.2.6
(c) see section 3.3.1
(d) see section 3.3.6
3.7 ASSIGNMENT
Prepare a project illustrating the process of recombination, including the
Holliday junction model.
3.8 REFERENCES
We gratefully acknowledge the following two authors/publishers, whose works were
used in preparing the various sections of this chapter:
B. Alberts et al., ‘Molecular Biology of the Cell’, 4th edn (2002). Garland.
Benjamin A. Pierce, ‘Genetics: A Conceptual Approach’.
Other helpful resources are as follows:
Joo C, McKinney SA, Nakamura M, Rasnik I, Myong S and Ha T. (2006). Real-time
Observation of RecA Filament Dynamics with Single Monomer Resolution.
Cell 126:515-527.
Singleton MR, Dillingham MS, Gaudier M, Kowalczykowski SC, Wigley DB,
"Crystal structure of RecBCD enzyme reveals a machine for processing DNA
breaks", Nature. (2004) Nov 11; 432 (7014): 187-93.
B. Lewin (2000). Genes VII. Oxford University Press.
C. R. Calladine and H. R. Drew. Understanding DNA: The Molecule and How It
Works. 2nd edn (1997). Academic Press. (3rd edn due in 2004).
J. W. Dale and S. F. Park (2004). Molecular Genetics of Bacteria, 4th edn.
John Wiley & Sons.
H. Lodish et al. Molecular Cell Biology, 4th edn (1995). W. H. Freeman. (5th edn
due in 2003–2004).
L. Snyder and W. Champness (2003). Molecular Genetics of Bacteria, 2nd edn.
American Society for Microbiology.
S. Baumberg (ed.) (1999). Prokaryotic Gene Expression.
M. T. Madigan, J. M. Martinko and J. Parker (2000). Biology of Microorganisms
(better known as ‘Brock’), 9th edn. Prentice Hall International.
J. W. Dale and M. von Schantz (2002). From Genes to Genomes. John Wiley &
Sons.
T. A. Brown (2001). Gene Cloning – An Introduction, 4th edn. Blackwell
Science.
S. B. Primrose, R. Twyman and R. W. Old (2001). Principles of Gene
Manipulation, 6th edn. Blackwell Science.
B. R. Glick (2003). Molecular Biotechnology: Principles and Applications of
Recombinant DNA, 3rd edn. American Society for Microbiology.
D. P. Snustad and M. J. Simmons (2000). Principles of Genetics, 2nd edn. John
Wiley.
W. S. Klug and M. R. Cummings (2000). Concepts of Genetics, 6th edn. Prentice
Hall.
L. H. Hartwell and others (2000). Genetics. McGraw-Hill.
P. J. Russell (2002). Genetics. Benjamin Cummings.
A. J. F. Griffiths, W. M. Gelbart, R. C. Lewontin and J. H. Miller (2002). Modern
Genetic Analysis, 2nd edn. W. H. Freeman.
R. W. Hendrix et al. (1983). Lambda II.
M. Wilson, R. McNab and B. Henderson (2002). Bacterial Disease Mechanisms.
Cambridge University Press.
W. Hayes (1968). The Genetics of Bacteria and their Viruses, 2nd edn. Blackwell
Scientific Publications.
Websites that give more information on chromosomal activities,
genomic databases and mutation are as follows:
http://www.sanger.ac.uk/
http://www.tigr.org/
http://www.ncbi.nlm.nih.gov/
******