The basic molecular themes of life

Daph-01.qxd
29/10/04
9:02 PM
Page 3
Chapter 1: The basic molecular themes of life
This chapter aims to convey how consistently all life is based on a number of basic
molecular themes, the tremendous diversity of life forms being
variations on them. The molecular nature of life, with the
seemingly never-ending succession of discoveries of almost
incredible chemical mechanisms, is exciting. The subject is of
tremendous and ever-increasing importance in medicine, agriculture, and all aspects of biology.
Biochemistry and molecular biology are the scientific
disciplines that aim to understand life in molecular terms.
Biochemistry is the name for the earliest-studied aspects of the
subject in which the metabolism of food and small molecules
was a principal focus. Molecular biology was the name given
later to the study of biological macromolecules, particularly
proteins and DNA, and the genetic mechanism. The distinction
between biochemistry and molecular biology is blurred but the
terms are convenient, if imprecise, labels. Many biochemistry
departments in universities and biochemical societies have
added ‘Molecular Biology’ to their titles.
Unity of life at the molecular level
Given the diversity of life forms, it might be thought that biochemistry and molecular biology must be a diffuse subject, but
life at the molecular level is remarkably similar (biochemists
and molecular biologists are dying to know if this will apply to
life on Mars or wherever it might be discovered). A famous
dictum of the French Nobel Prize winner, Jacques Monod, is
that ‘what holds for the Coli bacterium is true for an elephant’,
meaning that the similarities between a bacterium living in the
human gut and an elephant exceed the differences when viewed
at the molecular level.
Life is currently presumed to have had a single origin and
once the primordial form of a self-replicating living cell was
developed, many of the fundamental biochemical processes had
already been established and life was locked into these. As
diversity of life forms evolved, the chemistry of cells changed to
cope with new needs, but the underlying basis of life remained
much the same.
This explains why, in biochemical research, a variety of
organisms is often used to elucidate a given biochemical
process. To understand how a process works, for example in
humans, the best strategy may be to study the bacterium,
Escherichia coli, or a virus, where the basic information might be
more easily obtained. E. coli is the most intensively studied cell.
Relative simplicity and rapid growth make it a favourite for
studies on the genetic mechanism. There are differences in
molecular processes between bacterial and human cells but they
are matters of detail rather than of principle. Biochemical
knowledge is applicable to all life forms.
Living cells obey the laws of physics
and chemistry: the energy cycle in life
To grow and reproduce, cells take in simple molecules from the
external medium and build them up into organized complex
molecules. The synthesis of complex molecules from simpler
ones involves an increase in energy, so chemical work must be
done. A living cell is at a higher energy level than the random
collection of molecules in its external environment from which
it is produced. It is far from being in thermodynamic (energetic)
equilibrium with its surroundings; this is achieved only by
decomposition after the death of the cell.
The first law of thermodynamics states that energy can be
neither created nor destroyed—the total energy content of the
universe remains constant. However, energy can be transformed from one state to another. Familiar examples are kinetic
energy converted into heat and heat into electricity as in a
power station. The energy needed to drive all aspects of living
systems is derived from chemical energy in the form of food or,
in the case of photosynthetic organisms, from sunlight, which is
the ultimate energy source for all food. (Minor exceptions to
this statement are organisms living on chemical compounds
from the Earth’s crust, such as extremophiles living on H2S near
hydrothermal vents in the ocean floor.) Food is taken in by
organisms where it is oxidized back to CO2 and water and the
energy so released is used to drive all the reactions of a living
cell. This is summarized in the energy cycle of life shown in Fig. 1.1.
The second law of thermodynamics states that all processes
increase the total entropy of the universe, the ultimate end
seemingly being an inert, dark, cold universe of maximum entropy
in which everything is uniformly scattered. Entropy is the
degree of randomness (disorder) or, as Willard Gibbs put it in
more homely terms, the degree of mixed-upness. A low-entropy
system is at a higher energy level than a similar system of high
entropy. A living cell takes in simple molecules from the environment (high entropy) and converts them into the organized
structure of the cell (low entropy). This might look as if the cell
flouts the second law, the explanation being that in releasing the
required energy from the breakdown of food, entropy is
increased by heat and CO2 formation more than it is reduced by
assembly of the cell. The result is that in the complete system
(cell + environment), the entropy increases and the second
law is obeyed. The fraction of the total energy released by a reaction that is capable of performing work in the cell is known as ‘free
energy’ (free meaning available for work, not free as in something for nothing).
Fig. 1.1 The energy cycle in life. Catabolism is the breakdown of
complex molecules releasing energy in the cell. Anabolism is the
energy-requiring transformation of simple molecules into more
complex ones.
[Diagram, drawn against a vertical energy scale: photosynthesis uses light energy to convert CO2 + H2O into food molecules and O2; catabolism of food molecules returns CO2 + H2O and releases chemical energy; anabolism uses chemical energy to convert small disorganized molecules from the environment into the large organized molecules of the cell.]
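The entropy bookkeeping described for the cell and its environment can be written compactly. The symbols below are introduced here for illustration and do not appear in the original text: \Delta S_{\text{cell}} is the entropy change on assembling the cell, and \Delta S_{\text{env}} is the entropy change of the environment caused by heat and CO2 release.

```latex
\Delta S_{\text{total}} = \Delta S_{\text{cell}} + \Delta S_{\text{env}} > 0,
\qquad \text{with} \quad \Delta S_{\text{cell}} < 0
\quad \text{and} \quad \Delta S_{\text{env}} > \left| \Delta S_{\text{cell}} \right| .
```

The local decrease in entropy inside the cell is allowed because it is more than paid for by the increase outside it.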
ATP (adenosine triphosphate) is the universal energy
currency in life
In what form is energy derived from the oxidation of foodstuffs
used by the cell? The sugar glucose will burn if thrown onto a
fire and energy will be released in the form of heat. Cells burn
glucose in the sense that it is oxidized to CO2 and water, but in
life processes the free energy released by the oxidation of food
must be harnessed to be usable. (Some heat is liberated during
the oxidative metabolism. This is not usable in the cell to perform work, but is beneficial to warm-blooded animals in maintaining their temperature.) The problem has some analogies
with a car. If you placed a bucket of petrol under the bonnet and
set it alight, energy would be liberated as heat but this would not
be useful. The free energy released by petrol oxidation in the
cylinders must be coupled to the driving wheels of the car and
not just dissipated as heat. Similarly, the free energy released by
oxidation of food must be coupled to performing useful work
needed by the cell. This raises an interesting problem because
there are several different classes of food molecule to be
oxidized (carbohydrates, fats, and proteins) and there are different uses to which the energy must be coupled: chemical work,
osmotic work, and mechanical work. A flexible strategy has
been adopted: processes releasing free energy from all food
molecules use it to make a single compound, adenosine
triphosphate (ATP), and virtually all processes needing energy
use ATP to supply it. ATP is (with rare exceptions that do not
alter the essential validity of the statement) in effect the universal energy currency of life. To give a simple example, when you
contract a muscle, ATP breaks down to adenosine diphosphate
(ADP) and phosphate. This supplies the requisite energy by
the mechanism described in Chapter 8. The food-breakdown
processes referred to earlier immediately replenish the ATP by
resynthesizing it from ADP and phosphate.
Types of molecule found in living cells
Biological molecules are based on the carbon atom bonded
mainly to hydrogen, oxygen, and nitrogen atoms and to other
carbon atoms. The carbon atom can form four bonds with
other atoms, tetrahedrally arranged in the case of single bonds
and this, together with its ability to form C-C bonds, enables the
formation of a wide variety of molecules of different shapes and
properties. Other elements are important in life, including phosphorus and sulphur; several metal ions are also essential to life,
some in trace amounts only. We can divide cellular molecules
into two categories, small molecules and macromolecules.
Small molecules
Water is the most prevalent of the small molecules, constituting
about 70% of a typical cell. The rest are molecules of foodstuffs,
such as sugar, fats, and proteins and their derivatives, which the
cell uses as sources of building blocks to synthesize the cellular
constituents, such as new proteins, membranes, and carbohydrate structures, and to burn as a source of energy.
There is a large variety of these small molecules, but the basic
classes of foodstuffs are carbohydrates, amino acids (in the
form of proteins), and lipids (fats). Carbohydrates include the
sugars such as glucose and sucrose. The name carbohydrate
derives from the fact that they have the empirical formula
CH2O and thus have the elements of carbon and water in equal
proportions. They are important energy stores and participate
in many structural molecules. Amino acids are short carbon
chains with a basic amino group and an acidic carboxyl group.
Their overriding importance is that they are the building blocks
from which proteins are synthesized (see below). Lipids or fats
have various roles, the two most prominent being in the formation of cell membranes and as the major storage of energy in an
animal. The molecular weights of these small molecules are in
the range of a few hundred daltons or less. (A dalton, or Da, is a
unit of atomic or molecular mass defined as one-twelfth of the
mass of a carbon-12 atom, approximately equal to the mass of
a hydrogen atom.) Figure 1.2 shows molecular models of water,
L-alanine (a typical amino acid), and stearic acid (a typical
lipid), which has a long hydrocarbon chain.
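As a rough check on the statement that small molecules weigh a few hundred daltons or less, the masses of the three molecules in Fig. 1.2 can be summed from their atoms. This is a minimal sketch: the rounded atomic masses and the molecular formulas (H2O, C3H7NO2, C18H36O2) are standard values supplied here, not taken from the text.

```python
# Approximate standard atomic masses in daltons (rounded; supplied for illustration).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Elemental composition of the three molecules shown in Fig. 1.2.
MOLECULES = {
    "water": {"H": 2, "O": 1},                       # H2O
    "L-alanine": {"C": 3, "H": 7, "N": 1, "O": 2},   # C3H7NO2
    "stearic acid": {"C": 18, "H": 36, "O": 2},      # C18H36O2
}


def molecular_mass(composition):
    """Sum the atomic masses, weighted by how many of each atom are present."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


for name, composition in MOLECULES.items():
    print(f"{name}: {molecular_mass(composition):.1f} Da")
```

All three values fall well below a few hundred daltons, with the long hydrocarbon chain making stearic acid the heaviest of the three.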
The macromolecular constituents of cells
Macromolecules are large structures formed by the polymerization of small units, collectively known as monomers. Glycogen,
starch, and cellulose are polymers formed by joining together
glucose units (in a slightly different manner in the three cases).
Glycogen and starch are for energy storage in animals and
plants respectively and cellulose is for structural strength in
plants. Since only glucose monomers are involved in the synthesis of these macromolecules, all that is needed in their synthesis is
a mechanism to link them together. There is no information
content in them.
Fig. 1.2 Space-filling models of water, the amino acid L-alanine, and
a lipid, stearic acid. The colours of the atoms are: carbon, dark grey;
oxygen, red; hydrogen, pale blue; nitrogen, dark blue. The computer
program that generates the models represents the size of the electron
cloud of atoms, which is affected by the nature of their attached atoms.
In the case of hydrogen with a single electron, the represented size is
greatly reduced when attached to an electronegative atom such as oxygen
or nitrogen.
Proteins and DNA are different in this respect; they have
information content. These polymers are built up from a variety
of monomers which must be put together in the correct order
and this requires that the cell has instructions available on the
correct sequences for these.
Proteins
The word protein is derived from the Greek meaning ‘primary’;
proteins are of primary importance in life and the reason for
DNA is to make their production possible. They are built up
from a menu of 20 different species of amino acids, a large
number of which are polymerized into long chains, known as
polypeptides (Fig. 1.3). After synthesis, they fold up into three-dimensional compact shapes determined by the particular
sequence of amino acids. Figure 1.4 shows a space-filling molecular model of human deoxyhaemoglobin, an average-sized
protein of 574 amino acids and molecular weight of 64 500 Da.
Proteins range in size from the small insulin molecule (molecular weight 5733 Da), which is composed of 51 amino acids
linked together, to large ones of several thousand amino acids.
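The two sets of figures quoted above imply a similar average mass per amino acid, which gives a handy rule of thumb for estimating protein size. The sketch below only divides the numbers given in the text; the function name is hypothetical.

```python
# (number of amino acids, molecular weight in Da), as quoted in the text.
PROTEINS = {
    "human deoxyhaemoglobin": (574, 64500),
    "insulin": (51, 5733),
}


def mean_residue_mass(n_residues, molecular_weight):
    """Average mass contributed by each amino acid in the chain."""
    return molecular_weight / n_residues


for name, (n_residues, weight) in PROTEINS.items():
    print(f"{name}: {mean_residue_mass(n_residues, weight):.1f} Da per amino acid")
```

Both proteins come out at a little over 110 Da per amino acid, so a protein of a few hundred amino acids can be expected to weigh a few tens of thousands of daltons.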
Catalysis of reactions by enzyme proteins is central
to the existence of life
Amino acid 1 + amino acid 2: H2N-CH(R1)-COOH + H2N-CH(R2)-COOH
↓ (H2O removed)
Dipeptide: H2N-CH(R1)-CO-NH-CH(R2)-COOH
↓ (n amino acids added one at a time in the correct sequence)
Linear polypeptide chain
↓
3-dimensional folded protein
Fig. 1.3 Outline of protein synthesis. Note that although peptide
synthesis involves overall the removal of a water molecule, the
process in the cell is not a direct condensation. Protein synthesis is
carried out by cellular bodies called ribosomes. The sequence of
amino acids added to form the polypeptide chain is specified by a
molecule of messenger RNA, which is a copy of the base sequence
of the gene coding for the protein.
Fig. 1.4 Space-filling model of haemoglobin. The CPK
(Corey–Pauling–Koltun) colour scheme is used: carbon, light grey;
oxygen, red; nitrogen, blue; sulphur, yellow. The Protein Data Bank
accession code for haemoglobin is 1A3N (see page 85 for instructions
on how to get this picture yourself).
Enzymes are catalytic proteins. Thousands of different chemical reactions occur in a living cell even though the conditions
there are not such as would facilitate chemical reactions:
almost neutral pH, low temperature, no especially reactive substances, and chemicals present in dilute aqueous solution. In the
chemistry laboratory, reactions are commonly brought about
by high temperatures, extreme pH values, and high concentrations of reactants. A sugar such as glucose is stable at body
temperature and left in air in a bowl will undergo no change for
many years. However, if you eat the sugar, in the cells it is
involved in chemical reactions. The reactivity of glucose (and all
else) in the cell is due to enzymes combining with the molecules
and catalysing the reactions. Enzymes are specific protein
catalysts—usually one enzyme, one reaction. Without this
ability of proteins to bind precisely with their target molecules
(enzyme substrates) and catalyse specific reactions, life would be
impossible. Since there are thousands of different reactions in a
cell there are thousands of different enzymes catalysing them
with (usually) one gene specifying each enzyme. They are generally efficient catalysts. One molecule of the enzyme carbonic
anhydrase, important in red blood cells (page 70), catalyses the
conversion of 600 000 molecules of substrate per second.
Proteins are also involved in virtually everything else in cells
and organisms: structures, muscle contraction, nerve impulses,
hormone action, chemical signalling, and regulation of metabolism. They are very versatile, ranging from delicate enzymes
and exquisite molecular machines to the tough proteins of bone
cartilage and of hair and horses’ hooves.
How can one type of molecule do so many tasks? As already
stated, proteins are synthesized from a menu of 20 different amino
acids, ranging in number in different proteins from about 50 to
2000 amino acids, though typically from a few hundred. If we
regard these as an alphabet of 20 letters, proteins are ‘words’ several
hundreds of letters in length, so that the number of possible different words, or proteins, is effectively unlimited. The fossil record shows that
primitive prokaryotes (bacteria) existed on Earth 3.5 billion years
ago. The amino acid sequences of thousands of different proteins
existing today have evolved over 3.5 billion years of random
change and natural selection. Continuation of each life form is
dependent on every new cell being given a complete set of instructions on the amino acid sequence for every protein in the cell. It is
a colossal information-storage and -retrieval process. This is the
role of the genes and the mechanisms for reading them.
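To get a feel for how large the space of possible protein ‘words’ is, the number of distinct sequences of a given length can be counted directly. This is a minimal sketch that assumes every position can hold any of the 20 amino acids independently; the function names are illustrative.

```python
ALPHABET_SIZE = 20  # the 20 amino acids used to build proteins


def sequence_count(length):
    """Number of distinct amino acid sequences of the given length."""
    return ALPHABET_SIZE ** length


def decimal_digits(n):
    """How many decimal digits it takes to write the integer n."""
    return len(str(n))


for length in (50, 300):
    digits = decimal_digits(sequence_count(length))
    print(f"length {length}: a number with {digits} digits")
```

Even for a modest chain of 300 amino acids the count is a number with several hundred digits, far more sequences than evolution could ever have sampled.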
Each new generation of cells must have a complete set of
genes reproduced by the parent cell so that one copy can be
given to each of the two daughter cells. The latter can then direct
the synthesis of the proteins necessary for the life of the cell. As
will be described later in more detail, there may be one or
several chains of amino acids in a protein. Each chain is
specified by one gene so that the synthesis of a given protein
may require one or several genes accordingly. Also the cell has
tricks for producing different versions of proteins from a single
gene (see differential splicing on page 390).
A human being has an estimated 30 000 genes in each cell (the
exact number is not established). The duplication of this vast
mass of information in the form of DNA cannot occur without
some mistakes, the latter being known as gene mutations.
A mutation leads to the production of a protein which is not
exactly correct. Mutations can have effects ranging
from no damage to genetic damage to the offspring or death of
the embryo. There are large numbers of genetic diseases known;
cystic fibrosis is one example (page 117).
Evolution of proteins
There is another side to genetic mutations. Evolution is a process
in which natural selection preserves favourable mutations;
if a mutation in germ cells increases the chance of progeny
reaching reproductive age then that mutation will be preserved.
Deleterious ones are eliminated by natural selection. Since genes
code for proteins it is clear that evolution depends on the synthesis of new proteins which give a selective advantage. The chances
of random changes in the amino acid sequence of a protein being
advantageous are finite but small so that evolution is a slow,
uncertain business. However, since there is no limit to the
potential protein structures that can theoretically exist, evolution
is not limited in the number of changes that can be tried over the
billions of years involved.
Development of new genes
The evolution of proteins requires the development of new genes.
The problem of how you can change an essential gene into a different one without eliminating the function of the original gene
can often be explained by another type of accident in the replication of genes, namely gene duplication in which a given gene is
reproduced twice. One of the genes can be mutated while the
other continues to code for the original essential protein. There is
much evidence in the base sequences of genes (see below) indicating that this is what has happened; sets of related genes exist
which obviously have evolved from common ancestors.
DNA (deoxyribonucleic acid)
It was established in the 1940s that DNA (deoxyribonucleic
acid) is the substance of genes. A complete DNA molecule is
a chromosome, with protein components present as structural
support. The E. coli chromosome has a molecular weight of
about 3 billion daltons and the largest human chromosome more than 100
billion daltons. Individual genes encode the information on the
amino acid sequence of specific polypeptides: one gene, one
polypeptide—thousands of genes, thousands of polypeptides
(and therefore proteins). The DNA of each gene carries the
chemical message which signals to the cell how to assemble the
amino acids in the correct sequence to produce the protein for
which that gene is ‘responsible’. The information is contained
in the sequence of the monomers called nucleotides which
make up DNA. A nucleotide has the structure base–sugar–
phosphate.
There are four different nucleotides in DNA, differing in
the base components, linked together forming a ‘backbone’ of
alternating sugar–phosphate residues with the bases projecting
from the sugar residues. It is the sequence of different bases that
carries the information of the gene.
[Fig. 1.5 diagram: two phosphate-sugar backbones run side by side, with bases attached to the sugar residues of the nucleotides; the base pairs shown are C-G, G-C, A-T, T-A, and C-G. Noncovalent (hydrogen) bonds are critical to holding the two chains together.]
Fig. 1.5 Diagram of the structure of double-stranded DNA. The
backbone consists of alternating sugar-phosphate residues to which
the four types of base are attached. The base pairs are always between
G and C or between A and T. Note that each base pair always includes
one larger and one smaller base so that all base pairs are of the same
size. The two chains are held together by noncovalent bonds (page 37);
there are three between G and C and two between A and T. The two
strands are shown as being parallel for clarity but in fact they form
a double helix as shown in the model in Fig. 1.6.
Fig. 1.6 A model of B DNA. Space-filling atomic model of a DNA
segment with one major groove and two minor grooves.
[Fig. 1.7 diagram: parent DNA double helix; strands separate; incoming monomers base pair with parent strands; monomers polymerised into DNA; two identical double helices.]
DNA exists in the form of a double strand held together by
secondary bonds of which hydrogen bonds (described below)
are critical, as illustrated in Fig. 1.5. Two of the four species of
bases in DNA, adenine and thymine (A and T), automatically
pair together because their shapes are complementary and
hydrogen bonds form between them. The same is true for the
other pair, guanine and cytosine (G and C). This pairing is
known as complementarity or Watson–Crick base pairing,
after its discoverers. It is specific; base pairing in this way occurs
only between G and C, and A and T respectively. The two
strands in a DNA molecule are not parallel as indicated in
Fig. 1.5 for simplicity, but rather wind around each other to
form the well-known double helix, shown more realistically in
the space-filling model of Fig. 1.6.
DNA can direct its own replication
Fig. 1.7 Principle of DNA replication. The two strands of the double
helix are held together by hydrogen bonds between bases A and T, and
G and C respectively. When the strands are separated the single
strands are now available for base pairing by incoming monomer
nucleotides. The nucleotides thus lined up are linked together to
give two identical daughter double helices. Exactly the same
self-replicating principle applies to RNA replication and to the
transcription (copying) of DNA into RNA which occurs in the production
of messenger RNA from a gene. Note that each of the replicated double
helices contains one parental strand and one newly synthesized one.
Note also that the actual mechanism occurs with the incoming
nucleotides pairing up in the active site of the DNA polymerase
enzyme attached to the template; for simplicity the DNA polymerase
is not shown here.
The central requirement of any genetic system is that the
hereditary information can be replicated and passed on to
daughter cells and the reason nucleic acids carry the genetic
information is that they have the capacity to direct their own
replication as well as performing their function of directing
protein synthesis. If the two strands of DNA are separated,
the base-pairing potentiality is exposed. The enzyme which
synthesizes DNA moves along each strand, linking together
nucleotides in the sequence specified by the strand being
copied (known as a template strand). An A on the template
strand matches a T on the new strand, G is matched to a C and
vice versa. Figure 1.7 illustrates the principle of this, but note
that for illustrative purposes the incoming monomers are
shown lined up while in reality the enzyme is involved in their
correct placing also (Chapter 23). Since both strands are read
in this way, we end up with two new double helices identical to
the original one. This style of replication of DNA is known as
semi-conservative; each new double helix contains an old
strand from the parent DNA molecule and one newly synthesized one. The linking together of the monomers requires
energy; this is supplied indirectly from ATP by the mechanism
discussed in Chapter 23.
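The logic of semi-conservative replication can be sketched in a few lines. This is an illustration under simplifying assumptions, not a model of the enzyme: strands are plain strings, base i on one strand pairs with base i on the other (strand direction is ignored, as in the parallel drawing of Fig. 1.5), and the example sequence and function names are hypothetical.

```python
# Watson-Crick pairing rules: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Build the partner strand by pairing each template base in turn."""
    return "".join(PAIR[base] for base in strand)


def replicate(duplex):
    """Separate the two strands and use each as a template for a new partner."""
    old_1, old_2 = duplex
    daughter_1 = (old_1, complement(old_1))  # parental strand 1 + newly made strand
    daughter_2 = (complement(old_2), old_2)  # newly made strand + parental strand 2
    return daughter_1, daughter_2


parent = ("CTGGA", complement("CTGGA"))
daughter_1, daughter_2 = replicate(parent)
print(parent, daughter_1, daughter_2)
```

Each daughter duplex has the same sequence as the parent and keeps exactly one of the original strands, which is what semi-conservative means.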
What is the nature of the genetic information in DNA? It is in
the form of the coded sequence for amino acids in proteins.
A triplet of bases on a DNA strand specifies each amino
acid in a protein. With four different bases, 64 different triplet
combinations are possible (4 × 4 × 4). Since only 20 species of
amino acids are involved, there is plenty of coding ability for
protein synthesis and room for full stops to signify the end
of the message. To code for large numbers of proteins, each of
which may have hundreds of amino acids, the DNA has to be of
prodigious length. Each human cell (10⁻⁵ m in diameter) has
about 2 m of DNA (you might like to work out the total length
of DNA in your body given that there are about 10¹³ cells). It is a
very narrow thread and is greatly compacted in the nucleus. The
DNA of the human genome (the complete collection of
chromosomes) contains 3.2 billion nucleotide pairs. The complete
sequencing of these has been achieved by the Human Genome
Project, although understanding of the organization of the
genome and of much of its DNA is incomplete.
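The counting arguments in this passage are easy to check. The sketch below enumerates the triplets and works out the total length of DNA in the body from the figures given in the text. Two values are assumptions added here for comparison: a spacing of 0.34 nm per nucleotide pair along the helix, and a mean Earth-Sun distance of about 1.496 × 10¹¹ m.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triplet of the four bases.
triplets = ["".join(t) for t in product(BASES, repeat=3)]
print(len(triplets), "possible triplets")

# Figures from the text.
DNA_PER_CELL_M = 2.0          # metres of DNA per human cell
CELLS_IN_BODY = 1e13          # approximate number of cells
PAIRS_PER_GENOME = 3.2e9      # nucleotide pairs in the human genome

# Assumptions added for illustration (not from the text).
RISE_PER_PAIR_M = 0.34e-9     # length of helix per nucleotide pair
EARTH_SUN_M = 1.496e11        # mean Earth-Sun distance

total_dna_m = DNA_PER_CELL_M * CELLS_IN_BODY
genome_length_m = PAIRS_PER_GENOME * RISE_PER_PAIR_M

print(f"total DNA in the body: {total_dna_m:.1e} m")
print(f"about {total_dna_m / EARTH_SUN_M:.0f} Earth-Sun distances")
print(f"one copy of the genome laid end to end: {genome_length_m:.2f} m")
```

One copy of the genome comes to roughly one metre on these assumptions, which is consistent with about 2 m per cell if each cell carries two copies.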
Genes are part of the continuous chromosomal DNA
molecule. Each gene is distinct from the next, separated by spacer
sequences between the genes of no known function. Proteins are
synthesized on cellular structures known as ribosomes. These
take instructions (indirectly) from the gene. To instruct the ribosomes on the amino acid sequence of a specific protein, each gene
is independently copied into a different nucleic acid called messenger RNA (mRNA). It delivers the message of coded instructions from the gene to the ribosome. RNA has almost the same
structure as a single strand of DNA with small chemical differences in the monomers. T is replaced by U, for uracil, which lacks
a methyl group present in T (the details are not important at this
stage), and the sugar is ribose with an extra oxygen atom, rather
than deoxyribose. The bases in RNA have the same base-pairing
properties as those in DNA; in RNA, U pairs with A. Only one of
the two strands of DNA in a gene is copied into RNA. The
sequence of information flow is as shown below.
                (RNA monomers)   (Amino acids)
                      ↓                ↓
DNA of gene 1 → mRNA 1 → polypeptide 1 → folded protein 1
DNA of gene 2 → mRNA 2 → polypeptide 2 → folded protein 2
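The first two arrows of this information flow can be sketched as string operations. This is a toy illustration under stated assumptions: the template sequence is hypothetical, strand direction is ignored, and the codon table is only a small subset of the standard genetic code (the full table has 64 entries).

```python
# Pairing used when a DNA template is copied into RNA (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# A small subset of the standard genetic code, enough for this example.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly",
    "GCU": "Ala", "AAA": "Lys", "UAA": "STOP",
}


def transcribe(template_dna):
    """Copy one DNA strand into mRNA by base pairing."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def translate(mrna):
    """Read the mRNA three bases at a time until a stop triplet is reached."""
    polypeptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        polypeptide.append(amino_acid)
    return polypeptide


template = "TACAAACCGCGATTTATT"  # hypothetical template strand of a tiny 'gene'
mrna = transcribe(template)
print(mrna)
print("-".join(translate(mrna)))
```

The final arrow, folding of the polypeptide into a three-dimensional protein, is not captured by this kind of string manipulation.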
How can a linear one-dimensional sequence of information in
genes give rise to the three-dimensional structure of a protein in a
living organism? This is where the folding of the linear polypeptide
comes in. An unfolded polypeptide is, with rare exceptions, not
biologically functional. In the cell, when a ribosome synthesizes a
polypeptide, it folds up into the correct configuration in a few
minutes. Proteins are complex three-dimensional structures
formed by folding into the correct configuration, as specified by
the amino acid sequence in the polypeptide. To a considerable
extent, the folding is determined by hydrophobic (water-hating)
amino acids being placed mainly on the inside of the molecule
away from water, and the hydrophilic (water-loving) ones on the
surface. Only after folding into their three-dimensional configurations do they perform their roles in living organisms. (Have
another look at the folded haemoglobin molecule in Fig. 1.4.)
This is the basis of how the one-dimensional linear information present in DNA specifies the formation of three-dimensional organisms, since the folded proteins can assemble
into larger living structures; yet another of the profound
concepts of life.
Junk DNA
What we have outlined above has been the accepted dogma for
about half a century. Genes are transcribed into mRNAs which
code for proteins and proteins determine the heritable characteristics of organisms. None of this is factually challenged but
very recently it has become evident that our concept of genetic
inheritance is not the full story. There were a few facts, which
while not contradicting the accepted gene concepts, seemed a
little odd. First it had become apparent with the completion of
the Human Genome Project that biological complexity is not
proportional to gene numbers. The rice plant has more genes
than does a human. A nematode worm has 18 000 genes and the
more sophisticated fruit fly only 13 000. Another oddity is that,
in a human, the DNA sequences that actually code for protein
sequences amount to only 1.5% of the total. The rest was
dismissed as junk DNA without any informational content—
much of it useless garbage collected by eukaryotes during
evolution and which for some reason could not be got rid of.
(Prokaryotes have little or no junk DNA.)
There is now evidence that in fact junk DNA contains
large numbers of noncoding microgenes which have been
conserved over long periods of evolution. They code for tiny
microRNAs which are not for protein-coding purposes. What
are they for?
It is too early for this to be known fully but they appear
to be responsible for some of the inherited characteristics of
organisms. Possibly in some way they regulate the pattern
of expression of the collection of protein-coding genes (the
expression of a gene means that it gives rise to the synthesis of
a protein). This, it is thought, might be partly responsible
for the complexity of a human being disproportionate to the
number of conventional genes. Elimination of a microgene
has been shown to cause dramatic changes in the structure of
a plant. We discuss the human genome in Chapter 22 and, on
page 353, one mechanism by which certain small RNA molecules can interfere with gene expression. The further reading list
at the end of this chapter gives some short reviews on this
development. This is yet another of the ‘hot’ research areas of
molecular biology.
Molecular recognition by proteins
We have described the role of proteins as catalytic molecules or
enzymes. Proteins have the ability to bind to other molecules,
which often are also proteins, in a completely specific manner.
They ‘recognize’ the molecule they are ‘designed’ by evolution to
bind to. Life is completely dependent on this. Protein molecules
associate to form complexes ranging in size from dimers to
molecular complexes containing large numbers of protein
subunits forming larger cellular structures.
However, specific protein interactions go far beyond this.
Hormones and growth factors deliver signals to cells by
combining with specific protein molecules known as receptors
displayed on the outside of cells. The cell-signalling system
of the body (Chapter 27) depends on it. Enzymes recognize
their substrate molecule(s); gene regulation depends on control
molecules recognizing a sequence of a few nucleotides among
billions on a chromosome, to give but two examples. Life is
dependent on specific protein attachments to other molecules.
There is another requirement for molecular recognition.
The attachments must often be easily reversible. An enzyme
must release the products of the reaction it catalyses; gene-control proteins must detach when it is appropriate to do so,
for without this, the activation of a gene would be irreversible,
whereas many genes need to be switched on and off. How is
this molecular-recognition system achieved? The answer lies in the
way several weak chemical bonds between matching surfaces
add up to a sufficiently strong but reversible attachment of the
molecules.
Noncovalent or weak chemical bonds
Noncovalent bonds (page 37) are electrostatic attractions
between positively and negatively charged atoms, and are much
weaker than covalent ones.
The strongest noncovalent bonds are ionic bonds between
oppositely charged groups, the next strongest are hydrogen bonds dependent
on partial atomic charges, and the weakest are van der Waals
forces, which may be between any two atoms appropriately positioned (Chapter 3). A single noncovalent bond would be insufficient to hold two molecules together; a group of them is needed.
Atoms need to be sufficiently close for the bonds to form in sufficient
numbers. The protein and its ligand (as the binding molecule is
called) must therefore be complementary in shape and with
chemical structures suitable for forming noncovalent bonds
between molecules at the specific patches on the protein surface
in contact with the ligand (the entire protein surface is not
usually involved in associations). Because proteins have unique
structures, individual proteins can evolve to be specific for
recognizing particular molecules with which to bind. This
is the basis of biological specificity. It is difficult to think of a
living process that does not require structural complementarity
between specific proteins and other molecules.
Noncovalent bonds form spontaneously without the need
for enzyme catalysis. They also are broken easily, which gives
them the required degree of reversibility referred to earlier.
If there are large numbers of weak bonds then molecules
can be bound together almost irreversibly, such as is the case
in antibody–antigen reactions (page 511). The requirement
for easy reversibility is clear in the replication of DNA, when
the two strands have to be separated to expose the base-pairing potential of the bases. There are cellular mechanisms for breaking noncovalent associations in DNA as
required.
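The way weak bonds add up can be made concrete with a rough calculation. The sketch below is an illustration only: it assumes that the binding free energy is simply the sum of equal contributions from each bond (a simplification), uses the standard thermodynamic relation ΔG° = RT ln Kd, and takes a per-bond energy of 5 kJ/mol and a temperature of 310 K as illustrative values that are not given in this chapter. The function name is likewise hypothetical.

```python
import math

R = 8.314   # gas constant, J/(mol*K)
T = 310.0   # assumed temperature in kelvin (about body temperature)

def dissociation_constant(n_bonds, energy_per_bond_kj=5.0):
    """Estimate the dissociation constant Kd (mol/L) of a complex.

    Assumption: every weak bond contributes the same free energy and
    the contributions are additive. Real interactions are not this simple.
    """
    delta_g = -n_bonds * energy_per_bond_kj * 1000.0  # J/mol; negative is favourable
    return math.exp(delta_g / (R * T))                # from delta_g = R*T*ln(Kd)

for n in (1, 5, 10, 15):
    print(f"{n:2d} weak bonds -> Kd of about {dissociation_constant(n):.1e} M")
```

Under these assumptions a single bond gives a Kd of the order of 0.1 M, which amounts to hardly any binding, whereas ten such bonds give a Kd in the nanomolar range. Each added bond tightens the association by the same factor, which is how attachments can range from readily reversible to almost irreversible.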
In addition to the protein molecular recognition we have
described, weak bonds play an important part in the folded
structure of proteins. Protein molecules are better regarded as
molecular machines which need to be flexible in their configuration rather than as rigid unchanging structures. The use of
weak bonds in their three-dimensional structures confers this
flexibility.
How did it all start?
Living organisms consist of one or more cells (Chapter 2). Each
cell is surrounded by a cell membrane, a thin sheet composed
mainly of lipid (fatty) molecules which is necessary to hold the
contents of the cells together.
The origin of the first cell is necessarily speculative but at
some time in the establishment of life there must have been a
primordial self-replicating molecular system from which living
cells developed. Hypotheses have been formulated of how self-replicating systems might have been established on a mineral
surface or in a drop of liquid or sea pool, but, at an early stage,
the system had to be contained by a membrane. Otherwise it presumably would have been dispersed. A striking fact is that when
molecules of a suitable substance are agitated in water they
form small spherical vesicles (liposomes). (Lipids of the type
found in egg yolk are examples. Their structures are given in
Chapter 7 on membranes.) The boundary of these vesicles is
made of a structure known as a lipid bilayer, which is virtually
identical to the basic structure found in the membranes of
modern cells (Fig. 1.8). Such vesicles may have enclosed a drop
of the first self-replicating system. From such a primordial cell-like structure all life is postulated to have originated. The
requirements for a molecule to be capable of forming a lipid
bilayer are not demanding; it needs to have amphipathic properties, by which we mean that one part of the molecule is water-insoluble
(hydrophobic) and the other water-soluble (hydrophilic), and
to be of a roughly suitable shape, as illustrated in Fig. 1.9.
Fig. 1.8 A synthetic liposome made of a lipid bilayer structure. (Labels: polar (hydrophilic) head groups; aqueous interior; hydrocarbon (hydrophobic) layer.)
Fig. 1.9 An amphipathic molecule of the type found in cell membranes. (Labels: polar or hydrophilic head group; nonpolar or hydrophobic tails.)
What was the source of the molecular building blocks needed
to produce the components of living cells? Experiments have
been done in which electrical discharges were passed through
a mixture of gases (hydrogen, methane, ammonia, and CO2, in
the presence of water) intended to resemble the atmosphere of
the primitive Earth. A mixture of potential precursors of
biomolecules including some amino acids was produced.
The postulated primordial self-replicating cell must have taken
in molecules from the environment to produce new cellular
material. Diffusion through the containing membrane before
the development of transport mechanisms would have been
slow, and replication likewise slow, but vast time scales were
involved.
The RNA world
A more difficult problem in the establishment of a self-replicating
system is to identify the initial catalysts and the primitive ‘genetic
system’ to ensure faithful replication. In short, it is a chicken-and-egg
problem: which came first, proteins to catalyse reactions or
nucleic acids to direct the synthesis of primitive proteins?
This dilemma received a possible answer with the discovery that
RNA can catalyse some chemical reactions including conversion of short polynucleotides into longer sequences. Such molecules were given the name of ‘ribozymes’ (not to be confused
with ribosomes). It was the first time that biological molecules
other than proteins had been found to catalyse specific reactions.
RNA has the same potential for acting as a template in its
own replication as explained for DNA. In short, RNA may
have been both the catalyst and the primitive ‘genetic system’
for self-replication in the origin of life, thus avoiding the
chicken-and-egg dilemma. It may be speculated that the first
short polynucleotides were formed from nucleotide monomers
by heat chemically condensing the nucleotides together by
driving off water.
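The template role of RNA can be sketched in a few lines of code. This is a minimal illustration, assuming only the standard base-pairing rules for RNA (A with U, G with C); the function name and the example sequence are hypothetical and are not taken from this chapter.

```python
# Base-pairing rules for RNA: A pairs with U, G pairs with C
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def copy_from_template(template):
    """Return the strand complementary to `template`, written 5'->3'.

    The two strands of a duplex run in opposite directions, so the
    template is read in reverse while each base is paired.
    """
    return "".join(RNA_PAIRS[base] for base in reversed(template))

original = "AUGGCU"                            # hypothetical short polynucleotide
first_copy = copy_from_template(original)      # complementary strand
second_copy = copy_from_template(first_copy)   # copy of the copy

print(first_copy)               # AGCCAU
print(second_copy == original)  # True
```

One round of copying gives the complementary strand, not the original; a second round, using that complement as the template, regenerates the original sequence. This is why base pairing is sufficient in principle for a polynucleotide to direct its own replication.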
From this stage, evolution of more efficient catalysts,
namely proteins, to replace RNA catalysts is postulated to
have occurred, though the first ‘proteins’ must have been
primitive and presumably were short peptides of low catalytic
efficiency. The concept of an RNA-based biological world
that preceded the DNA world is generally accepted for there
is much supporting evidence. In modern cells, although protein enzymes bring about almost all catalysed reactions, the
displacement of RNA from this role is not complete. What
might be regarded as a few fossil catalysts—hangovers from the
RNA world—exist in cells as ribozymes. One of these in ribosomes is involved in the synthesis of all proteins, providing an
interesting link between one type of catalytic system (RNA) and
a more efficient one (proteins). Ribosomes are giving us
a glimpse into the ancient RNA world, somewhat akin to
astronomers viewing the past universe through long-distance
telescopes.
Why has DNA superseded RNA as the medium for storing
genetic information in all cells? The answer almost certainly is
that DNA is chemically more stable than RNA. If a mistake is
made in the synthesis of a DNA molecule, or it is damaged in
some way, enzymes exist to repair it (Chapter 23). RNA is still
the genetic material of many viruses. Unlike DNA damage, RNA damage is not
repaired, and RNA viruses therefore
mutate rapidly. By constantly changing the proteins which the
immune system recognizes (Chapter 29), new viral strains
escape immune attack. So primitive molecular instability, coupled with lack of repair, is an advantage even in the modern
world where most viruses are in fact RNA ones: human
immunodeficiency virus (HIV), influenza, poliomyelitis,
mumps, foot-and-mouth disease, measles, and rubella viruses, to name a
familiar few. The same applies to plant viruses.
The new ‘omics’ phase of biochemistry
and molecular biology
From what has been said in this chapter, it will be clear that
sequences of amino acids in proteins and those of the nucleotides
in DNA underlie just about everything in life. As these sequences
were determined, it was realized that the flood of molecular
information would be of little avail without an efficient retrieval
system. To this end, in a remarkable example of international
collaboration, protein and DNA computer databases were established in various centres around the world in which information
on proteins and genes is recorded. Details of the sequences of
thousands of genes and proteins together with the three-dimensional structures of many of the latter are now available.
Software in the public domain is available to search the databases
and analyse the information contained in them. This area of
science is known as bioinformatics, which has become of
immense importance in biochemistry and molecular biology.
Parallel to this there have been developments of methods
for the automatic sequencing of DNA that have resulted in
the completion of the human genome project, which has
determined the nucleotide sequence of the entire human DNA
(known collectively as a genome). The genomes of other
species, such as the mouse, the rice plant, and the fruit fly
Drosophila, have also been sequenced, to cite
only a few. Another important method, the ‘DNA chip’ or DNA
microarray, allows the simultaneous study of the transcription
(copying) of large numbers of genes by detecting which are
actively giving rise to mRNA (Chapter 28). In the protein
field, the relatively recent application of mass spectrometry
(Chapter 5) to proteins (a development of immense importance) makes it feasible to investigate many proteins at once.
These developments are sometimes referred to informally as
the ‘omics revolution’, which needs an explanation. The entire
collection of proteins in a cell (in any one state, for it varies from
time to time) is called the proteome and that of genes, the
genome. The collective studies of these are called proteomics
and genomics respectively. An apt analogy has been put forward to illustrate the meaning of these collective terms to the
effect that many of the instruments in an orchestra (genes,
proteins, and metabolites) have been identified. The next stage
is to listen to the music the orchestra plays with them. In other
words, the proteins and genes in a cell function as a collective whole and a full understanding of life and abnormalities
will need to consider them as such and to understand their
interactions.
The collective study of the copying of genes into mRNA
(transcription) using DNA microarrays is now sometimes
referred to as ‘transcriptomics’, and the term ‘metabolomics’ is
used to describe the study of the complement of low-molecular-mass molecules (metabolites) present in specific cells at specific times.
These ‘omics’ studies make it possible to ask what genes are
active and what proteins and products are present, say, in cancer
cells as compared with neighbouring normal cells. The medical
potential is very great, and the same holds with equal force for plant studies and their potential in agriculture. The great potential for medical intervention in the treatment of diseases based on a molecular
understanding of life has given rise to the biotechnology boom.
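The kind of comparison such studies make can be sketched with a toy calculation. Everything in the example below is hypothetical: the gene names are placeholders, the signal intensities are invented for illustration, and the twofold cut-off for calling a gene ‘up’ or ‘down’ is an assumed convention, not a rule taken from this chapter.

```python
import math

# Hypothetical mRNA signal intensities (arbitrary units) for three placeholder genes
normal = {"geneA": 100.0, "geneB": 50.0, "geneC": 200.0}
tumour = {"geneA": 400.0, "geneB": 50.0, "geneC": 25.0}

def log2_fold_changes(test, reference):
    """Return log2(test / reference) for every gene measured in both samples."""
    return {gene: math.log2(test[gene] / reference[gene])
            for gene in test if gene in reference}

changes = log2_fold_changes(tumour, normal)
for gene, lfc in sorted(changes.items()):
    # Assumed cut-off: at least a twofold change in either direction
    status = "up" if lfc >= 1 else "down" if lfc <= -1 else "unchanged"
    print(f"{gene}: log2 fold change {lfc:+.1f} ({status})")
```

With these invented numbers, geneA is fourfold more active in the tumour sample, geneB is unchanged, and geneC is eightfold less active. A real microarray or proteomics experiment applies the same idea to thousands of genes or proteins at once and adds statistical tests, which this sketch omits.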
■ SUMMARY
Unity of life. Despite the diversity of life forms, at the molecular
level all life is basically the same, variations being modifications
of the same theme. It suggests a single origin of life.
Living cells obey the laws of physics and chemistry. Energy
is derived from breaking down food molecules (ultimately
produced by plants using sunlight energy). The energy must be
released in a form which can drive chemical and other work. Heat
cannot do work in the cell.
ATP (adenosine triphosphate) is the universal energy currency
in life. The energy is used to synthesize ATP from ADP (adenosine
diphosphate) and phosphate; ATP breakdown is coupled to
biochemical work.
Molecules found in living cells. These include small molecules
such as water, food molecules, and their breakdown products.
Macromolecules, among which proteins and DNA are pre-eminent, are large molecules formed by polymerization of
smaller units.
Proteins. These are the cell’s workhorses and the basis of
most living structures. They are long chains of amino acids, typically hundreds long, but folded up into a precise three-dimensional structure. There are 20 different amino acids in proteins;
each protein is a unique sequence of these.
Enzyme catalysis. Enzymes are proteins which catalyse
virtually all the thousands of chemical reactions of life. One
enzyme, one reaction. Relatively recently, however, it has
been discovered that RNA (ribonucleic acid) can have catalytic
activity.
DNA (deoxyribonucleic acid). The cell must have a blueprint of
the sequence of each of the thousands of proteins it synthesizes.
This is the function of DNA in the form of genes, each gene
specifying the amino acid sequence of one polypeptide.
DNA consists of two strands of polynucleotides in a double
helix. A nucleotide has the structure base—sugar—phosphate.
The bases are paired by hydrogen bonds, the base A linked to T,
and G to C. This automatic pairing is the basis of self-directed
replication. The base sequences act as a code, specifying individual amino acids; three bases, known as a codon, representing
an amino acid. The genetic code is the table correlating codons
to the amino acids that they specify.
Ribosomes translate the base sequences of genes into
proteins. The mechanism of this is that each gene is copied
into messenger RNA (a polymer resembling DNA) which attaches
to and instructs a ribosome. Ribosomes have no specificity for
the proteins they synthesize; they produce the protein specified
by the messenger just as a tape player plays whatever music is
specified by the tape.
Evolution of genes and proteins. DNA is the record of
sequences needed to synthesize proteins, the information
having been acquired by billions of years of evolution. Mistakes
in replicating DNA inevitably occur. These are mutations which
result in faulty amino acid sequences in proteins, which may in
turn result in genetic diseases. The random mutations are also
the material on which evolution, via natural selection, develops
new genes.
Molecular recognition by proteins. Apart from recognition of
substrates by enzymes, proteins recognize (bind to) other molecules such as hormones and growth factors, thus directing development, growth, and metabolic processes. The binding is by
multiple weak bonds whose formation depends on atoms being
close enough for the bonds to form. This means that only molecules closely complementary to one another bind. It is the basis
of biological specificity. The use of weak bonds in molecular
recognition confers flexibility and reversibility.
How did it all start? An RNA world is believed to have preceded DNA and proteins. Life presumably must have originated by the spontaneous formation of a molecule capable of
self-replication without the aid of proteins. It is generally
believed that life originated with RNA which has the information
to direct its own replication. The RNA world is still seen in
the genes of some viruses, and in all cells in the form of
ribosomes which have a high content of RNA. DNA replaced
RNA because it is a chemically more stable repository of genetic
information.
The new ‘omics’ phase of biochemistry and molecular biology.
In the past decade an explosion of new technologies has
revolutionized biochemistry and molecular biology. Prominent
among these are automated DNA sequencing, mass spectrometry
for the study of proteins, and DNA microarrays for gene studies.
They are having enormous effects on biological science,
medicine, and agriculture. The branches of science utilizing these
are described as proteomics, genomics, and metabolomics,
which are collective terms to specify that large numbers of
proteins, genes, and metabolites, respectively, can be examined
together.
■ FURTHER READING
Cech, T. R. (1986). A model for the RNA-catalysed replication of
RNA. Proc. Natl. Acad. Sci. U.S.A., 83, 4360–3.
Describes the formation of polycytidylate.
Gilbert, W. (1986). The RNA world. Nature, 319, 618.
Joyce, G. F. (1989). RNA evolution and the origins of life.
Nature, 338, 217–24.
Orgel, L. E. (1994). The origin of life on earth. Sci. Am., 271(4),
52–61.
Growing evidence supports the idea that the emergence of
catalytic RNA was a crucial early step.
Lazcano, A. and Miller, S. L. (1996). The origin and early
evolution of life: prebiotic chemistry, the pre-RNA world and
time. Cell, 85, 793–8.
Junk DNA and microRNA genes
Gibbs, W. W. (2003). Hidden genes. Sci. Am., 289, 28–33.
Mattick, J. S. (2003). Challenging the dogma: the hidden layer of
nonprotein-coding RNAs in complex organisms. BioEssays, 25,
930–9.