Daph-01.qxd 29/10/04 9:02 PM Page 3

Chapter 1
The basic molecular themes of life

This chapter aims to convey an appreciation of how consistently all life is based on a number of basic molecular themes, the tremendous diversity of life forms being variations on them. The molecular nature of life, with the seemingly never-ending succession of discoveries of almost incredible chemical mechanisms, is exciting. The subject is of tremendous and ever-increasing importance in medicine, agriculture, and all aspects of biology.

Biochemistry and molecular biology are the scientific disciplines that aim to understand life in molecular terms. Biochemistry is the name for the earliest-studied aspects of the subject, in which the metabolism of food and small molecules was a principal focus. Molecular biology was the name given later to the study of biological macromolecules, particularly proteins and DNA, and the genetic mechanism. The distinction between biochemistry and molecular biology is blurred but the terms are convenient, if imprecise, labels. Many biochemistry departments in universities and biochemical societies have added ‘Molecular Biology’ to their titles.

Unity of life at the molecular level

Given the diversity of life forms, it might be thought that biochemistry and molecular biology must be a diffuse subject, but life at the molecular level is remarkably similar (biochemists and molecular biologists are dying to know if this will apply to life on Mars or wherever it might be discovered). A famous dictum of the French Nobel Prize winner, Jacques Monod, is that ‘what holds for the Coli bacterium is true for an elephant’, meaning that the similarities between a bacterium living in the human gut and an elephant exceed the differences when viewed at the molecular level.
Life is currently presumed to have had a single origin, and once the primordial form of a self-replicating living cell had developed, many of the fundamental biochemical processes had already been established and life was locked into these. As diversity of life forms evolved, the chemistry of cells changed to cope with new needs, but the underlying basis of life remained much the same.

This explains why, in biochemical research, a variety of organisms is often used to elucidate a given biochemical process. To understand how a process works, for example in humans, the best strategy may be to study the bacterium Escherichia coli, or a virus, where the basic information might be more easily obtained. E. coli is the most intensively studied cell. Relative simplicity and rapid growth make it a favourite for studies on the genetic mechanism. There are differences in molecular processes between bacterial and human cells but they are matters of detail rather than of principle. Biochemical knowledge is applicable to all life forms.

Living cells obey the laws of physics and chemistry: the energy cycle in life

To grow and reproduce, cells take in simple molecules from the external medium and build them up into organized complex molecules. The synthesis of complex molecules from simpler ones involves an increase in energy, so chemical work must be done. A living cell is at a higher energy level than the random collection of molecules in its external environment from which it is produced. It is far from being in thermodynamic (energetic) equilibrium with its surroundings; this is achieved only by decomposition after the death of the cell.

The first law of thermodynamics states that energy can be neither created nor destroyed: the total energy content of the universe remains constant. However, energy can be transformed from one state to another.
Familiar examples are kinetic energy converted into heat and heat into electricity as in a power station. The energy needed to drive all aspects of living systems is derived from chemical energy in the form of food or, in the case of photosynthetic organisms, from sunlight, which is the ultimate energy source for all food. (A minor exception to this statement is provided by organisms living on chemical compounds from the Earth’s crust, such as extremophiles living on H2S near hydrothermal vents in the ocean floor.) Food is taken in by organisms, where it is oxidized back to CO2 and water, and the energy so released is used to drive all the reactions of a living cell. This is summarized in the energy cycle of life shown in Fig. 1.1.

Fig. 1.1 The energy cycle in life. Catabolism is the breakdown of complex molecules releasing energy in the cell. Anabolism is the energy-requiring transformation of simple molecules into more complex ones. (Diagram labels: photosynthesis uses light energy to convert CO2 + H2O into food molecules and O2; catabolism of food molecules releases chemical energy; anabolism converts small disorganized molecules from the environment into the large organized molecules of the cell.)

The second law of thermodynamics states that all processes increase the total entropy of the universe, the ultimate end seemingly being an inert, dark, cold universe of infinite entropy in which everything is uniformly scattered. Entropy is the degree of randomness (disorder) or, as Willard Gibbs put it in more homely terms, the degree of mixed-upness. A low-entropy system is at a higher energy level than a similar system of high entropy. A living cell takes in simple molecules from the environment (high entropy) and converts them into the organized structure of the cell (low entropy). This might look as if the cell flouts the second law, the explanation being that in releasing the required energy from the breakdown of food, entropy is increased by heat and CO2 formation more than it is reduced by assembly of the cell.
The result is that in the complete system (cell plus environment), the entropy increases and the second law is obeyed. The fraction of the total energy released by a reaction that is capable of performing work in the cell is known as ‘free energy’; free here means available for work, not free as in something for nothing.

ATP (adenosine triphosphate) is the universal energy currency in life

In what form is energy derived from the oxidation of foodstuffs used by the cell? The sugar glucose will burn if thrown onto a fire and energy will be released in the form of heat. Cells burn glucose in the sense that it is oxidized to CO2 and water, but in life processes the free energy released by the oxidation of food must be harnessed to be usable. (Some heat is liberated during oxidative metabolism. This is not usable in the cell to perform work, but is beneficial to warm-blooded animals in maintaining their temperature.)

The problem has some analogies with a car. If you placed a bucket of petrol under the bonnet and set it alight, energy would be liberated as heat but this would not be useful. The free energy released by petrol oxidation in the cylinders must be coupled to the driving wheels of the car and not just dissipated as heat. Similarly, the free energy released by oxidation of food must be coupled to performing useful work needed by the cell.

This raises an interesting problem because there are several different classes of food molecule to be oxidized (carbohydrates, fats, and proteins) and there are different uses to which the energy must be coupled: chemical work, osmotic work, and mechanical work. A flexible strategy has been adopted: processes releasing free energy from all food molecules use it to make a single compound, adenosine triphosphate (ATP), and virtually all processes needing energy use ATP to supply it. ATP is (with rare exceptions that do not alter the essential validity of the statement) in effect the universal energy currency of life.
To give a simple example, when you contract a muscle, ATP breaks down to adenosine diphosphate (ADP) and phosphate. This supplies the requisite energy by the mechanism described in Chapter 8. The food-breakdown processes referred to earlier immediately replenish the ATP by resynthesizing it from ADP and phosphate.

Types of molecule found in living cells

Biological molecules are based on the carbon atom bonded mainly to hydrogen, oxygen, and nitrogen atoms and to other carbon atoms. The carbon atom can form four bonds with other atoms, tetrahedrally arranged in the case of single bonds, and this, together with its ability to form C-C bonds, enables the formation of a wide variety of molecules of different shapes and properties. Other elements are important in life, including phosphorus and sulphur; several metal ions are also essential to life, some in trace amounts only. We can divide cellular molecules into two categories, small molecules and macromolecules.

Small molecules

Water is the most prevalent of the small molecules, constituting about 70% of a typical cell. The rest are molecules of foodstuffs, such as sugars, fats, and proteins and their derivatives, which the cell uses as sources of building blocks to synthesize the cellular constituents, such as new proteins, membranes, and carbohydrate structures, and to burn as a source of energy. There is a large variety of these small molecules, but the basic classes of foodstuffs are carbohydrates, amino acids (in the form of proteins), and lipids (fats).

Carbohydrates include the sugars such as glucose and sucrose. The name carbohydrate derives from the fact that the simple sugars have the empirical formula CH2O and thus have the elements of carbon and water in equal proportions. They are important energy stores and participate in many structural molecules.
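The empirical-formula remark about carbohydrates can be checked with a little arithmetic. The sketch below is only an illustration: the helper name `empirical` is hypothetical, and the molecular formulas used (glucose C6H12O6, sucrose C12H22O11) are standard values rather than figures taken from this chapter. It reduces each formula by the greatest common divisor of its atom counts.

```python
from functools import reduce
from math import gcd


def empirical(formula: dict[str, int]) -> dict[str, int]:
    """Reduce a molecular formula to its empirical (lowest whole-number) form.

    Hypothetical helper for illustration only.
    """
    divisor = reduce(gcd, formula.values())
    return {element: count // divisor for element, count in formula.items()}


# Standard molecular formulas (assumed here, not quoted from the chapter).
glucose = {"C": 6, "H": 12, "O": 6}
sucrose = {"C": 12, "H": 22, "O": 11}

if __name__ == "__main__":
    print(empirical(glucose))  # {'C': 1, 'H': 2, 'O': 1}
    print(empirical(sucrose))  # {'C': 12, 'H': 22, 'O': 11}
```

Glucose reduces exactly to CH2O. Sucrose does not, because one molecule of water is lost when its two sugar units are joined, so the 'carbon plus water' description holds exactly for simple sugars and only approximately for larger carbohydrates.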
Amino acids are short carbon chains with a basic amino group and an acidic carboxyl group. Their overriding importance is that they are the building blocks from which proteins are synthesized (see below).

Lipids or fats have various roles, the two most prominent being in the formation of cell membranes and as the major storage of energy in an animal.

The molecular weights of these small molecules are in the range of a few hundred daltons or less. (A dalton, or Da, is a unit of atomic or molecular mass defined as one-twelfth of the mass of a carbon 12 atom, approximately equal to the mass of a hydrogen atom.) Figure 1.2 shows molecular models of water, L-alanine (a typical amino acid), and stearic acid (a typical lipid), which has a long hydrocarbon chain.

Fig. 1.2 Space-filling models of water, the amino acid L-alanine, and a lipid, stearic acid. The colours of the atoms are: carbon, dark grey; oxygen, red; hydrogen, pale blue; nitrogen, dark blue. The computer program that generates the models represents the size of the electron cloud of atoms, which is affected by the nature of their attached atoms. In the case of hydrogen with a single electron, the represented size is greatly reduced when attached to an electronegative atom such as oxygen or nitrogen.

The macromolecular constituents of cells

Macromolecules are large structures formed by the polymerization of small units, collectively known as monomers. Glycogen, starch, and cellulose are polymers formed by joining together glucose units (in a slightly different manner in the three cases). Glycogen and starch are for energy storage in animals and plants respectively and cellulose is for structural strength in plants. Since only glucose monomers are involved in the synthesis of these macromolecules, all that is needed in their synthesis is a mechanism to link them together. There is no information content in them.

Proteins and DNA are different in this respect; they have information content.
These polymers are built up from a variety of monomers which must be put together in the correct order, and this requires that the cell has instructions available on the correct sequences for these.

Proteins

The word protein is derived from the Greek meaning ‘primary’; proteins are of primary importance in life and the reason for DNA is to make their production possible. They are built up from a menu of 20 different species of amino acids, a large number of which are polymerized into long chains, known as polypeptides (Fig. 1.3). After synthesis, they fold up into three-dimensional compact shapes determined by the particular sequence of amino acids. Figure 1.4 shows a space-filling molecular model of human deoxyhaemoglobin, an average-sized protein of 574 amino acids and molecular weight of 64 500 Da. Proteins range in size from the small insulin molecule (molecular weight 5733 Da), which is composed of 51 amino acids linked together, to large ones of several thousand amino acids.

Fig. 1.3 Outline of protein synthesis. Note that although peptide synthesis involves overall the removal of a water molecule, the process in the cell is not a direct condensation. Protein synthesis is carried out by cellular bodies called ribosomes. The sequence of amino acids added to form the polypeptide chain is specified by a molecule of messenger RNA, which is a copy of the base sequence of the gene coding for the protein. (Scheme: amino acid 1, H2N–CHR1–COOH, and amino acid 2, H2N–CHR2–COOH, join with loss of H2O to give the dipeptide H2N–CHR1–CO–HN–CHR2–COOH; n further amino acids are added one at a time in the correct sequence to give a linear polypeptide chain, which folds into the three-dimensional protein.)

Fig. 1.4 Space-filling model of haemoglobin. The CPK (Corey–Pauling–Koltun) colour scheme is used: carbon, light grey; oxygen, red; nitrogen, blue; sulphur, yellow. The Protein Data Bank accession code for haemoglobin is 1A3N (see page 85 for instructions on how to get this picture yourself).

Catalysis of reactions by enzyme proteins is central to the existence of life

Enzymes are catalytic proteins. Thousands of different chemical reactions occur in a living cell even though the conditions there are not such as would facilitate chemical reactions: almost neutral pH, low temperature, no especially reactive substances, and chemicals present in dilute aqueous solution. In the chemistry laboratory, reactions are commonly brought about by high temperatures, extreme pH values, and high concentrations of reactants. A sugar such as glucose is stable at body temperature and, left in air in a bowl, will undergo no change for many years. However, if you eat the sugar, in the cells it is involved in chemical reactions. The reactivity of glucose (and all else) in the cell is due to enzymes combining with the molecules and catalysing the reactions.

Enzymes are specific protein catalysts: usually one enzyme, one reaction. Without this ability of proteins to bind precisely with their target molecules (enzyme substrates) and catalyse specific reactions, life would be impossible. Since there are thousands of different reactions in a cell there are thousands of different enzymes catalysing them with (usually) one gene specifying each enzyme. They are generally efficient catalysts. One molecule of the enzyme carbonic anhydrase, important in red blood cells (page 70), catalyses the conversion of 600 000 molecules of substrate per second.

Proteins are also involved in virtually everything else in cells and organisms: structures, muscle contraction, nerve impulses, hormone action, chemical signalling, and regulation of metabolism. They are very versatile, ranging from delicate enzymes and exquisite molecular machines to the tough proteins of bone cartilage and of hair and horses’ hooves. How can one type of molecule do so many tasks?
As already stated, proteins are synthesized from a menu of 20 different amino acids, ranging in number in different proteins from about 50 to 2000 amino acids, though typically a few hundred. If we regard these as an alphabet of 20 letters, proteins are ‘words’ several hundreds of letters in length, so that the number of possible different words, or proteins, is for practical purposes unlimited.

The fossil record shows that primitive prokaryotes (bacteria) existed on Earth 3.5 billion years ago. The amino acid sequences of thousands of different proteins existing today have evolved over 3.5 billion years of random change and natural selection. Continuation of each life form is dependent on every new cell being given a complete set of instructions on the amino acid sequence for every protein in the cell. It is a colossal information-storage and -retrieval process. This is the role of the genes and the mechanisms for reading them. Each new generation of cells must have a complete set of genes reproduced by the parent cell so that one copy can be given to each of the two daughter cells. The latter can then direct the synthesis of the proteins necessary for the life of the cell.

As will be described later in more detail, there may be one or several chains of amino acids in a protein. Each chain is specified by one gene so that the synthesis of a given protein may require one or several genes accordingly. Also the cell has tricks for producing different versions of proteins from a single gene (see differential splicing on page 390).

A human being has an estimated 30 000 genes in each cell (the exact number is not established). The duplication of this vast mass of information in the form of DNA cannot occur without some mistakes, the latter being known as gene mutations. A mutation leads to the production of a protein which is not exactly correct.
The effects of mutations range from no damage at all, through genetic damage to the offspring, to death of the embryo. There are large numbers of genetic diseases known; cystic fibrosis is one example (page 117).

Evolution of proteins

There is another side to genetic mutations. Evolution is a process in which natural selection preserves favourable mutations; if a mutation in germ cells increases the chance of progeny reaching reproductive age then that mutation will be preserved. Deleterious ones are eliminated by natural selection. Since genes code for proteins it is clear that evolution depends on the synthesis of new proteins which give a selective advantage. The chances of random changes in the amino acid sequence of a protein being advantageous are finite but small, so that evolution is a slow, uncertain business. However, since there is no limit to the potential protein structures that can theoretically exist, evolution is not limited in the number of changes that can be tried over the billions of years involved.

Development of new genes

The evolution of proteins requires the development of new genes. The problem of how you can change an essential gene into a different one without eliminating the function of the original gene can often be explained by another type of accident in the replication of genes, namely gene duplication, in which a given gene is reproduced twice. One of the genes can be mutated while the other continues to code for the original essential protein. There is much evidence in the base sequences of genes (see below) indicating that this is what has happened; sets of related genes exist which obviously have evolved from common ancestors.

DNA (deoxyribonucleic acid)

It was established in the 1940s that DNA (deoxyribonucleic acid) is the substance of genes. A complete DNA molecule is a chromosome, with protein components present as structural support. The E.
coli chromosome has a molecular weight of 12 million daltons and the largest human chromosome several billion daltons. Individual genes encode the information on the amino acid sequence of specific polypeptides: one gene, one polypeptide; thousands of genes, thousands of polypeptides (and therefore proteins).

The DNA of each gene carries the chemical message which signals to the cell how to assemble the amino acids in the correct sequence to produce the protein for which that gene is ‘responsible’. The information is contained in the sequence of the monomers called nucleotides which make up DNA. A nucleotide has the structure base–sugar–phosphate. There are four different nucleotides in DNA, differing in the base components, linked together forming a ‘backbone’ of alternating sugar–phosphate residues with the bases projecting from the sugar residues. It is the sequence of different bases that carries the information of the gene.

DNA exists in the form of a double strand held together by secondary bonds of which hydrogen bonds (described below) are critical, as illustrated in Fig. 1.5. Two of the four species of bases in DNA, adenine and thymine (A and T), automatically pair together because their shapes are complementary and hydrogen bonds form between them. The same is true for the other pair, guanine and cytosine (G and C). This pairing is known as complementarity or Watson–Crick base pairing, after its discoverers. It is specific; base pairing in this way occurs only between G and C, and A and T respectively. The two strands in a DNA molecule are not parallel as indicated in Fig. 1.5 for simplicity, but rather wind around each other to form the well-known double helix, shown more realistically in the space-filling model of Fig. 1.6.

Fig. 1.5 Diagram of the structure of double-stranded DNA. The backbone consists of alternating sugar-phosphate residues to which the four types of base are attached. The base pairs are always between G and C or between A and T. Note that each base pair always includes one larger and one smaller base so that all base pairs are of the same size. The two chains are held together by noncovalent bonds (page 37); there are three between G and C and two between A and T. The two strands are shown as being parallel for clarity but in fact they form a double helix as shown in the model in Fig. 1.6. (Diagram labels: phosphate-sugar backbone; bases attached to sugar residues of nucleotides; noncovalent (hydrogen) bonds are critical to holding the two chains together.)

Fig. 1.6 A model of B DNA. Space-filling atomic model of a DNA segment with one major groove and two minor grooves.

DNA can direct its own replication

The central requirement of any genetic system is that the hereditary information can be replicated and passed on to daughter cells, and the reason nucleic acids carry the genetic information is that they have the capacity to direct their own replication as well as performing their function of directing protein synthesis. If the two strands of DNA are separated, the base-pairing potentiality is exposed. The enzyme which synthesizes DNA moves along each strand, linking together nucleotides in the sequence specified by the strand being copied (known as a template strand). An A on the template strand matches a T on the new strand, G is matched to a C and vice versa. Figure 1.7 illustrates the principle of this, but note that for illustrative purposes the incoming monomers are shown lined up while in reality the enzyme is involved in their correct placing also (Chapter 23). Since both strands are read in this way, we end up with two new double helices identical to the original one. This style of replication of DNA is known as semi-conservative; each new double helix contains an old strand from the parent DNA molecule and one newly synthesized one. The linking together of the monomers requires energy; this is supplied indirectly from ATP by the mechanism discussed in Chapter 23.

Fig. 1.7 Principle of DNA replication. The two strands of the double helix are held together by hydrogen bonds between bases A and T, and G and C respectively. When the strands are separated the single strands are now available for base pairing by incoming monomer nucleotides. The nucleotides thus lined up are linked together to give two identical daughter double helices. Exactly the same self-replicating principle applies to RNA replication and to the transcription (copying) of DNA into RNA which occurs in the production of messenger RNA from a gene. Note that each of the replicated double helices contains one parental strand and one newly synthesized one. Note also that the actual mechanism occurs with the incoming nucleotides pairing up in the active site of the DNA polymerase enzyme attached to the template; for simplicity the DNA polymerase is not shown here. (Diagram stages: parent DNA double helix; strands separate; incoming monomers base pair with parent strands; monomers polymerized into DNA; two identical double helices.)

What is the nature of the genetic information in DNA? It is in the form of the coded sequence for amino acids in proteins. A triplet of three bases on a DNA strand specifies each amino acid in a protein. With four different bases, 64 different triplet combinations are possible (4 × 4 × 4). Since only 20 species of amino acids are involved, there is plenty of coding ability for protein synthesis and room for full stops to signify the end of the message.
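The base-pairing rule and the triplet arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not a model of the enzyme: the names `PAIR`, `copy_template`, and `triplets` are hypothetical, the example sequence is made up, and strand direction is ignored for simplicity.

```python
from itertools import product

BASES = "ACGT"

# Watson-Crick pairing as stated in the text: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def copy_template(template: str) -> str:
    """Return the new strand specified, base by base, by a template strand.

    Simplification: strand direction is ignored.
    """
    return "".join(PAIR[base] for base in template)


# Every ordered triplet that can be written with four bases (4 x 4 x 4).
triplets = ["".join(t) for t in product(BASES, repeat=3)]

if __name__ == "__main__":
    new_strand = copy_template("GATTC")  # made-up example sequence
    print(new_strand)                    # CTAAG
    print(copy_template(new_strand))     # GATTC: copying the copy restores the template
    print(len(triplets))                 # 64
    print(len(triplets) - 20)            # 44 more triplets than amino acid species
```

Because copying a strand and then copying the copy gives back the original sequence, each strand of a double helix carries enough information to regenerate its partner, which is the principle behind semi-conservative replication.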
To code for large numbers of proteins, each of which may have hundreds of amino acids, the DNA has to be of prodigious length. Each human cell (10⁻⁵ m in diameter) has about 2 m of DNA (you might like to work out the total length of DNA in your body given that there are about 10¹³ cells). It is a very narrow thread and is greatly compacted in the nucleus. The DNA of the human genome (the complete collection of chromosomes) contains 3.2 billion nucleotide pairs. The complete sequencing of these has been achieved by the Human Genome Project, although understanding of the organization of the genome and of much of its DNA is incomplete.

Genes are part of the continuous chromosomal DNA molecule. Each gene is distinct from the next, separated by spacer sequences between the genes of no known function.

Proteins are synthesized on cellular structures known as ribosomes. These take instructions (indirectly) from the gene. To instruct the ribosomes on the amino acid sequence of a specific protein, each gene is independently copied into a different nucleic acid called messenger RNA (mRNA). It delivers the message of coded instructions from the gene to the ribosome. RNA has almost the same structure as a single strand of DNA with small chemical differences in the monomers. T is replaced by U, for uracil, which lacks a methyl group present in T (the details are not important at this stage), and the sugar is ribose with an extra oxygen atom, rather than deoxyribose. The bases in RNA have the same base-pairing properties as those in DNA; in RNA, U pairs with A. Only one of the two strands of DNA in a gene is copied into RNA. The sequence of information flow is as shown below.

                 (RNA monomers)   (Amino acids)
                       ↓                ↓
DNA of gene 1  →  mRNA 1  →  polypeptide 1  →  folded protein 1
DNA of gene 2  →  mRNA 2  →  polypeptide 2  →  folded protein 2

How can a linear one-dimensional sequence of information in genes give rise to the three-dimensional structure of a protein in a living organism?
This is where the folding of the linear polypeptide comes in. An unfolded polypeptide is, with rare exceptions, not biologically functional. In the cell, when a ribosome synthesizes a polypeptide, it folds up into the correct configuration in a few minutes. Proteins are complex three-dimensional structures formed by folding into the correct configuration, as specified by the amino acid sequence in the polypeptide. To a considerable extent, the folding is determined by hydrophobic (water-hating) amino acids being placed mainly on the inside of the molecule away from water, and the hydrophilic (water-loving) ones on the surface. Only after folding into their three-dimensional configurations do they perform their roles in living organisms. (Have another look at the folded haemoglobin molecule in Fig. 1.4.) This is the basis of how the one-dimensional linear information present in DNA specifies the formation of three-dimensional organisms, since the folded proteins can assemble into larger living structures; yet another of the profound concepts of life.

Junk DNA

What we have outlined above has been the accepted dogma for about half a century. Genes are transcribed into mRNAs which code for proteins and proteins determine the heritable characteristics of organisms. None of this is factually challenged but very recently it has become evident that our concept of genetic inheritance is not the full story.

There were a few facts which, while not contradicting the accepted gene concepts, seemed a little odd. First, it had become apparent with the completion of the Human Genome Project that biological complexity is not proportional to gene numbers. The rice plant has more genes than does a human. A nematode worm has 18 000 genes and the more sophisticated fruit fly only 13 000. Another oddity is that, in a human, the DNA sequences that actually code for protein sequences amount to only 1.5% of the total.
The rest was dismissed as junk DNA without any informational content, much of it useless garbage collected by eukaryotes during evolution and which for some reason could not be got rid of. (Prokaryotes have little or no junk DNA.)

There is now evidence that in fact junk DNA contains large numbers of noncoding microgenes which have been conserved over long periods of evolution. They code for tiny micro RNAs which are not for protein-coding purposes. What are they for? It is too early for this to be known fully but they appear to be responsible for some of the inherited characteristics of organisms. Possibly in some way they regulate the pattern of expression of the collection of protein-coding genes (the expression of a gene means its giving rise to the synthesis of a protein). This, it is thought, might be partly responsible for the complexity of a human being disproportionate to the number of conventional genes. Elimination of a microgene has been shown to cause dramatic changes in the structure of a plant. We discuss the human genome in Chapter 22 and, on page 353, one mechanism by which certain small RNA molecules can interfere with gene expression. The further reading list at the end of this chapter gives some short reviews on this development. This is yet another of the ‘hot’ research areas of molecular biology.

Molecular recognition by proteins

We have described the role of proteins as catalytic molecules or enzymes. Proteins have the ability to bind to other molecules, which often are also proteins, in a completely specific manner. They ‘recognize’ the molecule they are ‘designed’ by evolution to bind to. Life is completely dependent on this. Protein molecules associate to form complexes ranging in size from dimers to molecular complexes containing large numbers of protein subunits forming larger cellular structures. However, specific protein interactions go far beyond this.
Hormones and growth factors deliver signals to cells by combining with specific protein molecules known as receptors displayed on the outside of cells. The cell-signalling system of the body (Chapter 27) depends on it. To give but two further examples, enzymes recognize their substrate molecule(s), and gene regulation depends on control molecules recognizing a sequence of a few nucleotides among billions on a chromosome. Life is dependent on specific protein attachments to other molecules.

There is another requirement for molecular recognition. The attachments must often be easily reversible. An enzyme must release the products of the reaction it catalyses; gene-control proteins must detach when it is appropriate to do so, for without this the activation of a gene would be irreversible, whereas many genes need to be switched on and off.

How is this molecular-recognition system achieved? The answer lies in the way several weak chemical bonds between matching surfaces add up to a sufficiently strong but reversible attachment of the molecules.

Noncovalent or weak chemical bonds

Noncovalent bonds (page 37) are electrostatic attractions between positively and negatively charged atoms, and are much weaker than covalent ones. The strongest noncovalent bonds are ionic bonds between oppositely charged ions, the next strongest are hydrogen bonds dependent on partial atomic charges, and the weakest are van der Waals forces, which may be between any two atoms appropriately positioned (Chapter 3). A single noncovalent bond would be insufficient to hold two molecules together; a group of them is needed. Atoms need to be sufficiently close for the bonds to form in sufficient numbers. The protein and its ligand (as the binding molecule is called) must therefore be complementary in shape, with chemical structures suitable for forming noncovalent bonds between the molecules at the specific patches on the protein surface in contact with the ligand (the entire protein surface is not usually involved in associations).
Because proteins have unique structures, individual proteins can evolve to be specific for recognizing particular molecules with which to bind. This is the basis of biological specificity. It is difficult to think of a living process that does not require structural complementarity between specific proteins and other molecules. Noncovalent bonds form spontaneously without the need for enzyme catalysis. They are also broken easily, which gives them the required degree of reversibility referred to earlier. If there are large numbers of weak bonds then molecules can be bound together almost irreversibly, as is the case in antibody–antigen reactions (page 511). The requirement for easy reversibility is clear in the replication of DNA, when the two strands have to be separated to expose the base-pairing potential of the bases. There are cellular mechanisms for breaking noncovalent associations in DNA as required. In addition to the protein molecular recognition we have described, weak bonds play an important part in the folded structure of proteins. Protein molecules are better regarded as molecular machines which need to be flexible in their configuration than as rigid unchanging structures. The use of weak bonds in their three-dimensional structures confers this flexibility.

How did it all start?

Living organisms consist of one or more cells (Chapter 2). Each cell is surrounded by a cell membrane, a thin sheet composed mainly of lipid (fatty) molecules which is necessary to hold the contents of the cell together. The origin of the first cell is necessarily speculative, but at some time in the establishment of life there must have been a primordial self-replicating molecular system from which living cells developed.
Hypotheses have been formulated of how self-replicating systems might have been established on a mineral surface or in a drop of liquid or sea pool, but, at an early stage, such a system had to be contained by a membrane. Otherwise it presumably would have been dispersed. A striking fact is that when molecules of a suitable substance are agitated in water they form small spherical vesicles (liposomes). (The lipids found in egg yolk are examples. Their structures are given in Chapter 7 on membranes.) The boundary of these vesicles is made of a structure known as a lipid bilayer, which is virtually identical to the basic structure found in the membranes of modern cells (Fig. 1.8). Such vesicles may have enclosed a drop of the first self-replicating system. From such a primordial cell-like structure all life is postulated to have originated. The requirements for a molecule to be capable of forming a lipid bilayer are not demanding; it needs to have amphipathic properties, by which we mean that one part of the molecule is water-insoluble (hydrophobic) and the other water-soluble (hydrophilic), and to be of a roughly suitable shape, as illustrated in Fig. 1.9.

Fig. 1.8 A synthetic liposome made of a lipid bilayer structure. (Labels: polar (hydrophilic) head groups; aqueous interior; hydrocarbon (hydrophobic) layer.)

Fig. 1.9 An amphipathic molecule of the type found in cell membranes. (Labels: polar or hydrophilic head group; nonpolar or hydrophobic tails.)

What was the source of the molecular building blocks needed to produce the components of living cells? Experiments have been done in which electrical discharges were passed through a mixture of gases (hydrogen, methane, ammonia, and CO2, in the presence of water) intended to resemble the atmosphere of the primitive Earth. A mixture of potential precursors of biomolecules, including some amino acids, was produced.
The postulated primordial self-replicating cell must have taken in molecules from the environment to produce new cellular material. Diffusion through the containing membrane before the development of transport mechanisms would have been slow, and replication likewise slow, but vast time scales were involved.

The RNA world

A more difficult problem in the establishment of a self-replicating system is to identify the initial catalysts and the primitive ‘genetic system’ needed to ensure faithful replication. In short, it is a chicken-and-egg problem: which came first, proteins to catalyse reactions or nucleic acids to direct the synthesis of primitive proteins? This dilemma received a possible answer with the discovery that RNA can catalyse some chemical reactions, including the conversion of short polynucleotides into longer sequences. Such molecules were given the name ‘ribozymes’ (not to be confused with ribosomes). It was the first time that biological molecules other than proteins had been found to catalyse specific reactions. RNA has the same potential for acting as a template in its own replication as explained for DNA. In short, RNA may have been both the catalyst and the primitive ‘genetic system’ for self-replication in the origin of life, thus avoiding the chicken-and-egg dilemma. It may be speculated that the first short polynucleotides were formed from nucleotide monomers by heat, which condensed the nucleotides together chemically by driving off water. From this stage, evolution of more efficient catalysts, namely proteins, to replace RNA catalysts is postulated to have occurred, though the first ‘proteins’ must have been primitive and presumably were short peptides of low catalytic efficiency. The concept of an RNA-based biological world that preceded the DNA world is generally accepted, for there is much supporting evidence.
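The idea that an RNA strand can serve as a template for its own replication rests on nothing more than the base-pairing rules, and can be sketched in a few lines. In the example below the sequence and the function name are hypothetical; the only biology assumed is that in RNA A pairs with U and G with C, and that the new strand runs antiparallel to its template. Copying a strand gives its complement, and copying the complement regenerates the original.

```python
# Base-pairing rules for RNA: A pairs with U, G pairs with C.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def copy_from_template(template):
    """Return the strand complementary to `template`, written 5'->3'.

    The new strand pairs antiparallel to the template, so the result
    is the reverse complement of the template sequence.
    """
    return "".join(PAIR[base] for base in reversed(template))

original = "GGAUCCA"                      # hypothetical short RNA
complement = copy_from_template(original)
regenerated = copy_from_template(complement)

print("original:   ", original)
print("complement: ", complement)
print("regenerated:", regenerated)
```

Two rounds of template-directed copying therefore reproduce the starting sequence, which is why a molecule that both carries a sequence and catalyses polymerization could, in principle, propagate itself.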
In modern cells, although protein enzymes bring about almost all catalysed reactions, the displacement of RNA from this role is not complete. What might be regarded as a few fossil catalysts (hangovers from the RNA world) exist in cells as ribozymes. One of these, in ribosomes, is involved in the synthesis of all proteins, providing an interesting link between one type of catalytic system (RNA) and a more efficient one (proteins). Ribosomes give us a glimpse into the ancient RNA world, somewhat akin to astronomers viewing the past universe through long-distance telescopes.

Why has DNA superseded RNA as the medium for storing genetic information in all cells? The answer almost certainly is that DNA is chemically more stable than RNA. If a mistake is made in the synthesis of a DNA molecule, or it is damaged in some way, enzymes exist to repair it (Chapter 23). RNA is still the genetic material of many viruses. Damage to RNA, unlike damage to DNA, is not repaired, and RNA viruses therefore mutate rapidly. By constantly changing the proteins which the immune system recognizes (Chapter 29), new viral strains escape immune attack. So primitive molecular instability, coupled with lack of repair, is an advantage even in the modern world, where most viruses are in fact RNA ones: human immunodeficiency virus (HIV), influenza, poliomyelitis, mumps, foot and mouth, measles, and rubella viruses, to name a familiar few. The same applies to plant viruses.

The new ‘omics’ phase of biochemistry and molecular biology

From what has been said in this chapter, it will be clear that sequences of amino acids in proteins and those of the nucleotides in DNA underlie just about everything in life. As these sequences were determined, it was realized that the flood of molecular information would be of little avail without an efficient retrieval system.
To this end, in a remarkable example of international collaboration, protein and DNA computer databases were established in various centres around the world in which information on proteins and genes is recorded. Details of the sequences of thousands of genes and proteins, together with the three-dimensional structures of many of the latter, are now available. Software in the public domain is available to search the databases and analyse the information contained in them. This area of science is known as bioinformatics, and it has become of immense importance in biochemistry and molecular biology. In parallel, methods have been developed for the automatic sequencing of DNA. These have resulted in the completion of the human genome project, which has determined the nucleotide sequence of the entire human DNA (known collectively as a genome). The sequencing of the genomes of other species, such as the mouse, the rice plant, and Drosophila, the fruit fly, is also complete, to cite only a few. Another important method, the ‘DNA chip’ or DNA microarray, allows the simultaneous study of the transcription (copying) of large numbers of genes by detecting which are actively giving rise to mRNA (Chapter 28). In the protein field, the relatively recent application of mass spectrometry (Chapter 5) to proteins (a development of immense importance) makes it feasible to investigate many proteins at once. These developments are sometimes referred to informally as the ‘omics revolution’, which needs an explanation. The entire collection of proteins in a cell (in any one state, for it varies from time to time) is called the proteome and that of genes, the genome. The collective studies of these are called proteomics and genomics respectively. An apt analogy has been put forward to illustrate the meaning of these collective terms, to the effect that many of the instruments in an orchestra (genes, proteins, and metabolites) have been identified.
The next stage is to listen to the music the orchestra plays with them. In other words, the proteins and genes in a cell function as a collective whole, and a full understanding of life and its abnormalities will need to consider them as such and to understand their interactions. The collective study of the copying of genes into mRNA (transcription) using DNA microarrays is now sometimes referred to as ‘transcriptomics’, and the term ‘metabolomics’ is used for the study of the complement of low-molecular-mass molecules (metabolites) present in specific cells at specific times. These ‘omics’ studies make it possible to ask which genes are active and which proteins and products are present, say, in cancer cells as compared with neighbouring normal cells. The medical potential is very great, and the same applies with equal force to plant studies and their potential in agriculture. The great potential for medical intervention in the treatment of diseases, based on a molecular understanding of life, has given rise to the biotechnology boom.

■ SUMMARY

Unity of life. Despite the diversity of life forms, at the molecular level all life is basically the same, variations being modifications of the same theme. This suggests a single origin of life.

Living cells obey the laws of physics and chemistry. Energy is derived from breaking down food molecules (ultimately produced by plants using sunlight energy). The energy must be released in a form which can drive chemical and other work. Heat cannot do work in the cell. ATP (adenosine triphosphate) is the universal energy currency in life. The energy is used to synthesize ATP from ADP (adenosine diphosphate) and phosphate; ATP breakdown is coupled to biochemical work.

Molecules found in living cells. These include small molecules such as water, food molecules, and their breakdown products.
Macromolecules, among which proteins and DNA are pre-eminent, are large molecules formed by polymerization of smaller units.

Proteins. These are the cell’s workhorses and the basis of most living structures. They are long chains of amino acids, typically hundreds of residues long, folded up into a precise three-dimensional structure. There are 20 different amino acids in proteins; each protein is a unique sequence of these.

Enzyme catalysis. Enzymes are proteins which catalyse virtually all the thousands of chemical reactions of life. One enzyme, one reaction. Relatively recently, however, it has been discovered that RNA (ribonucleic acid) can have catalytic activity.

DNA (deoxyribonucleic acid). The cell must have a blueprint of the sequence of each of the thousands of proteins it synthesizes. This is the function of DNA in the form of genes, each gene specifying the amino acid sequence of one polypeptide. DNA consists of two strands of polynucleotides in a double helix. A nucleotide has the structure base–sugar–phosphate. The bases are paired by hydrogen bonds, the base A pairing with T, and G with C. This automatic pairing is the basis of self-directed replication. The base sequences act as a code, specifying individual amino acids; three bases, known as a codon, represent an amino acid. The genetic code is the table correlating codons with the amino acids that they specify.

Ribosomes translate the base sequences of genes into proteins. The mechanism of this is that each gene is copied into messenger RNA (a polymer resembling DNA) which attaches to and instructs a ribosome. Ribosomes have no specificity for the proteins they synthesize; they produce the protein specified by the messenger just as a tape player plays whatever music is specified by the tape.

Evolution of genes and proteins. DNA is the record of sequences needed to synthesize proteins, the information having been acquired by billions of years of evolution. Mistakes in replicating DNA inevitably occur.
These are mutations, which can result in faulty amino acid sequences in proteins and, in turn, in genetic diseases. The random mutations are also the material on which evolution, via natural selection, develops new genes.

Molecular recognition by proteins. Apart from recognition of substrates by enzymes, proteins recognize (bind to) other molecules such as hormones and growth factors, thus directing development, growth, and metabolic processes. The binding is by multiple weak bonds whose formation depends on atoms being close enough for the bonds to form. This means that only molecules closely complementary to one another bind. It is the basis of biological specificity. The use of weak bonds in molecular recognition confers flexibility and reversibility.

How did it all start? An RNA world is believed to have preceded DNA and proteins. Life presumably must have originated by the spontaneous formation of a molecule capable of self-replication without the aid of proteins. It is generally believed that life originated with RNA, which has the information to direct its own replication. The RNA world is still seen in the genes of some viruses, and in all cells in the form of ribosomes, which have a high content of RNA. DNA replaced RNA because it is a chemically more stable repository of genetic information.

The new ‘omics’ phase of biochemistry and molecular biology. In the past decade an explosion of new technologies has revolutionized biochemistry and molecular biology. Prominent among these are automated DNA sequencing, mass spectrometry for the study of proteins, and DNA microarrays for gene studies. They are having enormous effects on biological science, medicine, and agriculture. The branches of science utilizing these are described as proteomics, genomics, and metabolomics, which are collective terms to specify that large numbers of proteins, genes, and metabolites, respectively, can be examined together.
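The summary points on the genetic code and on mutation can be made concrete with a small sketch. The codon assignments below are a short excerpt of the standard genetic code, written as DNA coding-strand triplets; the gene sequence, the position of the mutation, and the function name are hypothetical and chosen only for illustration. The code reads the sequence three bases at a time, looks each codon up in the table, and shows how a single-base change alters one amino acid.

```python
# A short excerpt of the standard genetic code (DNA coding-strand codons).
CODON_TABLE = {
    "ATG": "Met", "GTG": "Val", "CAT": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "TAA": "Stop",
}

def translate(dna):
    """Read `dna` one codon (three bases) at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein

gene = "ATGGTGCATCTGACTCCTGAGTAA"      # hypothetical example sequence
mutant = gene[:19] + "T" + gene[20:]   # single-base change: GAG -> GTG

print("normal:", "-".join(translate(gene)))
print("mutant:", "-".join(translate(mutant)))
```

Only the seventh codon differs between the two sequences, so the two peptides differ in a single residue (Glu in the normal sequence, Val in the mutant). A full implementation would need all 64 codons; the excerpt here covers only those used in the example.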
■ FURTHER READING

Cech, T. R. (1986). A model for the RNA-catalysed replication of RNA. Proc. Natl. Acad. Sci. U.S.A., 83, 4360–3.
Describes the formation of polycytidylate.

Gilbert, W. (1986). The RNA world. Nature, 319, 618.

Joyce, G. F. (1989). RNA evolution and the origins of life. Nature, 338, 217–24.

Orgel, L. E. (1994). The origin of life on earth. Sci. Am., 271(4), 52–61.
Growing evidence supports the idea that the emergence of catalytic RNA was a crucial early step.

Lazcano, A. and Miller, S. L. (1996). The origin and early evolution of life: prebiotic chemistry, the pre-RNA world and time. Cell, 85, 793–8.

Junk DNA and microRNA genes

Gibbs, W. W. (2003). Hidden genes. Sci. Am., 289, 28–33.

Mattick, J. S. (2003). Challenging the dogma: the hidden layer of nonprotein-coding RNAs in complex organisms. BioEssays, 25, 930–9.