Protein interaction world – an alternative hypothesis about the origins of life

Peter Andras, PhD¹ and Csaba Andras, MSc²

¹ School of Computing Science, University of Newcastle, Newcastle upon Tyne, NE1 7RU, UK
² Department of Chemical Engineering, Budapest University of Technology and Economics, Budapest, Hungary

Correspondence:
Peter Andras
School of Computing Science
University of Newcastle
Newcastle upon Tyne NE1 7RU
United Kingdom
tel: +44-191-2227946
fax: +44-191-2228232
e-mail: peter.andras@ncl.ac.uk

ABSTRACT

The protein interaction world hypothesis about the origins of life is introduced in this paper. According to this hypothesis, life emerged as a self-reproducing and expanding system of protein interactions, which is conceptualized as an abstract communication system. We describe key components of abstract communication systems and how such systems work, including the role of memories of communications. Protein interaction systems are made of communications that are the interactions between proteins. In the context of the protein interaction world hypothesis RNA molecules serve as memories of protein interactions and DNA molecules are memories of RNA interactions. The protein interaction world hypothesis is based on plausible prebiotic processes and offers a systematic view of how life emerged and evolved towards current cellular life forms. We compare the protein interaction world hypothesis with the most commonly accepted hypothesis about the origins of life, the RNA world hypothesis. We conclude that the protein interaction world hypothesis is more plausible than the RNA world hypothesis. We discuss the role of rare nucleic bases in the context of the protein interaction world, and we show that their role can be explained more parsimoniously in this context than in the context of the RNA world.
We also discuss replication in the context of the two hypotheses, and we highlight that while in the RNA world replication refers to the material replication of certain molecules, in the protein interaction world replication takes place at the level of the interactions between proteins.

1. INTRODUCTION

The origins of life remain to a large extent a mystery (1, 2, 3, 4). Since the 1950s several experiments have aimed to elucidate how life emerged on the Earth (5). During the past 50 years several meteorite remnants have been analysed to find traces of life-related molecules (6, 7, 8). This work led to the conclusion that many organic molecules could emerge in abiotic conditions, including amino acids, lipids, aromatic compounds and possibly simple sugars (5), and that even small globular vesicles may form in such conditions (9). At the same time these experiments provided no indication of how self-replicating structures like cells, ribonucleic acid (RNA) and deoxyribonucleic acid (DNA) molecules could emerge.

Currently the most widely accepted hypothesis about the origins of life is based on the assumption that RNA molecules emerged in abiotic conditions (10, 11, 12, 13). Theoretical investigations and experimental evidence indicate that the building blocks of nucleotides (i.e., nucleic bases and sugars) were synthesised in the prebiotic environment (10, 14, 15, 16, 17, 18, 19). It is supposed that these molecules combined into nucleotides, which formed heteropolymers in the form of RNA molecules. Such RNA molecules could catalyze organic chemical reactions and replicate themselves, forming the basis of self-replicating life.
Of course, several questions remain unanswered in the context of the RNA world hypothesis, such as how purine (adenine and guanine) and pyrimidine (cytosine, thymine and uracil) bases combined with sugars (ribose) to form nucleotides in abiotic conditions (20), or how the replication process achieved a precision high enough to allow stable evolutionary selection of some RNAs (21).

The major alternative is the protein world hypothesis (22). This supposes that proteins and peptides were the first molecules that started life and that RNA molecules emerged later to support the replication of protein molecules. The fundamental problem of this hypothesis is that proteins are unable to replicate themselves, which calls into question a protein world that began with the self-replication of proteins. Variants of the protein world hypothesis suggest that the protein world may have started as a world of thioesters (23), or that proteins could have co-evolved together with RNA molecules (22).

Systems theory offers powerful analysis methods to untangle complex problems (24, 25). The fundamental assumption of systems theory analysis is that the analysed phenomenon can be conceptualised as an abstract communication system. The phenomenon itself consists of communications in this communication system. Analysing such communication systems allows the components of the complex system and the functions of these components to be defined. The components are characterized by a set of constraints on communications. Earlier versions of systems theory have been applied to problems of the life sciences (26), but these applications lacked the conceptual clarity brought by new advances in the systems theory of abstract communication systems. Here we use concepts of systems theory to analyse cells and early life forms, and to formulate an alternative hypothesis about the origins of life.
We follow the protein world hypothesis and propose that the prebiotic world consisted of a mixture of small organic molecules (e.g., short-chain fatty acids, amino acids) that produced relatively short peptides (e.g., the products of proteinoids (27) and thioesters (23), or peptides generated using carbonyl sulphide in aqueous medium (28)). This prebiotic world led to the emergence of proteins (i.e., polypeptides; more specifically those peptides which are made of the 20 biotic amino acids produced in living cells) by using smaller peptides to catalyse reactions between further peptides and other molecules. We propose that emerging life was constituted as a protein interaction world. We see this world as an abstract communication system constituted by communications in the form of interactions between proteins. The replication of the system consists of the replication of these interactions.

According to our hypothesis the protein interaction world led to the emergence of encapsulated reproduction of sequences of protein interactions in the form of protocells. Such protocells turned into advanced protocells by developing memories of protein interactions in the form of RNA molecules. This was followed by evolution into proper cells having complex intracellular processes and long-term memory in the form of DNA. In this paper we describe how systems theory provides the conceptual foundations for our hypothesis and we argue that the protein interaction world hypothesis may be more plausible than the RNA world hypothesis.

Our proposal shows similarities with the earlier proposal of Lacey et al. (22), who suggested the co-evolution of proteins and RNA molecules, including the suggestion that RNA molecules may have served as the memories of proteins. Similar ideas can also be found in the work of Kauffman (29), who suggested that life may have emerged as a system of biochemical interactions which reached a critical level of compound variability and interaction density.
In a way similar to our ideas, Segre and Lancet (3) also suggested that life may have emerged as a system of molecular interactions that reproduced itself.

The rest of the paper is structured as follows: Section 2 briefly describes the RNA world hypothesis; Section 3 reviews key systems theory concepts; Section 4 describes the protein interaction world hypothesis in detail; Section 5 discusses implications of the protein interaction world hypothesis; Section 6 closes the paper with the conclusions.

2. RNA WORLD

RNA or ribonucleic acid molecules are sequences of nucleotides, typically containing four types of bases: adenine, guanine, cytosine and uracil. The identity of an RNA molecule is determined by the sequence of its nucleotides. An RNA molecule may contain a few tens or even thousands of nucleotides. The role of RNA molecules in cells is to drive the building process of proteins that takes place at the ribosomes (2, 30). The RNA molecules are built within the cell nucleus by copying segments of the DNA. The primary RNA molecules go through a maturation process, during which parts of them are cut out or possibly changed.

It is a widely accepted hypothesis that RNA molecules were present in the prebiotic environment on the Earth (e.g., 11, 20). RNA molecules can act as information storage molecules due to their specificity in terms of interactions with other molecules (31), i.e., they store the information about which molecules they should interact with, for example by building them in the case of mRNAs. The RNA world hypothesis is built on the assumption that information (i.e., a sequence of nucleotides that differs from a random one) was stored in RNA in the prebiotic environment and that this information was replicated by the copying of RNA molecules.

A cornerstone of the RNA world hypothesis is that RNA molecules can catalyze biosynthesis processes (30).
By catalysing interactions between other organic molecules (e.g., proteins) the RNA molecules can facilitate the molecular interaction mechanisms needed for the synthesis and replication of the RNA (12). Consequently the RNA molecules can organize the autocatalytic processes required for self-reproductive biochemical systems (1, 32). The replication of RNA molecules can be more efficient if the ingredients of the RNA catalyzed self-reproductive system are enclosed, which may have led to the emergence of protocells (9). Such protocells may have contained a mixture of proteins which were used during the RNA replication process. DNA may have evolved as a long-term information storage molecule which, being more stable than RNA, could maintain the information needed for RNA replication over periods of disturbance. The evolution of the replication mechanism of information encoded in RNA molecules led to the evolution of cells with complex intracellular mechanisms, involving a large number of proteins and DNA used as the cell’s long-term information storage device.

Theoretical studies (12) and existing experimental evidence (8, 15) indicate that the components of RNA molecules (sugars, purines and pyrimidines) can be synthesised in abiotic conditions, although in relatively small quantities (8, 33). There are also suggestions about how the synthesis of RNA molecules from mono-nucleotides (i.e., combinations of purines / pyrimidines and sugars) may have emerged on catalysing clay surfaces (34, 35). Still, a fundamental problem with the RNA world hypothesis is that it does not explain how nucleotides, made of purines, pyrimidines and sugars, formed before the emergence of RNA molecules (20). While the synthesis of nucleotides and of RNA happens in biotic conditions, robust abiotic counterparts of such processes are not known (23). This negative result calls into question the foundation of the RNA world hypothesis.
Even supposing that RNA molecules and their constituent nucleotides were available in the prebiotic environment, we still face further important problems. RNA molecules are not very stable and, except in relatively cold environments, they decompose into their constituents, which prevents their faithful replication (36, 37). Theoretical estimates have shown that the replication of information molecules should be very precise to allow stable evolution (2, 38) (i.e., otherwise replication noise would turn the population of replicating molecules back into a random mixture). Considering the instability of RNA molecules and the actual frequency of replication errors, it seems unlikely that simple RNA replication mechanisms could have reproduced RNA molecules with the accuracy required for stable evolution (2).

Some authors have suggested alternative versions of the RNA world hypothesis, including a start with the DNA world (20) or the initial use of peptide nucleic acids as information coding molecules (39). These alternatives do not really solve the fundamental problems of the RNA world; they merely replace them with comparable problems associated with the alternative proposal.

In summary, the RNA world hypothesis and its alternative versions offer a good explanation of how life may have emerged, supposing the existence of RNA molecules and of sufficiently stable, high-fidelity RNA replication mechanisms. The Achilles heel of the hypothesis is that the existence of prebiotic RNA molecules and high-fidelity RNA replication mechanisms is questionable and has so far not been demonstrated.

3. SYSTEMS THEORY

The origins of systems theory go back to the 1940s, when cybernetics research programs (40) started to investigate the behaviour of complex engineering systems. Starting from the 1950s, general systems theory was developed following the work of von Bertalanffy (41).
The mathematical theory of complex systems emerged in the 1960s; it focuses on systems that can be described by sets of differential equations and analyses the properties of these equations (e.g., 42). Between the 1960s and the 1980s Maturana and Varela developed the theory of autopoietic systems (26), which aims to explain how self-regenerating and self-replicating living systems emerge and evolve. More recently Luhmann (25) introduced a new approach to systems theory, following to some extent the work of Maturana and Varela. Luhmann’s work concentrates on abstract communication systems made of communications, ignoring the communication units that generate these communications. The theory of abstract communication systems takes a fresh look at complex systems, different from the classical approaches of the 1940s to the 1960s and also from the approach of mathematical complex systems theory. This new approach offers powerful analysis tools that make it possible to identify systems and their components, and to analyse the function of these components in the context of the system. We follow the work of Luhmann in this paper.

Communication units produce symbols that are transmitted to other communication units, which perceive them. Communications are sequences of such symbols. Abstract communication systems are made of such communications (see Figure 1A for illustration). By definition, the communication units are not part of the system.

Communications reference other communications in the sense that the sequence of symbols contained in a communication depends on the contents of other earlier or simultaneous communications. A dense cluster of inter-referencing communications surrounded by a sparse network of communications constitutes a communication system (see Figure 1B for illustration).

Figure 1. A) The concept of communication. B) The concept of communication system.
Squares represent communication units, continuous arrows represent communications, and segmented arrows represent referencing relations between communications. The communication system is a dense cluster of inter-referencing communications (area in the middle), surrounded by a sparse network of other communications. The communication units are not part of the communication system.

A communication system is defined by the regularities that determine how referenced communications shape the content of the referencing communication. All communications that follow the set of rules defining the system are part of the system. All communications that do not follow the rules of the system are part of the system’s environment.

Communication systems reproduce themselves by recruiting new communications between communication units. A communication is recruited to a system if it follows the referencing rules of the system. How successful the recruitment of new communications is depends on earlier communications generated by the system and on the environment of the system. We can view the system as a self-describing system made of communications. At the same time the system describes its environment in a complementary sense. Better descriptions of the system’s environment lead to higher success in recruiting new communications. Systems that reproduce and expand faster than other systems may drive the slower reproducing and expanding systems to extinction.

The limits of system expansion are determined by the probabilistic nature of referencing rules. A communication may reference several earlier communications indirectly, through other referenced communications, which together constitute referencing sequences of communications. The probabilities of the referencing rules determine how long such referencing sequences can be before the last communication becomes a random continuation.
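One way to make this limit concrete is the following sketch. It rests on simplifying assumptions that are ours and not part of the theory described here: every referencing step is followed correctly with the same probability p (0 < p < 1), the steps are independent, and a sequence counts as non-random as long as its overall reliability stays at or above a threshold θ (0 < θ < 1). The symbols p, θ and n_max are introduced only for this illustration.

```latex
P(n) = p^{\,n}, \qquad
p^{\,n} \ge \theta
\;\Longleftrightarrow\;
n \le \frac{\ln \theta}{\ln p}, \qquad
n_{\max} = \left\lfloor \frac{\ln \theta}{\ln p} \right\rfloor
```

Under these assumptions, p = 0.9 and θ = 0.5 give n_max = 6, while p = 0.99 gives n_max = 68, so a modest reduction of the indeterminacy of each referencing step allows much longer referencing sequences.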
Longer referencing sequences of communications (i.e., more detailed descriptions) allow better descriptions of the system and its environment. The optimal size of the system (i.e., the number of simultaneous communications that are part of the system) is also determined by the probabilistic indeterminacies of the referencing rules. Systems that outgrow their optimal size may split into two similar systems with the same (or possibly similar) set of referencing rules.

Communication systems may develop subsystems, which are systems within the system, i.e., they constitute a denser inter-referencing cluster within the dense communication cluster of the system. Communications that are part of subsystems follow the system rules with additional constraints that are characteristic of the subsystem. More constrained referencing rules decrease indeterminacies and allow the system to generate better complementary descriptions of the environment and to expand faster than systems without subsystems.

Systems may also change by simplification of the set of their communication symbols (i.e., reduction of the number of such symbols). This may lead to a reduction of the probabilistic indeterminacies in the referencing rules. Consequently systems with simpler sets of communication symbols may expand faster than systems with larger sets of communication symbols.

Another way of extending reliable descriptions of the environment (i.e., non-random sequences of referencing communications) is by retaining records of earlier communications, i.e., by having memories of earlier communications that can be referenced by later communications. We can view such memories as the use of additional communication units that reproduce a certain communication for a certain period. Having memories reduces the indeterminacies in referencing by allowing direct referencing of much earlier communications, instead of referencing them through a chain of references.
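The effect of memory on referencing can be illustrated with a minimal numerical sketch. The model is an assumption made only for this illustration and is not taken from the systems theory literature cited here: each referencing step succeeds independently with probability p, and a memory record of an earlier communication is still intact with probability q when it is referenced. The function names are hypothetical.

```python
def chain_reliability(p: float, steps: int) -> float:
    """Reliability of reaching an earlier communication through a chain
    of `steps` references, assuming independent steps that each succeed
    with probability p (an illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return p ** steps


def memory_reliability(p: float, q: float) -> float:
    """Reliability of reaching the same communication through memory:
    the record must have been retained (probability q) and is then
    referenced directly in a single step (probability p)."""
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("p and q must be probabilities")
    return q * p


if __name__ == "__main__":
    p, q = 0.9, 0.95
    for steps in (1, 5, 20):
        print(steps, chain_reliability(p, steps), memory_reliability(p, q))
```

With p = 0.9 and q = 0.95 the direct memory reference has reliability 0.855 regardless of how far back the referenced communication lies, whereas a 20-step chain has a reliability of about 0.12. For a single step the chain is slightly better (0.9), so in this toy model memory pays off only for communications that lie several references back.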
Systems with memory can expand faster than systems without memory.

Systems with memory may develop a memory or information subsystem (i.e., the memory is information about the past of the system) consisting of communications between the communication units that generate memory communications. If such communications constitute a dense cluster of inter-referencing communications determined by a set of characteristic referencing rules, the information subsystem of the system emerges. Having an information subsystem allows memories to be combined, and thereby the generation of descriptions of the environment that are better than those in systems with memory but without an information subsystem.

Systems compete with each other for communications. Systems which have better complementary descriptions of the environment can generate communications that fit their environment better and make the recruitment of new communications easier. Systems with better environment descriptions outcompete systems with poorer descriptions of the environment. Systems having subsystems, a simple communication symbol set, memory and an information subsystem can generate better descriptions of their environment than systems which lack any of these features.

4. PROTEIN INTERACTION WORLD

Experiments simulating the prebiotic Earth environment (5) and the analysis of larger meteorite remains indicate that organic molecules like amino acids, short-chain fatty acids and others can form without requiring the preceding existence of life. These molecules may form simple autocatalytic interaction systems (1, 43) and small vesicles delimited by lipidic or amphiphilic membranes (5). Experiments have shown that amino acids form tight clusters called proteinoids (22, 27) at high temperatures, which may lead to the formation of simpler peptides (i.e., short chains of amino acids or oligopeptides) (44).
Another way of forming peptides is by the transformation of thioesters (23, 45), a chemical pathway that works efficiently in abiotic conditions and is also used in biological organisms (23). Experimental simulation of marine hydrothermal vents has shown that amino acids may form short peptides in such conditions (44). Recently, Leman et al. (28) have shown that peptides may form with high yield in plausible volcanic marine environments in the presence of carbonyl sulphide, a common volcanic gas. Most evidence from genetic analysis suggests that early life emerged in a high-temperature environment rich in sulphur (46), which supports the plausibility of the above-mentioned routes to the synthesis of early peptides.

The interactions between peptides may catalyse the formation of long-chain peptides (proto-proteins), long-chain linear fatty acids, lipids, and other organic molecules. The analysis of common and evolutionarily preserved parts of many genomes indicates that the most preserved are about 60 genes which are involved in the translation of genetic information into proteins (47). This also suggests that early cells may have developed in protein-rich environments, which provided the building blocks for the development and multiplication of cells.

By adopting a systems view, we can see the interactions between amino acids, peptides and other molecules as an abstract communication system. In this system the communication units are the peptides and other molecules, while their interactions constitute the communications, the communication symbols being the constituent phases of such interactions. Reproduction of this system means the reproduction of the interactions between these molecules. The interactions between amino acids, peptides and other molecules depend on earlier interactions between these molecules, which prepare the right molecules in the right conformation to perform interactions.
Subsets of the possible interactions may form a dense cluster of interdependent interactions, each referencing the other interactions on whose output it depends. Because peptides are the catalysts of most interactions, and because the catalytic activity of peptides depends to a good extent on interactions with other peptides, peptide interactions lie at the core of dense interdependent interaction clusters.

Following considerations from systems theory, peptide interaction systems can reproduce and expand faster if they are in an enclosed environment, which reduces the diffusion of the molecules that need to participate in the generation of interactions to maintain and expand the system. This may have led to the emergence of protocells made of isolating lipid membranes encapsulating interacting peptides and other molecules (9). As such systems grow they reach their growth limit and split into similar systems, which continue their growth.

Contemporary living cells can be viewed as protein interaction systems in which proteins interact with each other and with other molecules, change their conformation to prepare for further interactions, and perform phenomenological cellular functions (e.g., cell respiration) by specific sequences of molecular interactions (e.g., metabolic cycles, signalling pathways). In this view cells are self-reproducing and expanding protein interaction systems, which reproduce the interactions between proteins and other molecules according to system-specific rules determining the referential (or dependency) sequences of these interactions.

The peptide interaction system of protocells describes itself and, in a complementary sense, its environment. Systems theory indicates that a system with memory is likely to outperform systems without memory. In this context memory means long-term storage of information about earlier system communications.
In the case of peptide interaction systems the memory should represent the interactions between peptides and, in a complementary sense, the environment of protocells. In a very rough approximation the environment of cells is determined by the amount and availability of the atomic building blocks of proteins, namely carbon (C), nitrogen (N), oxygen (O), hydrogen (H) and to some extent sulphur (S), phosphorus (P) and halogen (F, Cl, Br) atoms.

Our hypothesis is that the candidate molecules that could serve as memories of peptide interactions and representations of the environment are the sugars, representing the C, O and H content of the environment and also the presence of P in their phosphorylated compounds, and the purines and pyrimidines, representing the C, N and H content of the environment and also the presence of S and halogens in their corresponding compounds (e.g., 4-thiouridine (48), 5-chlorocytosine (49)). This means that protocells may have used sugars, purines and pyrimidines to store information about interactions between peptides and also, in a complementary sense, about their environment.

Proteins and peptides can be seen as the product of interactions between other proteins / peptides containing sub-chains of amino acids that correspond to parts of the amino acid chain of the product protein or peptide (note that proteins are also peptides). To keep a memory of such interactions the system memory of protocells should have been able to record the sequence of amino acids of the participating peptides. This may have led to the emergence of proto-RNA, which encoded the sequence of amino acids of interacting peptides by using specific combinations of sugars, purines and pyrimidines to encode amino acids. Theoretical investigations (22) of interactions between peptides and mono-nucleotides support our hypothesis.
According to these earlier works, chains of amino acids can form double helix chains with mono-nucleotides, such that each amino acid is linked to a triplet of mono-nucleotides. Such chains complementary to peptides could have turned into heteropolymers of nucleotides, forming the ancient version of RNA molecules.

The requirement that system memories should work as communication units, allowing the referencing of their memory and the reproduction of the communication content of their memory, implies that the memory molecules should be able to interact with peptides and catalyze interactions between peptides. Considering that all peptides are produced from other peptides, possibly by adding some amino acids, the above argument implies that proto-RNA molecules should have existed for all peptides and amino acids. These molecules would have catalyzed interactions between the corresponding peptides and amino acids.

Arguments from systems theory suggest that the simplification of the vocabulary of interactions leads to faster expanding systems. This could have been the reason for the elimination of many amino acids from the set of amino acids making up the peptides that participated in the most successful peptide interaction systems, leading to the selection of the 20 protein-forming biotic amino acids commonly occurring in living organisms and of the corresponding proteins made of these amino acids. Simplified, more reliable peptide interaction systems allowed peptide length to increase, leading through several evolutionary steps to the giant proteins of currently living cells.

Selection factors that led to the selection of some sugar, purine and pyrimidine combinations could have been: (a) the ability of these nucleotides to form long-chain heteropolymers, (b) the thermal stability of these polymers, and (c) specific enzymatic abilities of proto-RNA molecules.
These factors, together with simplification-expansion pressures, are likely to have led to the selection of combinations of ribose with a few purines and pyrimidines as nucleotide coding units in the proto-RNA, and ultimately to the emergence of present-day RNA molecules.

The view presented above suggests that protocells were vesicles surrounded by a lipid membrane, inside which series of peptide interactions were reproduced with the help of proto-RNA molecules that catalyzed these interactions. The building blocks of proto-RNAs were produced by chemical reactions catalyzed by proteins. The proto-RNAs served as memory molecules in the protocells. The competition between protocell systems led to systems with simplified symbol sets (peptides containing selected amino acids and few bases used to build proto-RNA), corresponding to early cells containing proteins made of biotic amino acids and RNA molecules built mostly from the usual ribose-adenine (A), ribose-cytosine (C), ribose-guanine (G) and ribose-uracil (U) building blocks. Some RNA molecules contain unusual bases as well, which are combinations of ribose with rare purines and pyrimidines.

The interactions between RNA molecules led to the emergence of the information subsystem of the cell, which consists of a dense cluster of interactions between RNA molecules that depend on earlier interactions between RNA molecules. Some early cells may have developed memory for their information subsystem. This memory would have consisted of molecules that could interact with RNA molecules and would retain the memory of interactions between RNA molecules. The memory of RNA interactions should have encoded the RNAs that were present at the same time and in the same location, participating in interactions with each other. Our conjecture is that DNA molecules are memories of RNA interactions, and that in early cells they catalyzed the interaction and possibly the formation of the corresponding RNAs.
Being a memory of memories, DNA acts in the context of the cell system as long-term memory. Having DNA therefore makes cells more successful in reproducing and expanding themselves than cells without DNA.

Early versions of protocells could have contained many versions of combinations of proto-RNAs catalyzing interactions between peptides. The simplification-expansion argument leads to the conjecture that this mechanism evolved towards a simplified version, preserving only interactions between proto-RNAs corresponding to proteins and proto-RNAs corresponding to amino acids. In this way proteins can be built up by condensation of single amino acids with an existing partial chain of the protein, reducing the unreliability of the catalysis of interactions between larger peptides.

The emergence and expansion of the RNA communication system implies that there should be many RNA interactions that do not lead directly to the generation of proteins, but in functional terms regulate the process of protein generation (see recent results on siRNAs and microRNAs (50, 51)). In a similar way the existence of DNA memories of RNA interactions may have led to the emergence of a system of DNA interactions forming a new subsystem of the cell. The emergence of dense interdependent DNA communications (i.e., interactions between DNA molecules) could have led to the clustering of DNA molecules and the formation of the cell nucleus. This also suggests that in cells with a nucleus there should be many DNA interactions that do not lead to the production of RNA molecules, but rather regulate this process in functional terms.

In summary, our hypothesis is that life originated from peptide interaction systems, which reproduced and expanded as vesicles surrounded by lipid bilayer membranes. Such peptide interaction systems led to the emergence of proto-RNA molecules that served as memories of peptide interactions, facilitating the reproduction and expansion of protocells.
Simplification-driven expansion led to the selection of biotic amino acids and the reduction of the typical RNA alphabet to the four usual bases (A, C, G, U). Interactions between RNA molecules led to the emergence of the RNA interaction subsystem of the cell and to the emergence of memories of RNA interactions in the form of DNA molecules. The expansion of DNA molecule interactions led to the clustering of DNA molecules and the formation of the cell nucleus.

5. DISCUSSION

A. Protein interaction world vs. RNA world

Experimental results indicate that the protein interaction world that we described may have originated in an abiotic environment able to produce amino acids, oligopeptides, short-chain fatty acids and other relatively simple organic molecules. The RNA world hypothesis has so far not been supported by experimental evidence describing routes of abiotic synthesis of nucleotides, the building blocks of RNAs. Consequently, the origin requirements of the protein interaction world hypothesis are more plausibly satisfied than the origin requirements of the RNA world.

The reproduction of protocells and cells in the context of the protein interaction world hypothesis is relatively simple: it requires the reproduction of interactions between peptides / proteins, which can occur in autocatalytic systems of peptide / protein interactions. The reproduction of cells in the context of the RNA world requires complicated, high-precision molecular interaction machinery. This makes it questionable whether early RNA world life could have reproduced with the fidelity required for evolution towards modern cellular forms. Early life and its evolution are therefore simpler to maintain and reproduce according to the protein interaction world hypothesis than in the context of the RNA world hypothesis.
The protein interaction world hypothesis offers a well integrated scenario for the emergence, role and evolution of all macromolecular components of living cells (i.e., DNA, RNA, proteins and other molecules), conceptualizing them in the context of the cell’s internal communication system as communication units and memories, and their interactions as communications. The RNA world hypothesis provides an integrative description of cells, but with several ad-hoc elements (e.g., proteins are side products that turn out to be useful as catalysts of RNA replication) and without providing a clear conceptual framework that would explain the evolutionary steps leading to contemporary living cells. This indicates that the protein interaction world hypothesis may have more explanatory and predictive power than the RNA world hypothesis with respect to the interpretation of functional processes characterising living cells and the evolution of cellular life.

B. Rare nucleic bases

Since the 1960s, researchers have found several unusual nucleic bases in RNAs of various micro-organisms. RNA bases like inosine, 1-methylguanine, pseudouridine, 4-thiouridine, wybutosine, 5-fluoro-uracil and others are found typically in tRNAs (transfer RNAs: amino acid specific RNAs responsible for the transportation of amino acids to the protein assembly sites) of bacteria and archaea. In many cases the routes of biosynthesis of these unusual RNA bases are already known. The RNA world hypothesis does not provide an easy answer to why such unusual nucleic bases exist and how they emerged as RNA bases. In the context of the protein interaction world hypothesis we can find a relatively straightforward explanation of the existence and role of unusual nucleic bases.
If RNA memories of protein interactions represent, in a crude sense, the atomic composition of the environment, it follows immediately that micro-organisms living in environments characterised by high sulphur or halogen content should represent these in their RNA memories. This implies that such organisms should have included in their RNA bases that contain sulphur or halogen, representing the high bio-impact atomic content of the environment (i.e., atoms that relatively easily participate in a large number of organic chemical compounds). Considering that most RNAs representing proteins went through a long evolutionary selection process driven partly by simplification – expansion forces, we expect that sulphur and halogen containing bases should be present in the older, more preserved tRNAs representing the amino acids that are added to forming proteins during protein synthesis.

Experimental evidence shows that sulphur containing bases occur commonly in tRNAs of thermophilic prokaryotes (e.g., Thermus thermophilus) (52, 53, 54). These organisms typically live in high sulphur concentration, high temperature, deep marine environments. In accordance with our theory they should have nucleic bases in their tRNA which represent sulphur, and this is confirmed by experimental analysis.

In the case of Escherichia coli growing in the presence of iron, which is associated in natural conditions with the presence of sulphur (55), tRNA molecules include sulphur containing 2-methylthio-N6-(∆2-isopentenyl)-adenosine bases. If iron is bound by iron-binding molecules, inducing low iron content and implying low sulphur content, the same tRNAs will lose the sulphur containing bases, which are replaced by N6-(∆2-isopentenyl)-adenosine bases (56). This supports our theory, which implies that in low sulphur environments bacterial tRNA should include fewer sulphur containing bases than in high sulphur environments.
Halogen containing RNA bases (e.g., 5-fluoro-uracil) are reported in the literature as cancer treatment agents (e.g., 57, 58), which prevent normal development of some proteins participating in thymine synthesis and indirectly block the replication of DNA. Others report that halogenated bases are formed in bacteria attacked by immune cells, and that these bases contribute to the killing of the bacteria (49). According to our hypothesis it should be possible to find such nucleic bases in some primitive life forms living in halogen-rich environments. Considering that the incorporation of halogen containing bases may prevent the formation of thymine, we consider it unlikely that DNA containing micro-organisms with halogen containing bases in their tRNA will be found. At the same time it might be possible to find RNA viruses of halobacteria which contain halogenated RNA bases. Alternatively, it may also be possible to find primitive unicellular organisms living in halogen-rich environments which replicate without using thymine containing DNA. Such organisms could live only in isolated ecological niches, where competition with DNA containing life forms would not have driven them to extinction.

C. Replication

A fundamental concept of evolution theory is replication (31), which means the identical or almost identical replication of life forms (e.g., cells, whole multi-cellular organisms). In the context of the RNA world hypothesis replication happens at the level of RNA and DNA, which are replicated by a complex molecular interaction machinery involving RNA and DNA molecules, proteins and other molecules. Because high precision replication is required for stable evolution, the supposition of such RNA replication is a cornerstone of the RNA world hypothesis. At the same time this is also the weakest point of this theory, as no such high precision replication machinery is known that works without pre-existing large RNA and DNA molecules regulating the replication process (2).
The protein interaction world hypothesis considers that replication of early life happens in terms of replication of peptide/protein interactions, organized in sequences of interdependent interactions. The replication of such interactions requires the presence of the same molecules to reproduce the interaction. The replication of conditional sequences of interactions requires the execution of conditioning interactions that produce the peptides/proteins in the right conformation to perform the conditional interaction. The replication of longer sequences of conditional interactions is limited by the diffusion of the required molecules. The diffusion of molecules can be reduced by encapsulating them in vesicles made of lipid membranes. Such vesicles could have formed in prebiotic conditions, according to the results of experiments trying to replicate the prebiotic Earth environment (5). This suggests that the replication required for the protein interaction world hypothesis can be based on plausible processes.

The emergence of memories of peptide/protein interactions in the form of RNAs, and of RNA interactions in the form of DNA molecules, allows high precision replication of interactions between large proteins resulting from many interactions between other proteins, amino acids and other molecules. The protein interaction world hypothesis provides a well integrated role for RNA and DNA molecules in the process of replication of life, including the replication of these molecules.

A key difference between the replication concepts of the protein interaction world and RNA world hypotheses is that, while the RNA world builds on the replication of molecules, the protein interaction world hypothesis builds on the replication of interactions between molecules. In the context of the protein interaction world the replication of molecules is a side effect of the replication of interactions between molecules.
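The diffusion argument above can be illustrated with a toy calculation. The sketch below is not part of the hypothesis as stated; it assumes, purely for illustration, that each step of a conditional interaction sequence succeeds independently with a fixed probability (the hypothetical parameter `p_step`), and that encapsulation in a vesicle raises this probability by keeping the required molecules together. The function names and all numerical values are arbitrary assumptions.

```python
def sequence_success_probability(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps conditional interactions occur,
    assuming each step succeeds independently with probability p_step."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


def max_reliable_length(p_step: float, threshold: float) -> int:
    """Longest sequence whose overall success probability is >= threshold."""
    if not 0.0 <= p_step < 1.0:
        # p_step == 1 would allow arbitrarily long sequences
        raise ValueError("p_step must be in [0, 1)")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    n = 0
    while sequence_success_probability(p_step, n + 1) >= threshold:
        n += 1
    return n


# Illustrative (assumed) per-step probabilities: molecules diffusing freely
# versus molecules confined inside a lipid vesicle.
P_FREE = 0.5
P_VESICLE = 0.9
THRESHOLD = 0.1  # assumed minimum acceptable chance of completing a sequence

if __name__ == "__main__":
    print("free solution:", max_reliable_length(P_FREE, THRESHOLD))
    print("inside vesicle:", max_reliable_length(P_VESICLE, THRESHOLD))
```

With these assumed values the longest reliably reproducible sequence grows from 3 steps in free solution to 21 steps inside a vesicle. The point of the sketch is only the qualitative one made in the text: because success probabilities multiply along a sequence, even a moderate reduction in the loss of molecules by diffusion allows much longer sequences of conditional interactions to be replicated.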
According to the protein interaction world hypothesis the concept of replication is extended into the concept of replication and expansion. Protein interaction systems produce protein interactions that follow their conditional rules, and in this way they reproduce and expand themselves. The growth limits of interaction systems lead to the splitting of systems and to system scale replication and expansion.

6. CONCLUSIONS

The protein interaction world hypothesis described above formulates an alternative to the most commonly accepted RNA world hypothesis about the origins of life. The two hypotheses are fundamentally different: while the RNA world hypothesis is built on the concept of replication of RNA and DNA molecules, the protein interaction world hypothesis is built on the assumption of replication and expansion of protein interaction systems perceived as abstract communication systems.

The protein interaction world hypothesis is based on experimentally validated, plausible assumptions about the emergence of early peptide/protein interaction systems in the prebiotic Earth environment. It provides a systematic, integrated view of how life emerged and developed, with well-defined places for RNA molecules, which are seen as memories of protein interactions, and DNA molecules, which are considered memories of RNA interactions.

Predictions based on the protein interaction world hypothesis about the sulphur containing unusual RNA bases fit with experimental findings about these bases, providing additional support for this hypothesis. Further predictions about halogen containing bases are not yet validated by experimental evidence, but they point very specifically to directions in which experimental evidence might be found.
The perception of living systems as abstract communication systems opens up new avenues for research about the role of various organic molecules, the evolution of their role in the context of living systems, and for the analysis of the boundary between living and non-living systems. We believe that the protein interaction world hypothesis can provide more parsimonious explanations of how living systems work and organize themselves than other hypotheses about the origins of life, like the RNA world hypothesis.

REFERENCES

1. Ganti, T (1997). Biogenesis itself. Journal of Theoretical Biology, 187:583-593.
2. Joyce, GF (2002). Booting up life. Nature, 420:278-279.
3. Segre, D, Lancet, D (2000). Composing life. EMBO Reports, 1:217-222.
4. Woese, CR (1987). Bacterial evolution. Microbiological Reviews, 51:221-271.
5. Miller, SL, Orgel, LE (1974). The Origins of Life on the Earth. Englewood Cliffs, NJ, Prentice Hall.
6. Botta, O, Bada, JL (2002). Extraterrestrial organic compounds in meteorites. Surveys in Geophysics, 23:411-467.
7. Henry, DA (2003). ‘Star dust memories’ – A brief history of the Murchison carbonaceous chondrite. Publications of the Astronomical Society of Australia, 20:vii-ix.
8. Pizzarello, S (2004). Chemical evolution and meteorites: An update. Origins of Life and Evolution of the Biosphere, 34:25-34.
9. Deamer, DW (1997). The first living systems: a bioenergetic perspective. Microbiology and Molecular Biology Reviews, 61:239-261.
10. Joyce, GF (1989). RNA evolution and the origins of life. Nature, 338:217-224.
11. Joyce, GF (2002). The antiquity of RNA-based evolution. Nature, 418:214-221.
12. Unrau, PJ, Bartel, DP (1998). RNA-catalysed nucleotide synthesis. Nature, 395:260-263.
13. Zubay, G, Schechter, A (2000). Current status of the RNA – only world. CHEMTRACTS – Biochemistry and Molecular Biology, 13:829-836.
14. Cottin, H, Gazeau, MC, Raulin, F (1999). Cometary organic chemistry: a review from observations, numerical and experimental simulations. Planetary and Space Science, 47:1141-1162.
15. Glavin, DP, Bada, JL (2004). Isolation of purines and pyrimidines from the Murchison meteorite using sublimation. 35th Lunar and Planetary Science Conference, paper 1022.
16. Robertson, MP, Miller, SL (1995). An efficient prebiotic synthesis of cytosine and uracil. Nature, 375:772-774.
17. Zubay, G (1999). Synthesis of the first nucleotides. CHEMTRACTS – Biochemistry and Molecular Biology, 12:432-452.
18. Zubay, G (2000). Biochemical pathways may provide leads to prebiotic pathways. CHEMTRACTS – Biochemistry and Molecular Biology, 13:357-363.
19. Zubay, G, Schechter, A (2001). Prebiotic routes for the synthesis and separation of ribose. CHEMTRACTS – Biochemistry and Molecular Biology, 14:117-124.
20. Dworkin, JP, Lazcano, A, Miller, SL (2003). The roads to and from the RNA world. Journal of Theoretical Biology, 222:127-134.
21. Maynard-Smith, J, Szathmary, E (1997). The Major Transitions in Evolution. Oxford, Oxford University Press.
22. Lacey, JC, Cook, GW, Mullins, DW (1999). Concepts related to the origin of coded protein synthesis. CHEMTRACTS – Biochemistry and Molecular Biology, 12:398-418.
23. DeDuve, C (1993). RNA without protein or protein without RNA? In: What is Life? The Next Fifty Years, Murphy, MP, O’Neill, LAJ (eds.), Cambridge University Press, Cambridge, UK, pp. 79-82.
24. Charlton, BG, Andras, P (2003). The Modernization Imperative. Exeter, Academic Imprint.
25. Luhmann, N (1996). Social Systems. Stanford University Press.
26. Maturana, HR, Varela, FJ (1980). Autopoiesis and Cognition: the realization of the living. Boston, D. Reidel Publishing Company.
27. Fox, SW, Bahn, PR, Dose, K, et al. (1994). Experimental retracement of the origins of a protocell – it was also a protoneuron. Journal of Biological Physics, 20:17-36.
28. Leman, L, Orgel, L, Ghadiri, MR (2004). Carbonyl sulphide mediated prebiotic formation of peptides. Science, 306:283-286.
29. Kauffman, SA (1993). ‘What is life?’: was Schrodinger right? In: What is Life? The Next Fifty Years, Murphy, MP, O’Neill, LAJ (eds.), Cambridge University Press, Cambridge, UK, pp. 83-114.
30. Doudna, JA, Cech, TR (2002). The chemical repertoire of natural ribozymes. Nature, 418:222-228.
31. Szathmary, E (2003). Why are there four letters in the genetic alphabet? Nature Reviews Genetics, 4:995-1001.
32. Segre, D, Lancet, D, Kedem, O, Pilpel, Y (1998). Graded autocatalysis replication domain (GARD): Kinetic analysis of self-replication in mutually catalytic sets. Origins of Life and Evolution of the Biosphere, 28:501-514.
33. Dworkin, JP (1997). Attempted prebiotic synthesis of pseudouridine. Origins of Life and Evolution of the Biosphere, 27:345-355.
34. Ferris, JP (2003). Montmorillonite catalysis of 30-50 MER oligonucleotides: laboratory demonstration of potential steps in the origin of the RNA world. Origins of Life and Evolution of the Biosphere, 32:311-332.
35. Huang, W, Ferris, JP (2003). Synthesis of 35-40 mers of RNA oligomers from unblocked monomers. A simple approach to the RNA world. Chemical Communications, 12:1458-1459.
36. Larralde, R, Robertson, MP, Miller, SL (1995). Rates of decomposition of ribose and other sugars: Implications for chemical evolution. PNAS, 92:8158-8160.
37. Levy, M, Miller, SL (1998). The stability of RNA bases: Implications for the origin of life. PNAS, 95:7933-7938.
38. Szabo, P, Scheuring, I, Czaran, T, Szathmary, E (2002). In silico simulations reveal that replicators with limited dispersal evolve towards higher efficiency and fidelity. Nature, 420:340-343.
39. Nelson, KE, Levy, M, Miller, SL (2000). Peptide nucleic acids rather than RNA may have been the first genetic molecule. PNAS, 97:3868-3871.
40. Wiener, N (1948). Cybernetics or, Control and communication in the animal and the machine. New York, Wiley.
41. von Bertalanffy, L (1973). General System Theory: foundations, development, applications. Harmondsworth, Penguin.
42. Perko, L (1996). Differential Equations and Dynamical Systems. New York, Springer.
43. Orgel, LE (2000). Self-organizing biochemical cycles. PNAS, 97:12503-12507.
44. Imai, E, Honda, H, Hatori, K, Brack, A, Matsuno, K (1999). Elongation of oligopeptides in a simulated submarine hydrothermal system. Science, 283:831-833.
45. Francis, BR (2000). A hypothesis that ribosomal protein synthesis evolved from coupled protein and nucleic acid synthesis. CHEMTRACTS – Biochemistry and Molecular Biology, 13:153-191.
46. Deamer, DW, Chakrabarti, A (1999). The first living organisms: In the light or in the dark. CHEMTRACTS – Biochemistry and Molecular Biology, 12:453-467.
47. Koonin, EV (2003). Comparative genomics, minimal gene sets and the last universal common ancestor. Nature Reviews Microbiology, 1:127-136.
48. Mueller, EG, Buck, CJ, Palenchar, CM, Barnhart, LE, Paulson, JL (1998). Identification of a gene involved in the generation of 4-thiouridine in tRNA. Nucleic Acids Research, 26:2606-2610.
49. Henderson, JP, Byun, J, Heinecke, JW (1999). Molecular Chlorine Generated by the Myeloperoxidase-Hydrogen Peroxide-Chloride System of Phagocytes Produces 5-Chlorocytosine in Bacterial RNA. Journal of Biological Chemistry, 274:33440-33448.
50. Denli, AM, Tops, BBJ, Plasterk, RHA, Ketting, RF, Hannon, GJ (2004). Processing of primary microRNAs by the Microprocessor complex. Nature, 432:231-234.
51. Gregory, RI, Yan, K-P, Amuthan, G, et al. (2004). The Microprocessor complex mediates the genesis of microRNAs. Nature, 432:235-241.
52. Naoki, S, Suzuki, T, Tamakoshi, M, Oshima, T, Watanabe, K (2002). Conserved bases in the TΨC loop of tRNA are determinants for thermophile-specific 2-thiouridylation at position 54. The Journal of Biological Chemistry, 277:39128-39135.
53. McCloskey, JA, Graham, DE, Zhou, S, et al. (2001). Post-transcriptional modification in archaeal tRNAs: identities and phylogenetic relations of nucleotides from mesophilic and hyperthermophilic Methanococcales. Nucleic Acids Research, 29:4699-4706.
54. Watanabe, K, Kuchino, Y, Yamazuki, Z, Kato, M, Oshima, T, Nishimura, S (1979). Nucleotide sequence of formyl-methionine tRNA from an extreme thermophile, Thermus thermophilus. Journal of Biochemistry, 86:893-905.
55. Sekowska, A, Kung, H-F, Danchin, A (2000). Sulphur metabolism in Escherichia coli and related bacteria: Facts and fiction. Journal of Molecular Microbiology and Biotechnology, 2:145-177.
56. Buck, M, Griffiths, E (1982). Iron mediated methylthiolation of tRNA as a regulator of operon expression in Escherichia coli. Nucleic Acids Research, 10:2609-2624.
57. Maxwell, PJ, Longley, DB, Latif, T, et al. (2003). Identification of 5-fluorouracil-inducible Target Genes Using cDNA Microarray Profiling. Cancer Research, 63:4602-4606.
58. Samuelsson, T (1991). Interactions of transfer RNA pseudouridine synthases with RNAs substituted with fluorouracil. Nucleic Acids Research, 19:6139-6144.