In Silico Nucleic Acid Elongation and Replication using Stochastic Cellular Automata Supplementary material Chrisantha Fernando, Johan Elf , Jon Rowe, Eörs Szathmáry February 2007 1. Introduction 2. The stochastic model of template dynamics in the context of previous work 3. Methods 3. The Overall Algorithm 9. Formal Description of Algorithm 12. The Two Time Scale Method 16. Reaction Rules 19. Justification for Reaction Rules 27. RY Stacking Rules 29. Introducing Stacking bonds (s-bonds) 31. Validation of Melting Temperatures 32. Validation of h-bond Kinetics 33. Validation of p-bond Kinetics 34. Validation Comparing CGCG and GCGC Dynamics 43. Results 43. Different reactions were observed at high and low temperatures 44. Algorithms used to detect replication 50. Elongation is a Side-Reaction of Autocatalytic Template Replication 1 The stochastic model of template dynamics in the context of previous work Previous models of the origin of nucleic acid replication have consisted of ordinary differential equation models that incorporate high-level chemical assumptions (Kanavarioti and Bemasconi, 1990; Kanavarioti, 1994; Fernando and Di Paolo, 2004) such as direct chain growth (Wattis and Coveney, 1999), sub-exponential or parabolic template growth rates in a closed system (Von Kiedrowski, 1993), ribozyme effects, or postulate idealized replication mechanisms (Wills et al, 1998). Macroscopic physical models of template replication, although ingeniously relaxing these assumptions, do not embody the order of magnitude differences in rates between phosphoamidate-bond (pbond) and Watson-Crick base pair (h-bond) events observed in nucleic acids, and have not yet demonstrated long template replication (Breivik, 2001; Griffith et al 2005). In order to integrate these diverse approaches, a low-level stochastic model of the underlying chemical kinetics of nucleotide and polymer dynamics was designed to explore the sufficient functional conditions for non-enzymatic long template replication. This model was applied to replicate some of the findings of Von Kiedrowski’s group using phosphoramidate dimers. 2 Methods The Overall Algorithm Figure 1. A high-level diagram of the overall simulation algorithm is shown. Only a small sample of simulated templates, and only C & G nucleotides, are depicted. 3 An informal description of the algorithm is provided, followed by a formal specification. Figure 1 above shows the overall organisation of the simulation. The simulation used a two-time-scale technique to deal with the order of magnitude difference in characteristic times between (Watson-Crick base pair) h-bond events and (phosphoramidate) p-bond events. An ‘event’ means the formation or breakage of a bond or a bond-type. Constrained by high computational cost, a very small volume1 was modelled as a flow reactor containing initially 500 dimers, giving an effective initial oligomer double strand concentration of 20 mM. Using a variant of the Gillespie algorithm, the hydrogen bond dynamics of the system were simulated until a quasi steady state was reached, e.g. for a period of 0.0001-0.05 seconds2. Figure 3 describes the reaction rules used to calculate propensities. Once a steady state in the fast h-bond dynamics was reached, microstates from this steady-state were sampled at fixed time intervals of 0.05 seconds. Each microstate had an associated propensity of p-bond formation, given by the sum of the number of potential p-bond forming configurations contained in that microstate multiplied by the rate of p-bond formation associated with each p-bond forming configuration. Also, each microstate had an associated p-bond breakage propensity, which was assumed to be simply proportional to the number of p-bonds in that microstate, therefore this value was equal across all microstates. A time was generated at which the next p-bond formation event was expected to occur, by entering the p-bond formation propensity into equation 1 below… Avogadro’s number x Volume is defined as 50000. Dividing the number of molecules by this value gives the concentration in moles/L. 2 The rules are applied to the system using a variant of the SSA algorithm (Elf and Ehrenburg, 2004), based on the next reaction method of the Gillespie algorithm (Gillespie, 1977; Gibson and Bruck, 2000). 1 4 ti ln( rand()) Pi …. Equation 1. Similarly, an expected time was generated for the next p-bond breakage event. The event occurring first was executed. Roulette wheel selection was used to randomly choose a microstate in which to execute the event, with microstate weighting determined by the pbond event propensity in that microstate. Once a microstate had been chosen, roulette wheel selection was applied over all p-bond events of the appropriate type in that microstate. In general, a time to the next event was always obtained by applying equation 1 to the propensity of that event. The event occurring earliest was the one chosen for execution. The particular bond that was chosen to undergo an event of the chosen class was obtained by percolation through a chain of roulette wheel selections, biased by propensities at each level in the hierarchy. 5 Figure 2. Top: The representation of polymers allows a very simple secondary structure that forbids hairpins. Therefore, any mechanism of replication requiring hairpins will not be possible in this model. Top Left: Internal loops are possible. Top right: Breathing ends are possible. Bottom: Grid A shows a complementary double strand of length 5. Grid B shows a complementary double strand of length 10. Grid C shows a complementary double strand where the rightmost h-bond has been broken. Grid D shows a single strand of length 5. Grid E shows one of the possible polymers that may result when D collides with C. Grid F shows a polymer that may result when D collides with another single strand. This will be unstable because the h-bond is between a non-complementary pair of nucleotides. Any polymer 6 configuration is permitted by the model, as long as two steric constraints are obeyed: nucleotides cannot overlap each other, i.e. only a single p-bond and h-bond can occupy the same edge; one nucleotide can only be attached to one other nucleotide by an h-bond. The stability of a newly formed polymer is an emergent property of the strength of the stacking and h-bond forces that result immediately after association. Figure 2 shows the representation of nucleic acids used in the simulation. A polymer is defined as a contiguous molecule joined by h-bonds and p-bonds. Each polymer was represented on a separate unbounded grid. Each grid represented individual A, C, G, and T nucleotides on its vertices, hydrogen (h-bonds) on the vertical edges, and covalent bonds (p-bonds) on the horizontal edges. Thus, a large set of polymer secondary structures could be represented on the grid. Hairpins were excluded because p-bonds were confined to the horizontal edges. Using this representation, firstly, a composite propensity of reaction was calculated for each polymer, IPRP (intra-polymer reaction propensity). The IPRP was defined as the rate of reactions of type i, multiplied by the number of configurations that could undergo reaction type i, within that polymer at time t, (excluding p-bond reactions). Figure 3 shows the reaction rules used to calculate the IPRP. Secondly, a propensity of collision was calculated for all possible reactions between polymers, PPP (polymer-polymer propensity). This was the propensity that the next event would be a collision event rather than an intra-polymer reaction. We assumed a well-stirred reactor, i.e. the probability of collision of all polymer pairs was equal. Thirdly, the propensity that the next event would be a reaction between a monomer and a polymer was determined, so defining the monomer-polymer propensity, MPP. Having determined all composite event propensities, times were generated for each composite event type. The event type that occurred first was chosen for execution. Roulette wheel 7 selection was used to percolate the event type decision to a particular location. If the earliest event was an intra-polymer event then roulette wheel selection was used to choose which particular polymer would undergo the event. Roulette wheel selection was again applied to all possible events within that polymer, and an event was executed at time t+x, where x was the time generated using IPRP as the propensity. If the earliest event was a collision event then the collision algorithm was executed, whereby two polymers where randomly chosen to undergo a collision, at two randomly chosen sites. A collision was deemed legitimate if it did not result in overlapping nucleotides or bonds; and if legitimate, a h-bond was formed between these two sites at time t + x, where x was given by applying PPP to the equation 1, and the two polymers were joined to form one polymer. In later models collisions could take place by s-bonds (stacking bonds). This was necessary to explain the observed ability for dimers to elongate even at 30oC where no template directed ligation is possible. If the collision was not successful, time was updated by x, as above, but no change was made to the polymers. Similarly if the earliest event was a monomer-polymer association, roulette wheel selection was again used to choose a polymer onto which a monomer would attach, and again, roulette wheel selection was used to determine a site on that polymer onto which a monomer would attach. A monomer was attached by an h-bond at time t + x where x was given by applying MPP to equation 1. 8 Formal Description of Algorithm The algorithms described above are defined formally below. Algorithm 1 describes how fast h-bond dynamics were run to a quasi steady state between each p-bond event, and algorithm 2 describes the process shown in figure 1. Initialization involves making N polymers and placing them into the grids. In most experiments, 500 dimers were used as the initial set. Monomers can also be entered at initialization, e.g. 10-20 monomers of type A, and 10 to 20 monomers of type B, and set the simulation time, t = 0. Monomer concentration was kept constant, although in later 9 experiments the system was initialized with a number of dimers that were allowed to deplete. Initialization also involves calculating all the reaction propensities. Line 1: For each polymer, calculate its intra-polymer-reaction-propensity (IPRP). This is the propensity of the event ‘some reaction occurs in polymer x’. where R1…R5 are the propensities of reactions described by rules 1 to 5 in figure 3, that are summed over all bonds in the polymer. Line 4: Calculate the monomer-polymer-propensity (MPP). This is the propensity of the event ‘one monomer attaches to a polymer’ occurs, where Av = Avogadro’s number, Q = Volume (in litres) giving AvQ = 3.1548x104M-1 Concentrations of monomers are obtained by dividing the number of monomers by AvQ. pA = (number of free A nucleotides on the polymer), pB = (number of free B nucleotides on the polymer). These are summed over all the polymers. The rate coefficient of complementary monomer attachment, c = 106 sec-1 and non-complementary monomer attachment, nc = 103 sec-1. Line 5: Calculate the polymer-polymer-propensity (PPP). This is the propensity of the event ‘a polymer collides with another polymer’. This is the sum over the collision propensity of each possible polymer pair. The association propensity, c = 50000x106 [Litre M-1sec-1] is used as the association rate coefficient. Note that we calculate the propensity of a collision event, not the propensity of successful association. If a collision event occurs, then two polymers are chosen for collision by roulette wheel selection on the polymer pair matrix. They are then associated at a randomly chosen pair of free hbond sites, one on each polymer. If the association is legitimate then the polymers are joined. If the association is illegitimate, i.e. if there is overlap of nucleotides on the 2D grid, then the polymers are returned unchanged, and the collision is deemed unsuccessful. 10 Lines 6,7,8: For each polymer (IPRP event), and for the MPP and PPP events, sample a putative time, ti , at which that event would occur, from an exponential distribution ln(rand())/pi where parameter pi = IPRP(x), MPP and PPP respectively. Line 9: Store the ti values so obtained, in an indexed priority queue. Line 10: Until termination, repeat the following loop. Line 11: Let u be the reaction whose putative time ti stored in the priority queue, is least. Line 12: If the reaction was an IPRP event, then by using roulette wheel selection, decide which particular event in that polymer x, should occur. Roulette wheel selection allows the events to be sampled in proportion to their individual rates. If the reaction was a MPP event, again use roulette wheel selection to decide onto which polymer a monomer should attach. If the event was a PPP event, then again use roulette wheel selection to decide which two polymers have collided. Line 13: Change the state of the system to reflect the execution of reaction event U. This is done as described in the reaction rules, for example, if a collision event occurs, then run the collision algorithm to see if a legitimate collision has occurred. Line 14: Update only the propensities that would have changed due to reaction U, by generating a random number according to an exponential distribution using the new propensity pi, and set tnew = -ln(rand())/pi + t, in the priority queue for that updated event. Line 15: Reorder the priority queue so that the earliest event is at the top. Line 16: Repeat from Line 10. The advantage of the algorithm is that if the effect of a reaction on the other propensities can be limited, then it is not necessary to recalculate all the propensities every time an 11 event occurs. The algorithm is rather different from the next reaction method, for instance we do not use an explicit dependency graph. However, even with such a method, due to the many rapid hydrogen bond formation and breakage events, particularly those caused by polymer association and dissociation (which occur very frequently), it is not possible to simulate more than a few seconds of real time in a typical overnight run. One way around this is suggested by the fact that hydrogen bond dynamics reach a pseudo-equilibrium conditioned on the current p-bond state. Because this is so, it is only necessary to simulate the hydrogen bond dynamics until this equilibrium has been reached. At that point we can sample the next p-bond event. The Two Time Scale Method There are two very different timescales in template replication dynamics. Base pairs (hydrogen bonds) form and break, with rates that are many orders of magnitude faster than phosphoramidate bond formation and breakage. A technique was developed to solve the problem efficiently. This method has general applicability in systems with clearly distinguishable time-scales. It can be used only where it is possible to simulate the fast dynamics in isolation from the slow dynamics, until the system reaches equilibrium. To model the fast dynamics, p-bond formation and breakage events are ‘switched off’, i.e. these events are removed from the priority queue. The h-bond dynamics are then run to equilibrium conditioned on the current p-bond state. Equilibrium of fast dynamics is said to occur when several measures of the configurations of all polymers in the 12 population, have stabilized, i.e. when the proportion of single to double strands no longer changes, when the relative frequencies of reaction types no longer changes, and when the average polymer mass no longer changes. If the measures that were monitored were the average p-bond formation rates at each position, then the algorithm for separation of time-scales would be exact. This measure is however too expensive to compute, which is why simpler and approximate measures are used instead. Typically equilibrium times of 0.005 seconds are required, although occasional trials have been conducted with longer equilibrium times, to check for deviations. For very long strands, equilibrium may take too long to reach and so inaccuracies are introduced in this way. When equilibrium is reached, N random samples of microstates (a microstate being the instantaneous state of all polymers) are taken. Between each of these samples, the fast dynamics are run again, for a fixed length of time, long enough to ensure that the samples are independent (e.g. 0.001 - 0.005 seconds). For each microstate that is sampled, the total p-bond formation propensity and total p-bond breakage propensity is calculated. The mean total p-bond formation and breakage propensities over all sampled microstates are calculated. An event time is generated using, ti = -ln(rand())/pi, on the basis of these mean total propensities for a p-bond breakage event and for a p-bond formation event. The event with the earliest time is carried out. Serial roulette wheel selection is used to percolate this event to the correct microstate, without having to store all the microstates in memory. Typically, p-bond events occur with a characteristic time of minutes or hours, whereas h-bond events occur with a characteristic time of 10-6 13 seconds. The algorithm for the two time scale method is shown below, and replaces the contents of the main while loop of algorithm 1 above. As N tends to infinity, the calculation of p-bond formation propensity becomes asymptotically more accurate. If the dynamics are such that very rare microstates contribute significantly to the p-bond formation propensity then it is clear that larger sample sizes will be required. Different values of N and X were used to test the minimum sample size that could be safely used. A sample size of 10 was typical with a time between samples of 0.005 seconds. A problem with this algorithm is that if the total p14 bond formation rates are very different for different microstates, for example, if there is a small class of microstates, e.g. 0.1 percent of all microstates that have a very high p-bond formation rate, these can easily be missed in the current sampling scheme, although they in reality would contribute a lot to the formed p-bonds. Another problem is that it is difficult to ensure that N samples are really independent. For example, certain complex polymer configurations may change much more slowly than the times that can reasonably be simulated with h-bond dynamics running. An example is with long duplexes, which will dissociate at low rates, but will not have sufficient time to do so given the short times for which the rapid h-bond dynamics are run. Occasional runs have been conducted with longer equilibrium times and similar results are observed. 15 Reaction Rules Figure 3. The above rules define all possible reactions within polymers (rules 1 to 5) between polymers (rule 7) and between polymers and monomers (rule 6). Each reaction is associated with a propensity, and this is used to calculate the IPRP, PPP and MPP composite propensities used in algorithm 1. Figure 3 above describes how the IRPR, PPP and MPP values were calculated using realistic reaction kinetics and free energies obtained from empirical studies (Cantor and Schimmel, 1980; Turner, 2000; Reynaldo et al, 2000; Rohatgi et al, 1996; Schoneborn et al, 2001; SantaLucia, 1998; Xia et al, 1998). P-bond formation and breakage rates were modelled on phosphoramidate bonds, although due to the time-scale separation between h-bond and p-bond dynamics, the same overall behaviour is expected irrespective of the absolute values of p-bond formation and breakage rates. The probability of bond breakage and formation are functions of the local neighbourhood configuration of bonds, 16 monomer concentration, and temperature. The rules shown in figure 4 were applied to each bond or potential bond site on each polymer to calculate the rate of that bond breaking or forming and then used to calculate the composite propensities, IPRP, PPP and MPP. Rule 1: h-bond breakage rate, dm Am e(E a Kn )/ RT . GC single h-bond breakage activation energy Ea = 31.0kJ. AT single h-bond breakage activation energy Ea = 29.5kJ. h-bonds = 18.4 kJ. This resulted in approx. 10% Mean Ea for non-complementary mispaired nucleotides. Modifications applied due to stacking effects are described in figure 4. T is the temperature in Kelvin (either 275 or 300K in these experiments). R = 8.314x10-3. kJ K-1 mol-1. The central h-bond in configuration B1 contains no h-bonds adjacent to it, and so it has the highest probability of breakage. The central h-bond in the BN configuration has 4 adjacent h-bonds, and so has the lowest probability of breakage. The ordering of stabilities of the central h-bond from least-to-most stable is B1 > B12 > B2 > B23 > BN1 > BN3. The Arrhenius constants for each neighbourhood configuration are approximated by AB1 = 3.6x1013, AB12 = 1.8x1013, AB2 = 7.3x1012, AB23 = 1.8x1012, AB3 = 3.6x1011, ABN1 = 3.6x1010, ABN = 1.8x1010. The entropy term Kn was required to obtain the proper melting temperature curve for long strands. K was empirically set to 0.01, n is the number of nucleotides not attached by h-bonds to another nucleotide, e.g. those at the dangling ends. Rule 2: Catalysed h-bond formation rate = 106 sec-1 adjacent to other h-bonds. Rule 3: Zipper h-bond formation rate = 106 sec-1 irrespective of the distance from the potential h-bond to the closest nucleation site (a nucleation site is a pattern of 3 adjacent h-bonds). Note that the simultaneous application of rule 2 and rule 3 result in h-bonds being twice as likely to form next to another h-bond than anywhere else 3 These are codes for the classes of equivalent h-bond neighbourhood states that contribute the same stacking stability to the central h-bond. They label the 16 configurations shown in figure 3 rule 1. 17 along a duplex. Rule p _ form _ rate 0.6e25 RT 4: Template . Rule directed 5: p-bond p-bond formation breakage rate rate = = rate _ p _ break 1.32 1012 e110 RT , n.b p-bond breakage rate exceeds p-bond formation rate, above 350K. Note rules 4 and 5 are only applied in the outer loop of algorithm 2, not in algorithm 1. Rule 6: Monomer attachment. Complementary rate = 106 sec-1. Noncomplementary rate = 103 sec-1. Stacked monomer attachment (i.e. attachment of a monomer next to an h-bond) is 100 times these values. Rule 7: Collision rate between any two Polymers = 5x109 sec-1 M-1. Note this is a bimolecular rate constant because it is multiplied by two concentrations. The total collision propensity is given by is 5x10 9 x (No. of Polymers/NAV) where NA = Avogadros number, and V is the volume. We set NAV = 104 when the tide was out, and NAV = 105 when the tide was in. When a collision occurred, the two random nucleotides between which a new h-bond (or transient “stacking” s-bond, see figure 5) would form were chosen, and the polymers were combined at that site. If the combination resulted in no overlap of nucleotides on the 2D grid, then the collision was successful and polymers were combined, else the polymers remained separate. ‘Rule’ X: Spontaneous p-bond formation by association of monomers was forbidden because the rate was too low to be significant at the time-scales considered. However, spontaneous p-bond formation between oligomers by single stranded and double stranded (non-templated) end ligation was allowed, see figure 5. 18 Justification for Reaction Rules Figure 3 above shows all the reaction rules that have been implemented in the model. The reaction rules determine the propensity of reactions. The propensity of a reaction is defined as the rate of that reaction times the number of reactants that can undergo that reaction. From the propensity of a reaction, the overall probability of that reaction occurring in the reactor can be calculated. The propensity of bond breakage and formation is a function of the local neighbourhood configuration of bonds, the monomer concentration, and temperature. The reaction rules are applied to each bond, and to each potential-bond location, to calculate the propensity of a bond breaking or forming respectively at that site. The relevant propensities are updated, whenever a reaction makes or breaks a bond. Rule 1: h-bond breakage: Watson-Crick base pairs break and form extremely rapidly, especially at the breathing ends of a strand. This is because, due to double strand stacking effects, it is easier to break h-bonds when they are not adjacent to other h-bonds. Therefore, we have assumed that the probability of breaking an h-bond depends on the neighbourhood configuration of its 4 adjacent h-bonds. Figure 1 rule 1, shows the 16 possible h-bond configurations around a central h-bond, labelled from B1 (no adjacent hbonds) to BN (4 adjacent h-bonds). The h-bond breakage rate for one h-bond is given in units sec-1 per bond) by the Arrhenius equation. dm Am e(E a K n / RT ) . dm is the rate of hydrogen bond breakage for a hydrogen bond with configuration m, Am is the Arrhenius 19 constant for the configuration m. Ea is the activation energy for h-bond breakage. In early experiments, Ea = -34.4kJ for complementary h-bonds and Ea = -29.4kJ for noncomplementary h-bonds. This means that non-complementary h-bonds have a greater probability of breaking at any given temperature. We demonstrated experimentally that this difference in activation energy between complementary and non-complementary base pairs resulted in approximately 10 percent mispaired base pairs. This is the right ballpark of mispairing frequency expected without enzymes. T is the temperature in Kelvin, and R = 8.314 x10-3 kJ M-1 sec-1. The ordering of stabilities from least-to-most stable is B1 > B12 > B2 > B23 > BN1 > BN. The Arrhenius constant, Am is specific for each neighbourhood configuration, and reflects this increasing stability. A further term is required to express the effect that the size of dangling ends has on dm. As the size of dangling ends increases, there are more possible open configurations than closed (reannealed) configurations for the dangling end, and so re-annealing is entropically disfavoured. The dangling-end-length dependent entropy term Kn was required to obtain a linear melting temperature curve for long strands, see figure 4. 20 Figure 4. The effect of dangling end entropy on melting temperature. X-axis shows 1/Length, Y-axis shows 1/Tm. Circles show lengths at which Tm was measured. The Entropy term, Kn, is introduced to remedy a deviation from the expected melting temperature relation as a function of strand length. After the correction is introduced, strands with bigger dangling ends tend to melt more easily. K is empirically set to 0.01, n is the number of nucleotides not attached by h-bonds to another nucleotide, therefore it approximates the number of nucleotides at the dangling ends. Note that the h-bond breakage rates do not depend on the state of p-bonds, i.e. a hbond surrounded by 4 h-bonds formed due to the presence of 4 unlinked monomers, has the same probability of breaking as a h-bond surrounded by 4 h-bonds formed between 21 nucleotides linked by p-bonds. This assumes that stacking interactions are the same, irrespective of whether nucleotides are joined by p-bonds or not. The full model refines these assumptions as described in the paper. If an h-bond break results in breakage of a polymer into two parts, then action is taken to update the box representations. If the polymer breaks into two polymers, then one of the polymers is moved to a new box. If the polymer breaks into one polymer and a monomer, then the monomer is removed, and the number-of-monomers variable is incremented by one. Rule 2: Catalysed h-bond formation. This rule implements the effect of double-strand stacking, whereby h-bonds have a greater probability of forming next to adjacent h-bonds than at random positions in a partially annealed duplex. In the simple model one can assume that the influence of stacking is not a function of the identity (C or G) of the adjacent nucleotide, i.e. all dinucleotide cores are assumed to have equal stacking energies. The NN model shows that this is not in fact the case and so in the model described in the paper, NN and RY effects are considered. The rate of catalysed h-bond formation = 106 sec-1 per possible h-bond. The total propensity of a catalysed h-bond formation is this rate times the number of possible h-bonds that can form in the system. We assume this is a temperature independent process. The shifting of hydrogen bond formation to breakage equilibrium as a function of temperature is due to changes in the rate of h-bond breakage. 22 Rule 3: Zipper h-bond formation. This rule implements the fact that h-bonds have a certain chance of forming anywhere along a partially annealed duplex with rate = 106 sec1 per possible h-bond, irrespective of the distance from the potential h-bond to the closest nucleation site (a nucleation site is a set of 3 adjacent h-bonds). The simultaneous application of rule 2 and rule 3 results in a h-bond being twice as likely to form next to another h-bond than anywhere else along a duplex. In reality, the probability of zipper hbond formation drops off as a function of distance from the nucleation site; however, this happens sufficiently slowly, that for polymers of up to 100 nucleotides in length it can safely be ignored. Again we assume this is a temperature independent process, and allow the h-bond breakage rule to model the effect of temperature on the h-bond equilibrium. Rule 4: Template Directed p-bond formation. p-bonds form when two nucleotides are brought into close contact on a template strand. Such a configuration is shown in rule 4. The rate of p-bond formation, assuming a legitimate configuration, is given by the Arrhenius equation, Pf 0.6e(E a / RT ) , where Ea = 25kJ is the activation energy of p-bond formation and 0.6 is the Arrhenius constant. The units of Pf are sec-1 per potential p-bond forming configuration. Note that the rate for p-bond formation appears high, but that in practice, because collisions between strands often do not result in a legitimate configuration for p-bond formation, the overall rate is much less. Rule 5: p-bond breakage. p-bonds break as described by the Arrhenius equation, Pb 1.32 1012 e(E a / RT ), where Ea = 110kJ is the activation energy of p-bond breakage 23 and 1.32x1012 is the Arrhenius constant. The units of Pb are sec-1 per p-bond. Figure 5 shows the rates of p-bond formation and breakage as a function of temperature. Figure 5. p-bond breakage rate exceeds p-bond formation rate, at 350K. However to obtain the reaction propensity for p-bond formation, the p-bond formation rate is multiplied by the number of appropriate configurations in which p-bonds can be formed, whereas the p-bond breakage propensity is independent of configuration. In practice, p-bond breakage propensity will exceed p-bond formation propensity at a much lower temperature than 350K because the tri-molecular complex required for p-bond formation becomes very unlikely at higher temperatures. 24 Rule 6: Monomer attachment. The rate of h-bond formation by monomer attachment depends on whether the monomer is attaching to form a complementary or a noncomplementary h-bond. The complementary rate = 106 in units of sec-1 per mole of monomer, per potential complementary attachment site. The propensity of monomer attachment for a given potential complementary attachment site is therefore obtained by multiplying this number by the complementary monomer concentration. The noncomplementary rate = 103, in the same units, except, the non-complementary monomers and sites must be considered. Again, stacking interactions are simplified. A nucleotide will attach next to another nucleotide with much greater rate than elsewhere on a template. Stacked monomer attachment is assumed to occur at a 100x greater rate than the values above, for both complementary and non-complementary base pairs, irrespective of the identity of the dinucleotide core. Rule 7: Collision and association between polymers. This rule implements a wellmixed reactor. The concentration dependent collusion rate between two polymers A and B, is assumed to be 106 [Liter M-1sec-1] x (free binding sites on Polymer A)/(AV) x (free binding sites on Polymer B)/(AV), where A = Avogadro’s constant, and V = volume. 106 is the unimolecular rate of formation of a h-bond between a monomer and per potential site on a polymer, in units sec-1. The total propensity of collision between any two polymers is calculated by summing this value over all possible collision pairs. After collision, two potential h-bond sites are chosen at random, and the polymers are joined at this site. However, if there is any overlap between monomers then the collision is considered a failure, and the polymers are returned to the population unchanged. If there 25 is no overlap between monomers, then the h-bond is formed and the polymers are joined, i.e. one polymer is moved to the grid of the other polymer and attached at the correct place. This is a simplification of the collision process, but results in the correct order of magnitude of association rate. It was necessary to introduce this approximation whereby the polymer-polymer event propensity, PPP, represents collision probability and not association probability because otherwise calculation of PPP would be too slow because one would have to calculate all possible collision configurations for all possible polymer-polymer pairs. Instead we calculated the collision probability as a function of simply the number of free binding sites, and then calculate whether an association takes place by randomly defining an attachment site and checking whether it is legitimate. This results in a lower overall association rate than collision rate. The collision rate is near the diffusion limited maximum. In the model presented in the paper, collision frequency is independent of polymer length. This makes a significant increase to the speed of simulation, allows the correct melting temperature curves. ‘Rule’ X: Spontaneous p-bond formation. In the simple simulation, this reaction was forbidden because it is assumed to occur at such a low rate that it can be ignored. This is unfortunately not the case in real chemical systems, for example, where spontaneous formation of phosphoramidate bonds results in non-templated elongation. The model now incorporates spontaneous p-bond formation. 26 RY Stacking Rules Figure 6 below describes how some more elaborate calculations of inter-strand and intrastrand stacking forces were implemented (SantaLucia, 1998; Cruz et al, 1982) because this was known to influence the tendency for elongation, and contributes to ‘phenotypic’ diversity (Sinclair et al, 1984; Zielinski and Orgel, 1987a, 1987b, 1989), for example allowing GCGC to replicate but tending to cause CGCG to elongate. Figure 6. According to the nearest-neighbour (NN) and Pyrimidine-Purine (RY) stacking models, the local configuration of intra- and inter-strand nucleotides influences the stability of h-bonds. To calculate the propensity of breakage of an h-bond, one has to check which instances of the above template configurations 27 apply. The figure shows all the possible stacking rules that may be applied to the central h-bond between a 5’ G (blue square) and a 3’ C (red circle) nucleotides, labelled with a green arrow, given the appropriate nucleotide configuration. The 5’-3’ strand is shown at the top of each duplex. To see how to apply the rules to an h-bond between a 3’ C and a 5’G simply rotate the picture by 180 o about the plane of the page, i.e. look at it from the back. From top left to bottom right: the GC dinucleotide core is assumed to result from two intra-strand stacks having total energy 2 x iCG. The GG and CC cores result from two inter-strand stacks between C and G (with total energy 2 x eCG). The CC core results from one inter-strand stack between G & G (with energy eGG). The additional effects of stacking at longer distances are modelled by the simultaneous application of four further rules, contributing stacking stabilization of the central h-bond at strengths, +CGC, +CGG and +CGGC, applied according to the satisfaction of the templates above, i.e. if the h-bond falls within the red highlighted region then that h-bond receives the stacking stabilization associated with that region. With the inclusion of A and T nucleotides, additional configurations must be considered. The stabilities have numerical values as follows (in kJ): iGC = 1.12, eCG = 0.92, eGG = 2.17, iAT = 0.44, eTA = 0.5, eAA = 0.58, iGT = 0.72, iAC = 0.72, eTG = 1.28, eCA = 1.30, eAG = 1.45, CGC = 1/5, CGG = -4/5, CGGC = 2/5. A scaling factor of +1.8 was typically used to multiply all these stacking effects, and was adjusted to allow approximation of experimentally observed T m values. The effectiveness of these adjustments can be seen by looking at the observed vs modelled T m’s for various oligonucleotide sequences in figure 8. 28 Introducing Stacking bonds (s-bonds) Finally, transient ‘stacking’ bonds were introduced, capable of mediating blunt end ligation of templates, see figure 7. This was necessary to match recent results observed experimentally in real chemical systems of pure dimers capable of elongation despite the fact that templated p-bond formation could not have occurred at 30oC due to the extremely low melting temperatures of dimers. Figure 7. Stacking s-bond formation (shown as a grey bond) occurs between blunt ends, with a certain probability whenever two polymers collide. An s-bond mediated collision occurs if upon collision a random number between 0 and 1 is greater than 0.3 + 0.25 x Log(Number free H-bond sites on Polymer A x Number free H-bond sites on Polymer B) and both strands have less than 8 p-bonds each. The mean rate of s-bond breakage has the same profile as the rate of h-bond breakage in the B3 configuration, and is adjusted by stacking effects as follows. S-break-rate = AB3e-(Es + stack). AB3 = 3.6x*1011 is just the Arrhenius constant 29 for the intermediate h-bond breakage configuration, Es = 15.5kJ (an overestimate necessary to observe sbonds in the small number of templates simulated) and stack = 3.0 x the NN stacking energy contribution due to application of the templates in figure 6. There is a certain rate of p-bond formation on s-bonds, which is assumed to be the same as the rate of templated p-bond formation, i.e. p _ form _ rate 0.6e25 RT . In practice the net ratio of spontaneous to template p-bond formation will depend on the specific chemical system. 30 Validation of Melting Temperatures To check that the above parameters produced the correct behaviour, control experiments were conducted to calculate the melting temperature curve for AnTn and GnCn oligomers, see figure 8. Figure 8. Melting temperatures (Tm) for AnTn and GnCn oligomers at concentration 20mM. N = Length. The x-axis shows 1/N and the y-axis shows 1/Tm. Circles show the points at which Tm was measured. The outlier is a 6-mer of sequence AAGCTT, which as expected shows an intermediate T m between A6T6 and G6C6. 31 Validation of h-bond Kinetics To check h-bond kinetics, dissociation behaviour from double stranded to single stranded states was measured as a function of strand length and temperature, figure 9. Figure 9. The rate of dissociation of ds-GCGC 4-mers at different temperatures is as expected, denaturation being more rapid at higher temperatures. Initial concentration = 10mM double-strands. Note that GCGCG takes longer to dissociate than GCGC at the same temperature, and does not dissociate to as great an extent. 32 Validation of p-bond Kinetics To check p-bond kinetics, an experiment was conducted in a reactor closed to mass and initialized with only GC dimers, at 275K, figure 10. Figure 10. A batch reactor was initialized with 20mM GC dimers at 275K. The continuous lines show expected rates in a simple minimal replicator model for GCGC formation (increasing curve) from GC (decreasing curve) with the same p-bond formation kinetics (Von Kiedrowski, 1993) as in the stochastic model. The ODE model assumes no elongation reactions beyond GCGC, whereas the stochastic model shows such reactions are inevitable unless special blocking of ends is undertaken. Note that Zielinki and Orgel had to block template ends in their non-enzymatic replication experiments to prevent elongation, see supplementary material. 33 Validation Comparing CGCG and GCGC Dynamics In the original set of experiments conducted to produce the algorithm, we had initially used a different method of calculating polymer collision propensity. Originally we assumed that the propensity was an increasing function of the size of polymers. However the following erroneous behaviour was observed. GCGC templates and GC dimers were placed into a reactor, and the system was allowed to reach equilibrium. [GC] = 0.001M and [GCGC] was varied from 0mM to 1mM. Experiments were carried out at low temperatures, e.g. 280 - 300K. However, with all the above parameter settings, and even after increasing iGC to 2.8kJ, GCGC tended to be lost to elongation side reactions resulting in long strands, see figures below. Figure 11. Length distribution of oligomers produced in an experiment at 300K, Initial [GC] = 1mM, Initial [GCGC] = 0.1mM. Length of polmers is shown from left to right. 34 Figure 12. Top: Representation of dimers in computer graphical form. Bottom: Initially p-bonds form in the perfect configuration. Above is shown a 4-mer joined to 2 dimers from a simulation screenshot. Figure 13. But eventually a staged 4-mer duplex forms, which can attach two dimers. It is for this reason that Zielinski and Orgel had to block the tetramer ends, so that such configurations would not lead to elongation. Once a single GCGCGC has been produced, then due to the large number of free binding sites there is a runaway condensation effect with larger polymers acting as seeds. This was unexpected because Zielinski and Orgel had demonstrated the replication of GCGC tetramers, albeit with blocked ends. We modified the simulation to calculate the probability of polymer collision independent of polymer length. This resulted in the following correct behaviour. 35 Figure 14. GCGC length is shown increasing from left to right. However, there is an early period of growth of GCGC, as can be more clearly seen from the screen snapshots shown in figure 5,6, and 7. 36 Figure 15. Shortly into an experiment with a reactor consisting initially of [GCGC] = 0.16 mM = 5 polymers, and [GC] = 1mM = 30 polymers, 2 complexes awaiting p-bond formation can be seen. Already, 2 new GCGC tetramers must have been formed, because we started with 5 GCGC double strands. Temperature = 300K. The picture shows the polymers in the system. Circled in blue are the complexes. No staggered complexes of GCGC are observed at this stage. 37 Figure 16. Later in the experiment, after 5 new GCGC tetramers have been formed, a staggered complex is observed, marked in red with a cross. 38 Figure 17. Eventually elongation reactions predominate, forming longer and longer complexes. At this stage a total of 22 ssGCGC tetramers exist compared to an initial value of 10 ssGCGCs. This is a doubling of GCGC numbers. 39 Therefore autocatalytic growth of GCGC does not continue because eventually GC duplexes are consumed. At this stage, there is a higher concentration of GCGC tetramer and so there is a greater probability of forming staggered GCGC duplexes. Once the GC dimers are consumed, only further elongation reactions are possible. This elongation was avoided in Zielinski and Orgel’s experiments because they blocked the ends of the 4mers. Zielinski and Orgel obtained negative results for the autocatalysis of CGCG tetramers with CG dimers, using an identical protocol to the above experiment, ( Zielinski & Orgel, 1989). Our stochastic model confirms that this is also the case with CGCG with unprotected ends, and explains why. An experiment identical to that shown in the above figures was conducted but with CGCG tetramers at 290K. Staggered complexes were formed right from the beginning of the trial, and all p-bond formation occurred within the elongating complexes. 40 Figure 18. [CGCG] = 0. mM = 5 polymers, and [CG] = 1mM = 30 polymers. Temperature = 290K. There was no replication of CGCG, rather elongation occurred. The experiment lasted 7.0 x 10 7 seconds. There was a rapid rate of utilization of CG. No transient increase in CGCG 4-mers was observed, because all tetramers were ligated to form longer strands straight away, before any tetramer had the opportunity to ligate two dimers into a copy of itself. This is quite different from the behaviour of GCGC tetramers, which although undergoing elongation eventually as dimers are exhausted, initially showed autocatalytic growth. Comparison of GCGC with CGCG behaviour can be made by comparing the screenshots above with those below. Figure 19. [CGCG] = 0. mM = 5 polymers, and [CG] = 1mM = 30 polymers. Temperature = 290K. No replication of CGCG tetramers is observed. This is opposite to the GCGC case. 41 Figure 20. After a further time period, more elongation has occurred, with no replication of CGCG. In conclusion, the dynamics of replication in GCGC vs. elongation in CGCG is confirmed. Because we had no blocked ends, even GCGC showed elongation after an initial period of replication. 42 Results Different reactions were observed at high and low temperatures At low temperatures (2oC) elongation reactions were predominant, whereas at higher temperatures although elongation eventually also took place, short strand replication, and incomplete template replication was also observed, along with a higher incidence of strand breakage, see figure 21 below. Figure 21. A representation of reaction mechanisms directly observed at high and low temperatures. Left: Staggered association (A) and ligation at splint junctions (B) is the mechanism observed to produce elongation at low temperature. Right: At high temperatures, elongation does not occur due to premature strand separation (C), short strands out-competing long strand replication (D), and p-bond breakage (E). 43 Algorithms used to detect replication 44 Figure 22. A ‘strong’ (Top) and a ‘weak’ (Bottom) algorithm for detecting replication. The strong definition of a replicated sequence requires that nucleotides on the ‘grandparent’ strand be contiguous and identically ordered compared to nucleotides on the ‘child’ strand. The historical path shown by the yellow arrows can be traversed in order to determine the length of `strongly replicated’ sections. The weak definition of a replicated sequence allows multiple ‘grand-parent’ strands to contribute matter to the ‘child’ strand, so that identical motifs present in high copy number can contribute oligomers to form the ‘child’ strand. Whenever a p-bond forms, a new `motif’ data structure is created (a motif is defined as the overlapping sequence of nucleotides between copy and template ‘responsible’ for the p-bond formation event). Also, at each p-bond formation event, any motifs (or motif fragments) present in the template are passed to the copy. After an experiment we obtain a store of how many motifs and motif fragments were formed and copied. If many copies of a motif of length N exist, then for large N, this is good evidence that N is being replicated. Two algorithms were used to detect replicated sequences. These are described in figure 22. Results from the first algorithm are shown below. Since the phylogenetic histories of all strands are known, it is possible to trace the ancestry of each p-bond, i.e. to know the identity of the ‘parent’ p-bond and the ‘grandparent’ p-bond on which any p-bond was template replicated (see figure 22a). An algorithm compares the contiguity and order of p-bonds on the ‘child’ strand with that of the ‘parent’ and ‘grandparent’ strands. If these are preserved between adjacent p-bonds on all three polymers, then sequence replication is defined as having occurred in that section. This algorithm is oblivious to mis-pairings. The algorithm is described below. 45 Algorithm to Check for Replication. For each Polymer For the p-bonds on the polymer from left to right If the p-bond on the right of this p-bonds grandparent is the same as the grandparent of the p-bond on the right of this p-bond, then this is a replicated sequence of 3 nucleotides. Store the identity of the child and grandparent nucleotides. Else this is not a replicated sequence. End For End For. In a typical run initialized with random G and C dimers only the following sequences were replicated, 101, 000, 011 , 011 111, 0100 i.e. there existed examples only of 3 and 4 contiguously replicated nucleotides, and one case of 2 such sequences on one polymer. Direct observation shows that it is common that strands consist of small sections of sequences copied from many parents, and even more grandparents. Two examples are given. Figure 23 shows the phylogeny of a short strand with sequence 0110011100. Many different templates are used to make the strand. The strand grows by addition of monomers, not through ligation of oligomers (except for the ligation 00 + 01100111). Figure 24 shows the immediate phylogenetic tree of a slightly longer strand that has been constructed by ligation of two oligomers. 6 p-bonds in the final strand were templated on the strand 000110100101 shown in bold. The complementary sequence of this strand is present in the copy (with one error). Again, this does not demonstrate a full replication cycle since a further round of replication is required to construct the original sequence. The absence of replication of sequences longer than 3 suggests that the intermediate 46 strand (parent) is not able to undergo this second phase of replication. This is because it becomes incorporated into an elongating polymer, unable to itself undergo replication apart from at the breathing ends. Elongation continually displaces the sequence from the breathing end, preventing its continued replication. 47 Figure 23 The strand is formed by the ligation reactions shown. Starting at the bottom and moving up, each reaction is labelled with the sequence of the template used to catalyse the template directed p-bond. A star (*) on this labelling template indicates the particular p-bond on which the new p-bond was formed. The sequence 0110011100 is produced not by ligation of oligomers but by the template-directed incorporation of monomers onto the ends of a growing oligomer. The bold labelled template represents the same template molecule, whereas the first 3 templates and the last template are different molecules. This suggests that 1100, 11001, 011001, 0110011 and 01100111 grew on 11111011000 without being detached. n.b. this phylogentic tree is not sufficient to demonstrate replication since we do not know whether 11111011000 was itself a faithful copy of a grandparent. To do this, we would have to observe the history of each labelled template also. 48 Figure 24. The copied strand is complementary to the parent, except for one mis-pair, and an additional sequence to the left. Such additional sequences shield the central sequence from replication, removing it from the breathing ends of the strand. 49 Elongation is a Side-Reaction of Autocatalytic Template Replication Figure 25. Three OR-coupled (Gánti, 2003) autocatalytic cycles potentially capable of mediating template replication are shown in solid arrows. Side-reactions are partially due to p-bond breakage (thin dashed arrows), but are primarily due to elongation. Only a sample of the possible elongation reactions is shown (thick dashed arrows). The leftmost cycle must turn 4 times to supply the matter for one turn of the rightmost cycle. 50 Figure 25 shows a hypothesised mechanism for template replication in which sequential ligation of oligomers results in replication of long strands. The shortest oligomer, a, is assumed to be produced at a constant rate by monomers. Oligomer aa dissociates and two copies of a re-anneal upon b, producing aab which is ligated to produce bb. bb occasionally denatures to b, and two copies of b re-nature upon c forming bbc, which is ligated to cc, and so on. The following differential equations describe the system. d[a] s [a][b] a [ab] [a][ab] a [aab] d[b]/2 dt …1 …2 d[ab] [a][b] a [ab] [a][ab] a [aab] d[ab] dt …3 d[bb] kb [aab] [b]2 b [bb] d[bb]/10 dd[cc]/20 dt …4 a[aab] [a][ab] kb [aab] a [aab] d[aab]/10 dt …5 d[b] [b][a] a [ab] [b][c] b [bc] [b][bc] b [bbc] 2[b]2 2b [bb] d[b] dd[c]/2 dt …6 d[bc] [b][c] b [bc] [b][bc] b [bbc] dd[bc] dt [bbc …7 d[bbc] [b][bc] kc [bbc] b [bbc] dd[bbc]/10 dt …8 d[cc] kc [bbc] [c]2 c [cc] dd[cc]/10 dg[gg]/20 dt …9 d[c] [b][c] b [bc] 2[c]2 2c [cc] dd[c] [c][g] c [cg] [c][cg] c [ccg] dg[g]/2 dt …10 d[cg] [c][g] c [cg] [c][cg] c [ccg] dg[cg] dt …11 d[ccg] [c][cg] kg [ccg] c [ccg] dg[ccg]/10 dt 51 d[gg] kg [ccg] [g]2 g [gg] dg[gg]/10 dt …12 d[g] [c][g] c [cg] 2[g]2 2 g [gg] dg[g] dt …13 Values of constants used are shown in Table 1 below. Temperature(K) Temp 280 Melting Temperature of a Melta 280 Melting Temperature of b meltb 300 Melting Temperature of c meltc 320 Melting Temperature of g Meltg 340 Denaturation rate of a a 10 80000e temp e melta e temp Denaturation rate of b b 10 80000e temp e meltb e temp Denaturation rate of c c 10 80000e temp e meltc e temp Denaturation rate of g g 10 80000e temp e meltg e temp p-bond Formation Rate kb = kc = kg p-bond Breakage Rate deg p-bond breakage rate in b D 0.6 102 e(25/ 8.3110 3 0.2 1017 e(122/ 8.3110 temp) 3 temp) 0.1deg 52 p-bond breakage rate in c Dd 0.5deg p-bond breakage rate in g Dg Deg Rate of Monomer S 10-3 to 10-9 200000 Incorperation into a. Assocication Rate of Polymers The dissociation rates were calculated so as to reproduce the sigmoid melting curve of strands at a given temperature (temp). 4 strand types (a,b,c and g) were modelled, each was double the length of the previous strand. Figure 26 shows that under a very wide range of monomer concentrations, long strands actually outnumber short strands at low temperatures (280K). This is because at low temperatures, although only a few long strands dissociate, when they do, they quickly associate with dissociated shorter strands, forming complexes aab, bbc, ccg etc…, which then undergo ligation. At higher temperatures, although long strands dissociate more readily, the complexes aab, bbc, ccg are not as stable. Since the rate-limiting step in replication is ligation, a system capable of increasing the concentration of ligation reactants will increase the rate of replication. This occurs best at low temperatures. 53 Figure 26. (Top) The concentrations (y-axis) of strands at very high monomer incorporation rate (10 -3) at 280K, over time (x-axis). (Purple = a, Red = b, Green = c, Blue = g). After an initial transient explosion of 54 b followed by c, the longest strands, g, begin to increase in concentration. (Bottom) The same axis, but at low monomer incorporation rate (10-9). 55 Figure 27. As in figure S7 except that temperature = 300K. With both high and low monomer concentrations, long strands are lost. Short strands (a and b) predominate the reactor. The stochastic model confirms the findings of the ODE model, that long strands are not produced at high temperature. If this ODE model were correct, we would expect that replication of long strands would occur very effectively at low temperatures. However, the stochastic model shows that although elongation of long strands does indeed occur at low temperatures, replication does not for sequences longer than 5 nucleotides in length. The reason for this can be understood when we see that the replication mechanism in the ODE model is a set of OR-coupled autocatalytic cycles, Figure 25. As with any autocatalytic cycle, side-reactions can deplete the constituents of the cycle, if they are able to remove these constituents at a rate greater than the influx of matter into the cycle. The set of OR- coupled autocatalytic cycles although possible in principle, is seen to be entirely fictional when we consider the presence of staggered associations resulting in elongation. Figure 5 shows merely a sample of the possible staggered associations, e.g. a can associate with a to produce not aa but aas. b can associate with b to produce bbs, which can then react with aas or a, etc. In fact there is a combinatorial explosion of possible elongation reactions by staggered association causing single stranded oligomers (a,b,c,g) are removed from the OR-coupled autocatalytic replication pathway most effectively. The advantage of the discrete stochastic model is that these dynamics emerge from the underlying ‘unbiased’ chemical assumptions. The elongation reactions can be incorporated into the ODE model by adding a depletion term to each single strand, and adding in new variable e that represents the activity of elongating staggered elements. We let ke be the rate at which strands enter into the 56 elongation pathway (ke = 0.0001), and assume that depletion of single strands into the elongation pathways occurs with flux ke[x][e]. We let the elongating elements decay into inactive substances at a rate de. The modified equations for this system are shown below, and their behaviour is shown in figure 28. d[a] s [a][b] a [ab] [a][ab] a [aab] d[b]/2 ke [a][e] dt …1 …2 d[ab] [a][b] a [ab] [a][ab] a [aab] d[ab] dt …3 d[bb] kb [aab] [b]2 b [bb] d[bb]/10 dd[cc]/20 dt …4 a[aab] [a][ab] kb [aab] a [aab] d[aab]/10 dt …5 d[b] [b][a] a [ab] [b][c] b [bc] [b][bc] b [bbc] 2[b]2 2b [bb] d[b] dd[c]/2 ke [b][e] dt …6 d[bc] [b][c] b [bc] [b][bc] b [bbc] dd[bc] dt [bbc …7 d[bbc] [b][bc] kc [bbc] b [bbc] dd[bbc]/10 dt …8 d[cc] kc [bbc] [c]2 c [cc] dd[cc]/10 dg[gg]/20 dt …9 d[c] [b][c] b [bc] 2[c]2 2c [cc] dd[c] [c][g] c [cg] [c][cg] c [ccg] dg[g]/2 ke [c][e] dt …10 d[cg] [c][g] c [cg] [c][cg] c [ccg] dg[cg] dt …11 d[ccg] [c][cg] kg [ccg] c [ccg] dg[ccg]/10 dt …12 d[gg] kg [ccg] [g]2 g [gg] dg[gg]/10 dt …13 d[g] [c][g] c [cg] 2[g]2 2 g [gg] dg[g] ke [g][e] dt d[e] ke [a][e] ke [b][e] ke [c][e] ke [g][e] de dt …14 57 58 Figure 28. (Top) High monomer activity (10-3), 280K, ke = 0.0001, de = 0 Elongation predominates and depletes the replication pathway eventually. (Middle) Elongation also depletes the replication pathway at low monomer activity (10-9), ke = 0.0001, de = 0. (Bottom) Even if elongating elements decay at a rate 59 equal to the rate of decay of g oligomers, g oligomers still cannot be maintained, ke = 0.0001, de = dg. The rate of decay of e must be much higher than the rate of decay of g to allow maintenance of g. Figure 28 shows that e takes over the population of oligomers, eventually totally depleting the autocatalytic cycles. Even if decay of e is introduced, if e decays only at the same rate as g, then e still out-competes g. The rate of decay of e must be much greater than the rate of decay of g, for g to be maintained. It is not known how this could be achieved. In conclusion, the elongation pathways are actually detrimental to replication. Partial References (see main paper for other references) Zielinski, W. & Orgel, L. Autocatalytic synthesis of a tetranucleotide analogue. Nature, 327, 346-437. (1987). Zielinski, W. & Orgel, L. Oligoaminonucleoside phosphoramidates. Oligomerization of dimers of 3’-amino-3-deoxynucleotides (GC and CG) in aqueous solution. Nucleic Acids Research, 15, 1699- (1987) Zielinski, W. & Orgel, L. The template properties of triphosphraamidates having CG residues. J. Mol. Evol., 29, 281-283. (1989) Fernando, C.T. & Di Paolo, E. A Model for the Origin of Long RNA Templates. Proceedings of the Ninth International Conference of Artificial Life, Boston, USA. (2004). 60 Kanavarioti, A. Template-directed chemistry and the origins of the RNA world. Origins. Life Evol. Biosph., 24, 479-495. (1994). Kanavarioti, A. & Bemasconi, C. Computer Simulation in Template-Directed Oligonucleotide Synthesis. J. Mol. Evol., 31, 470-477. (1990). Wattis, J. & Coveney, P. The Origin of the RNA World: a kinetic model. J. Phys. Chem. B, 103: 4231-4250. (1999). Wills, P. & Kauffman, S. & Stadler, B. & Stadler, P. Selection dynamics in Autocatalytic Systems: Templates Replicating Through Binary Ligation. Bulletin of Mathematical Biology, 1, 1-26. (1998). Breivik, J. Self-Organization of Template-Replicating Polymers and the Spontaneous Rise of Genetic Information. Entropy. 3, 273-279. (2001). Griffith, S. & Goldwater, D. & Jacobson, J.M. Robotics: Self-replication from random parts. Nature 437, 638 (2005). 61 62