In Silico Nucleic Acid Elongation using Stochastic Cellular Automata

In Silico Nucleic Acid Elongation and Replication using
Stochastic Cellular Automata
Supplementary material
Chrisantha Fernando, Johan Elf, Jon Rowe, Eörs Szathmáry
February 2007
Introduction
The stochastic model of template dynamics in the context of previous work
Methods
The Overall Algorithm
Formal Description of Algorithm
The Two Time Scale Method
Reaction Rules
Justification for Reaction Rules
RY Stacking Rules
Introducing Stacking bonds (s-bonds)
Validation of Melting Temperatures
Validation of h-bond Kinetics
Validation of p-bond Kinetics
Validation Comparing CGCG and GCGC Dynamics
Results
Different reactions were observed at high and low temperatures
Algorithms used to detect replication
Elongation is a Side-Reaction of Autocatalytic Template Replication
The stochastic model of template dynamics in the context of previous work
Previous models of the origin of nucleic acid replication have consisted of ordinary
differential equation models that incorporate high-level chemical assumptions
(Kanavarioti and Bernasconi, 1990; Kanavarioti, 1994; Fernando and Di Paolo, 2004)
such as direct chain growth (Wattis and Coveney, 1999), sub-exponential or parabolic
template growth rates in a closed system (Von Kiedrowski, 1993), or ribozyme effects, or
that postulate idealized replication mechanisms (Wills et al, 1998). Macroscopic physical
models of template replication, although ingeniously relaxing these assumptions, do not
embody the order of magnitude differences in rates between phosphoramidate-bond (p-bond)
and Watson-Crick base pair (h-bond) events observed in nucleic acids, and have
not yet demonstrated long template replication (Breivik, 2001; Griffith et al 2005). In
order to integrate these diverse approaches, a low-level stochastic model of the
underlying chemical kinetics of nucleotide and polymer dynamics was designed to
explore the sufficient functional conditions for non-enzymatic long template replication.
This model was applied to replicate some of the findings of Von Kiedrowski’s group
using phosphoramidate dimers.
Methods
The Overall Algorithm
Figure 1. A high-level diagram of the overall simulation algorithm is shown. Only a small sample of
simulated templates, and only C & G nucleotides, are depicted.
An informal description of the algorithm is provided, followed by a formal specification.
Figure 1 above shows the overall organisation of the simulation. The simulation used a
two-time-scale technique to deal with the order of magnitude difference in characteristic
times between (Watson-Crick base pair) h-bond events and (phosphoramidate) p-bond
events. An ‘event’ means the formation or breakage of a bond or a bond-type.
Constrained by high computational cost, a very small volume (footnote 1) was modelled as a flow
reactor containing initially 500 dimers, giving an effective initial oligomer double strand
concentration of 20 mM. Using a variant of the Gillespie algorithm, the hydrogen bond
dynamics of the system were simulated until a quasi steady state was reached, e.g. for a
period of 0.0001-0.05 seconds (footnote 2). Figure 3 describes the reaction rules used to calculate
propensities. Once a steady state in the fast h-bond dynamics was reached, microstates
from this steady-state were sampled at fixed time intervals of 0.05 seconds. Each
microstate had an associated propensity of p-bond formation, given by the sum of the
number of potential p-bond forming configurations contained in that microstate
multiplied by the rate of p-bond formation associated with each p-bond forming
configuration. Also, each microstate had an associated p-bond breakage propensity,
which was assumed to be simply proportional to the number of p-bonds in that
microstate, therefore this value was equal across all microstates. A time was generated at
which the next p-bond formation event was expected to occur, by entering the p-bond
formation propensity into equation 1 below…
Footnote 1: Avogadro’s number × volume is defined as 50000. Dividing the number of molecules by this value gives
the concentration in moles/L.
Footnote 2: The rules are applied to the system using a variant of the SSA algorithm (Elf and Ehrenberg, 2004), based
on the next reaction method of the Gillespie algorithm (Gillespie, 1977; Gibson and Bruck, 2000).
$$t_i = -\frac{\ln(\mathrm{rand}())}{P_i} \qquad \text{(Equation 1)}$$
Similarly, an expected time was generated for the next p-bond breakage event. The event
occurring first was executed. Roulette wheel selection was used to randomly choose a
microstate in which to execute the event, with microstate weighting determined by the p-bond
event propensity in that microstate. Once a microstate had been chosen, roulette
wheel selection was applied over all p-bond events of the appropriate type in that
microstate. In general, a time to the next event was always obtained by applying equation
1 to the propensity of that event. The event occurring earliest was the one chosen for
execution. The particular bond that was chosen to undergo an event of the chosen class
was obtained by percolation through a chain of roulette wheel selections, biased by
propensities at each level in the hierarchy.
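The two building blocks used throughout this section, drawing a waiting time from equation 1 and percolating an event through nested roulette wheels, can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names and the list-of-lists layout of `microstates` are assumptions made for the example.

```python
import math
import random

def next_event_time(propensity, rng):
    """Waiting time to the next event, t = -ln(u) / P, with u uniform in (0, 1]."""
    if propensity <= 0.0:
        return math.inf  # an event with zero propensity never fires
    u = 1.0 - rng.random()  # rng.random() lies in [0, 1); shift so log(u) is defined
    return -math.log(u) / propensity

def roulette_select(weights, rng):
    """Return an index i with probability weights[i] / sum(weights)."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("no selectable item")
    threshold = rng.random() * total
    running = 0.0
    chosen = None
    for i, w in enumerate(weights):
        if w <= 0.0:
            continue  # zero-weight items can never be chosen
        running += w
        chosen = i
        if threshold < running:
            break
    return chosen  # on round-off, falls back to the last positive-weight item

def choose_p_bond_site(microstates, rng):
    """Two-level selection: first a microstate, then a site within it.

    `microstates` is a list of lists (hypothetical layout); each inner list
    holds the p-bond event propensity of every candidate site.
    """
    m = roulette_select([sum(sites) for sites in microstates], rng)
    s = roulette_select(microstates[m], rng)
    return m, s
```

Each level of the hierarchy is sampled in proportion to the summed propensity beneath it, so the final site is chosen with probability proportional to its own propensity.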
Figure 2. Top: The representation of polymers allows a very simple secondary structure that forbids
hairpins. Therefore, any mechanism of replication requiring hairpins will not be possible in this model. Top
Left: Internal loops are possible. Top right: Breathing ends are possible. Bottom: Grid A shows a
complementary double strand of length 5. Grid B shows a complementary double strand of length 10. Grid
C shows a complementary double strand where the rightmost h-bond has been broken. Grid D shows a
single strand of length 5. Grid E shows one of the possible polymers that may result when D collides with
C. Grid F shows a polymer that may result when D collides with another single strand. This will be
unstable because the h-bond is between a non-complementary pair of nucleotides. Any polymer
configuration is permitted by the model, as long as two steric constraints are obeyed: nucleotides cannot
overlap each other, i.e. only a single p-bond and h-bond can occupy the same edge; one nucleotide can only
be attached to one other nucleotide by an h-bond. The stability of a newly formed polymer is an emergent
property of the strength of the stacking and h-bond forces that result immediately after association.
Figure 2 shows the representation of nucleic acids used in the simulation. A polymer is
defined as a contiguous molecule joined by h-bonds and p-bonds. Each polymer was
represented on a separate unbounded grid. Each grid represented individual A, C, G, and
T nucleotides on its vertices, hydrogen bonds (h-bonds) on the vertical edges, and covalent
bonds (p-bonds) on the horizontal edges. Thus, a large set of polymer secondary
structures could be represented on the grid. Hairpins were excluded because p-bonds
were confined to the horizontal edges. Using this representation, firstly, a composite
propensity of reaction was calculated for each polymer, IPRP (intra-polymer reaction
propensity). The IPRP was defined as the rate of reactions of type i, multiplied by the
number of configurations that could undergo reaction type i, within that polymer at time
t, (excluding p-bond reactions). Figure 3 shows the reaction rules used to calculate the
IPRP. Secondly, a propensity of collision was calculated for all possible reactions
between polymers, PPP (polymer-polymer propensity). This was the propensity that the
next event would be a collision event rather than an intra-polymer reaction. We assumed
a well-stirred reactor, i.e. the probability of collision of all polymer pairs was equal.
Thirdly, the propensity that the next event would be a reaction between a monomer and a
polymer was determined, so defining the monomer-polymer propensity, MPP. Having
determined all composite event propensities, times were generated for each composite
event type. The event type that occurred first was chosen for execution. Roulette wheel
selection was used to percolate the event type decision to a particular location. If the
earliest event was an intra-polymer event then roulette wheel selection was used to
choose which particular polymer would undergo the event. Roulette wheel selection was
again applied to all possible events within that polymer, and an event was executed at
time t+x, where x was the time generated using IPRP as the propensity. If the earliest
event was a collision event then the collision algorithm was executed, whereby two
polymers were randomly chosen to undergo a collision, at two randomly chosen sites. A
collision was deemed legitimate if it did not result in overlapping nucleotides or bonds;
if legitimate, an h-bond was formed between these two sites at time t + x, where x was
given by applying PPP to equation 1, and the two polymers were joined to form one
polymer. In later models collisions could take place by s-bonds (stacking bonds). This
was necessary to explain the observed ability of dimers to elongate even at 30 °C, where
no template directed ligation is possible. If the collision was not successful, time was
updated by x, as above, but no change was made to the polymers. Similarly if the earliest
event was a monomer-polymer association, roulette wheel selection was again used to
choose a polymer onto which a monomer would attach, and again, roulette wheel
selection was used to determine a site on that polymer onto which a monomer would
attach. A monomer was attached by an h-bond at time t + x where x was given by
applying MPP to equation 1.
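The legitimacy test for a collision can be illustrated with a minimal grid sketch. It assumes, purely for illustration, that each polymer is stored as a dictionary from integer (x, y) grid positions to nucleotide letters, and that a new h-bond places the two chosen sites on vertically adjacent vertices. Only occupancy is checked; h-bond bookkeeping and strand orientation are omitted, and `try_collision` is a hypothetical name.

```python
def try_collision(cells_a, site_a, cells_b, site_b, side=1):
    """Try to join polymer B to polymer A by an h-bond between two sites.

    Polymer B is translated so that `site_b` lands directly above (side=+1)
    or below (side=-1) `site_a`. Returns the merged grid, or None when any
    translated nucleotide would overlap a nucleotide of polymer A.
    """
    dx = site_a[0] - site_b[0]
    dy = site_a[1] + side - site_b[1]
    moved = {(x + dx, y + dy): base for (x, y), base in cells_b.items()}
    if any(position in cells_a for position in moved):
        return None  # illegitimate collision: nucleotides would overlap
    merged = dict(cells_a)
    merged.update(moved)
    return merged

# A length-5 duplex occupies rows y = 0 and y = 1; a length-5 single strand
# occupies row y = 0 of its own grid.
duplex = {(x, y): "C" if (x + y) % 2 == 0 else "G" for x in range(5) for y in range(2)}
single = {(x, 0): "G" for x in range(5)}

joined = try_collision(duplex, (4, 1), single, (0, 0))   # lands on row y = 2
blocked = try_collision(duplex, (4, 0), single, (0, 0))  # would land on row y = 1
```

In the second call the incoming strand would have to occupy vertices already held by the upper strand of the duplex, so the collision is rejected and both polymers stay unchanged.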
Formal Description of Algorithm
The algorithms described above are defined formally below. Algorithm 1 describes how
fast h-bond dynamics were run to a quasi steady state between each p-bond event, and
algorithm 2 describes the process shown in figure 1.
Initialization involves making N polymers and placing them into the grids. In most
experiments, 500 dimers were used as the initial set. Monomers can also be entered at
initialization, e.g. 10 to 20 monomers of type A and 10 to 20 monomers of type B. The
simulation time is set to t = 0. Monomer concentration was kept constant, although in later
experiments the system was initialized with a number of dimers that were allowed to
deplete. Initialization also involves calculating all the reaction propensities.
Line 1: For each polymer, calculate its intra-polymer-reaction-propensity (IPRP). This is
the propensity of the event ‘some reaction occurs in polymer x’. It is obtained from R1…R5, the
propensities of the reactions described by rules 1 to 5 in figure 3, summed over all
bonds in the polymer.
Line 4: Calculate the monomer-polymer-propensity (MPP). This is the propensity of the
event ‘one monomer attaches to a polymer’, where Av = Avogadro’s number and Q =
volume (in litres), giving AvQ = $3.1548 \times 10^{4}$ M$^{-1}$. Concentrations of monomers are obtained
by dividing the number of monomers by AvQ. pA = (number of free A nucleotides on the
polymer), pB = (number of free B nucleotides on the polymer). These are summed over
all the polymers. The rate coefficient of complementary monomer attachment is c = $10^{6}$
sec$^{-1}$ and that of non-complementary monomer attachment is nc = $10^{3}$ sec$^{-1}$.
Line 5: Calculate the polymer-polymer-propensity (PPP). This is the propensity of the
event ‘a polymer collides with another polymer’. This is the sum over the collision
propensity of each possible polymer pair. The association propensity, c = $50000 \times 10^{6}$
[Litre M$^{-1}$ sec$^{-1}$], is used as the association rate coefficient. Note that we calculate the
propensity of a collision event, not the propensity of successful association. If a collision
event occurs, then two polymers are chosen for collision by roulette wheel selection on
the polymer pair matrix. They are then associated at a randomly chosen pair of free
h-bond sites, one on each polymer. If the association is legitimate then the polymers are
joined. If the association is illegitimate, i.e. if there is overlap of nucleotides on the 2D
grid, then the polymers are returned unchanged, and the collision is deemed unsuccessful.
Lines 6,7,8: For each polymer (IPRP event), and for the MPP and PPP events, sample a
putative time, $t_i$, at which that event would occur, from an exponential distribution, $t_i = -\ln(\mathrm{rand}())/p_i$, where the parameter $p_i$ = IPRP(x), MPP and PPP respectively.
Line 9: Store the $t_i$ values so obtained in an indexed priority queue.
Line 10: Until termination, repeat the following loop.
Line 11: Let u be the reaction whose putative time $t_i$, stored in the priority queue, is least.
Line 12: If the reaction was an IPRP event, then by using roulette wheel selection, decide
which particular event in that polymer x, should occur. Roulette wheel selection allows
the events to be sampled in proportion to their individual rates. If the reaction was a MPP
event, again use roulette wheel selection to decide onto which polymer a monomer
should attach. If the event was a PPP event, then again use roulette wheel selection to
decide which two polymers have collided.
Line 13: Change the state of the system to reflect the execution of reaction event u. This
is done as described in the reaction rules; for example, if a collision event occurs, then
run the collision algorithm to see if a legitimate collision has occurred.
Line 14: Update only the propensities that have changed due to reaction u. For each such
event, draw a new putative time from an exponential distribution using the new
propensity $p_i$, and set $t_{new} = t - \ln(\mathrm{rand}())/p_i$ in the priority queue for that updated event.
Line 15: Reorder the priority queue so that the earliest event is at the top.
Line 16: Repeat from Line 10.
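The loop in Lines 6 to 16 can be sketched with Python's `heapq` module. This is a simplified illustration under stated assumptions, not the authors' implementation: the callbacks `propensity_of`, `apply_event` and `affected_by` are hypothetical, and a binary heap with stale-entry skipping stands in for the indexed priority queue.

```python
import heapq
import math
import random

def draw_time(now, propensity, rng):
    """Putative firing time: now - ln(u) / p, or infinity when p is zero."""
    if propensity <= 0.0:
        return math.inf
    return now - math.log(1.0 - rng.random()) / propensity

def run(event_ids, propensity_of, apply_event, affected_by, t_end, seed=0):
    rng = random.Random(seed)
    version = {e: 0 for e in event_ids}  # used to recognise stale heap entries
    heap = [(draw_time(0.0, propensity_of(e), rng), e, 0) for e in event_ids]
    heapq.heapify(heap)
    t, fired = 0.0, []
    while heap:
        t_next, event, v = heapq.heappop(heap)  # earliest putative time
        if v != version[event]:
            continue  # superseded by a later redraw
        if t_next > t_end or math.isinf(t_next):
            break
        t = t_next
        apply_event(event)  # change the system state
        fired.append((t, event))
        # Redraw times only for events whose propensity changed.
        for changed in set(affected_by(event)) | {event}:
            version[changed] += 1
            new_time = draw_time(t, propensity_of(changed), rng)
            heapq.heappush(heap, (new_time, changed, version[changed]))
    return t, fired

# Toy usage: three bonds that each break at rate 2 per second.
state = {"n": 3}
t_final, history = run(
    ["break", "form"],
    lambda e: 2.0 * state["n"] if e == "break" else 0.0,
    lambda e: state.__setitem__("n", state["n"] - 1),
    lambda e: ["break"],
    math.inf,
)
```

Skipping stale entries when they are popped avoids reordering the queue in place, at the cost of a slightly larger heap.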
The advantage of the algorithm is that if the effect of a reaction on the other propensities
can be limited, then it is not necessary to recalculate all the propensities every time an
event occurs. The algorithm differs from the next reaction method; for instance,
we do not use an explicit dependency graph.
However, even with such a method, due to the many rapid hydrogen bond formation and
breakage events, particularly those caused by polymer association and dissociation
(which occur very frequently), it is not possible to simulate more than a few seconds of
real time in a typical overnight run. One way around this is suggested by the fact that
hydrogen bond dynamics reach a pseudo-equilibrium conditioned on the current p-bond
state. Because this is so, it is only necessary to simulate the hydrogen bond dynamics
until this equilibrium has been reached. At that point we can sample the next p-bond
event.
The Two Time Scale Method
There are two very different timescales in template replication dynamics. Base pairs
(hydrogen bonds) form and break, with rates that are many orders of magnitude faster
than phosphoramidate bond formation and breakage. A technique was developed to solve
the problem efficiently. This method has general applicability in systems with clearly
distinguishable time-scales. It can be used only where it is possible to simulate the fast
dynamics in isolation from the slow dynamics, until the system reaches equilibrium.
To model the fast dynamics, p-bond formation and breakage events are ‘switched off’,
i.e. these events are removed from the priority queue. The h-bond dynamics are then run
to equilibrium conditioned on the current p-bond state. Equilibrium of the fast dynamics is
said to occur when several measures of the configurations of all polymers in the
population have stabilized, i.e. when the proportion of single to double strands no longer
changes, when the relative frequencies of reaction types no longer change, and when the
average polymer mass no longer changes. If the measures that were monitored were the
average p-bond formation rates at each position, then the algorithm for separation of
time-scales would be exact. This measure is however too expensive to compute, which is
why simpler and approximate measures are used instead.
Typically equilibrium times of 0.005 seconds are required, although occasional trials
have been conducted with longer equilibrium times, to check for deviations. For very
long strands, equilibrium may take too long to reach and so inaccuracies are introduced in
this way. When equilibrium is reached, N random samples of microstates (a microstate
being the instantaneous state of all polymers) are taken. Between each of these samples,
the fast dynamics are run again, for a fixed length of time, long enough to ensure that the
samples are independent (e.g. 0.001 - 0.005 seconds). For each microstate that is
sampled, the total p-bond formation propensity and total p-bond breakage propensity are
calculated. The mean total p-bond formation and breakage propensities over all sampled
microstates are then calculated. An event time is generated using $t_i = -\ln(\mathrm{rand}())/p_i$, on the
basis of these mean total propensities for a p-bond breakage event and for a p-bond
formation event. The event with the earliest time is carried out. Serial roulette wheel
selection is used to percolate this event to the correct microstate, without having to store
all the microstates in memory. Typically, p-bond events occur with a characteristic time
of minutes or hours, whereas h-bond events occur with a characteristic time of $10^{-6}$
seconds. The algorithm for the two time scale method is shown below, and replaces the
contents of the main while loop of algorithm 1 above.
As N tends to infinity, the calculation of p-bond formation propensity becomes
asymptotically more accurate. If the dynamics are such that very rare microstates
contribute significantly to the p-bond formation propensity then it is clear that larger
sample sizes will be required. Different values of N and X were used to test the minimum
sample size that could be safely used. A sample size of 10 was typical with a time
between samples of 0.005 seconds. A problem with this algorithm arises if the total
p-bond formation rates are very different for different microstates. For example, if there is a
small class of microstates, e.g. 0.1 percent of all microstates, that have a very high p-bond
formation rate, these can easily be missed in the current sampling scheme, although in
reality they would contribute substantially to the formed p-bonds. Another problem is that it is
difficult to ensure that N samples are really independent. For example, certain complex
polymer configurations may change much more slowly than the times that can reasonably
be simulated with h-bond dynamics running. An example is with long duplexes, which
will dissociate at low rates, but will not have sufficient time to do so given the short times
for which the rapid h-bond dynamics are run. Occasional runs have been conducted with
longer equilibrium times and similar results are observed.
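One way to realise the slow-event step described in this section is sketched below. It assumes the sampled microstates arrive one at a time as `(state, formation_propensity, breakage_propensity)` tuples, a hypothetical layout. Serial roulette wheel selection is implemented here as a weighted reservoir of size one, which is one standard way to choose an item in proportion to its weight without storing the whole stream; the text does not say which variant the authors used.

```python
import math
import random

def waiting_time(propensity, rng):
    """Exponential waiting time -ln(u) / p, or infinity when p is zero."""
    if propensity <= 0.0:
        return math.inf
    return -math.log(1.0 - rng.random()) / propensity

def sample_slow_event(samples, rng):
    """Pick the next p-bond event type, its waiting time, and a microstate."""
    n = 0
    sum_form = sum_break = 0.0
    pick_form = pick_break = None
    for state, p_form, p_break in samples:
        n += 1
        sum_form += p_form
        sum_break += p_break
        # Serial roulette: keep the new state with probability
        # (its weight) / (running total of weights seen so far).
        if p_form > 0.0 and rng.random() * sum_form < p_form:
            pick_form = state
        if p_break > 0.0 and rng.random() * sum_break < p_break:
            pick_break = state
    if n == 0:
        raise ValueError("no microstates were sampled")
    # Event times are drawn from the mean total propensities over the N samples.
    t_form = waiting_time(sum_form / n, rng)
    t_break = waiting_time(sum_break / n, rng)
    if t_form <= t_break:
        return "formation", t_form, pick_form
    return "breakage", t_break, pick_break
```

Only the currently selected microstate for each event type is held in memory, whatever the number of samples N.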
Reaction Rules
Figure 3. The above rules define all possible reactions within polymers (rules 1 to 5) between polymers
(rule 7) and between polymers and monomers (rule 6). Each reaction is associated with a propensity, and
this is used to calculate the IPRP, PPP and MPP composite propensities used in algorithm 1.
Figure 3 above describes how the IPRP, PPP and MPP values were calculated using
realistic reaction kinetics and free energies obtained from empirical studies (Cantor and
Schimmel, 1980; Turner, 2000; Reynaldo et al, 2000; Rohatgi et al, 1996; Schoneborn et
al, 2001; SantaLucia, 1998; Xia et al, 1998). P-bond formation and breakage rates were
modelled on phosphoramidate bonds, although due to the time-scale separation between
h-bond and p-bond dynamics, the same overall behaviour is expected irrespective of the
absolute values of p-bond formation and breakage rates. The probability of bond
breakage and formation are functions of the local neighbourhood configuration of bonds,
monomer concentration, and temperature. The rules shown in figure 4 were applied to
each bond or potential bond site on each polymer to calculate the rate of that bond
breaking or forming and then used to calculate the composite propensities, IPRP, PPP
and MPP.
Rule 1: h-bond breakage rate, $d_m = A_m e^{(-E_a + Kn)/RT}$. GC single h-bond breakage
activation energy $E_a$ = 31.0 kJ. AT single h-bond breakage activation energy $E_a$ = 29.5 kJ.
Mean $E_a$ for non-complementary h-bonds = 18.4 kJ. This resulted in approx. 10%
mispaired nucleotides. Modifications applied due to stacking effects are described in
figure 4. T is the temperature in Kelvin (either 275 or 300 K in these experiments). R =
$8.314 \times 10^{-3}$ kJ K$^{-1}$ mol$^{-1}$. The central h-bond in configuration B1 contains no h-bonds
adjacent to it, and so it has the highest probability of breakage. The central h-bond in the
BN configuration has 4 adjacent h-bonds, and so has the lowest probability of breakage.
The ordering of stabilities of the central h-bond from least-to-most stable is B1 > B12 >
B2 > B23 > BN1 > BN (footnote 3). The Arrhenius constants for each neighbourhood configuration
are approximated by $A_{B1} = 3.6 \times 10^{13}$, $A_{B12} = 1.8 \times 10^{13}$, $A_{B2} = 7.3 \times 10^{12}$, $A_{B23} = 1.8 \times 10^{12}$,
$A_{B3} = 3.6 \times 10^{11}$, $A_{BN1} = 3.6 \times 10^{10}$, $A_{BN} = 1.8 \times 10^{10}$. The entropy term Kn was required to
obtain the proper melting temperature curve for long strands. K was empirically set to
0.01; n is the number of nucleotides not attached by h-bonds to another nucleotide, e.g.
those at the dangling ends.
Rule 2: Catalysed h-bond formation rate = $10^{6}$ sec$^{-1}$ adjacent to other h-bonds.
Rule 3: Zipper h-bond formation rate = $10^{6}$ sec$^{-1}$ irrespective of the
distance from the potential h-bond to the closest nucleation site (a nucleation site is a
pattern of 3 adjacent h-bonds). Note that the simultaneous application of rule 2 and rule 3
results in h-bonds being twice as likely to form next to another h-bond than anywhere else
along a duplex.
Rule 4: Template directed p-bond formation rate = $0.6\, e^{-25/RT}$.
Rule 5: p-bond breakage rate = $1.32 \times 10^{12}\, e^{-110/RT}$; n.b. the p-bond breakage rate
exceeds the p-bond formation rate above 350 K. Note rules 4 and 5 are only applied in the
outer loop of algorithm 2, not in algorithm 1.
Rule 6: Monomer attachment. Complementary rate = $10^{6}$ sec$^{-1}$. Non-complementary
rate = $10^{3}$ sec$^{-1}$. Stacked monomer attachment (i.e. attachment of a
monomer next to an h-bond) is 100 times these values.
Rule 7: Collision rate between any two polymers = $5 \times 10^{9}$ sec$^{-1}$ M$^{-1}$. Note this is a
bimolecular rate constant because it is multiplied by two concentrations. The total collision
propensity is given by $5 \times 10^{9}$ × (No. of Polymers/$N_A V$), where $N_A$ = Avogadro’s number
and V is the volume. We set $N_A V = 10^{4}$ when the tide was out, and $N_A V = 10^{5}$ when the
tide was in. When a collision occurred, the two random nucleotides between which a new
h-bond (or transient “stacking” s-bond, see figure 5) would form were chosen, and the
polymers were combined at that site. If the combination resulted in no overlap of
nucleotides on the 2D grid, then the collision was successful and polymers were combined,
else the polymers remained separate.
‘Rule’ X: Spontaneous p-bond formation by association of
monomers was forbidden because the rate was too low to be significant at the time-scales
considered. However, spontaneous p-bond formation between oligomers by single
stranded and double stranded (non-templated) end ligation was allowed, see figure 5.
Footnote 3: These are codes for the classes of equivalent h-bond neighbourhood states that contribute the same
stacking stability to the central h-bond. They label the 16 configurations shown in figure 3 rule 1.
Justification for Reaction Rules
Figure 3 above shows all the reaction rules that have been implemented in the model. The
reaction rules determine the propensity of reactions. The propensity of a reaction is
defined as the rate of that reaction times the number of reactants that can undergo that
reaction. From the propensity of a reaction, the overall probability of that reaction
occurring in the reactor can be calculated. The propensity of bond breakage and
formation is a function of the local neighbourhood configuration of bonds, the
monomer concentration, and temperature. The reaction rules are applied to each bond,
and to each potential-bond location, to calculate the propensity of a bond breaking or
forming respectively at that site. The relevant propensities are updated, whenever a
reaction makes or breaks a bond.
Rule 1: h-bond breakage: Watson-Crick base pairs break and form extremely rapidly,
especially at the breathing ends of a strand. This is because, due to double strand stacking
effects, it is easier to break h-bonds when they are not adjacent to other h-bonds.
Therefore, we have assumed that the probability of breaking an h-bond depends on the
neighbourhood configuration of its 4 adjacent h-bonds. Figure 3 rule 1 shows the 16
possible h-bond configurations around a central h-bond, labelled from B1 (no adjacent
h-bonds) to BN (4 adjacent h-bonds). The h-bond breakage rate for one h-bond is given (in
units of sec$^{-1}$ per bond) by the Arrhenius equation, $d_m = A_m e^{(E_a + Kn)/RT}$, where $d_m$ is the rate of
hydrogen bond breakage for a hydrogen bond with configuration m, $A_m$ is the Arrhenius
constant for the configuration m, and $E_a$ is the activation energy for h-bond breakage,
quoted here as a negative number. In early
experiments, $E_a$ = -34.4 kJ for complementary h-bonds and $E_a$ = -29.4 kJ for
non-complementary h-bonds. This means that non-complementary h-bonds have a greater
probability of breaking at any given temperature. We demonstrated experimentally that
this difference in activation energy between complementary and non-complementary
base pairs resulted in approximately 10 percent mispaired base pairs. This is the right
ballpark of mispairing frequency expected without enzymes. T is the temperature in
Kelvin, and R = $8.314 \times 10^{-3}$ kJ K$^{-1}$ mol$^{-1}$. The ordering of stabilities from least-to-most
stable is B1 > B12 > B2 > B23 > BN1 > BN. The Arrhenius constant, $A_m$, is specific for
each neighbourhood configuration, and reflects this increasing stability. A further term is
required to express the effect that the size of dangling ends has on $d_m$. As the size of
dangling ends increases, there are more possible open configurations than closed
(re-annealed) configurations for the dangling end, and so re-annealing is entropically
disfavoured. The dangling-end-length dependent entropy term Kn was required to obtain a
linear melting temperature curve for long strands, see figure 4.
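The qualitative behaviour of rule 1 can be checked with a short sketch. It uses the positive activation energies and Arrhenius constants listed in the Reaction Rules section, and it assumes that the dangling-end term Kn enters the exponent with a sign that raises the breakage rate; that sign is an assumption, chosen because the figure 4 caption says strands with bigger dangling ends melt more easily. The function and dictionary names are hypothetical.

```python
import math

R = 8.314e-3  # gas constant in kJ K^-1 mol^-1
K = 0.01      # empirical dangling-end entropy coefficient from the text

# Arrhenius constants per neighbourhood class, as listed in the Reaction Rules.
ARRHENIUS_A = {
    "B1": 3.6e13, "B12": 1.8e13, "B2": 7.3e12, "B23": 1.8e12,
    "B3": 3.6e11, "BN1": 3.6e10, "BN": 1.8e10,
}

# Activation energies in kJ, taken as positive numbers (assumed convention).
ACTIVATION_ENERGY = {"GC": 31.0, "AT": 29.5, "mismatch": 18.4}

def h_bond_breakage_rate(pair, neighbourhood, n_unpaired, temperature):
    """Breakage rate d_m = A_m * exp((-E_a + K * n) / (R * T)) in sec^-1 per bond."""
    e_a = ACTIVATION_ENERGY[pair]
    a_m = ARRHENIUS_A[neighbourhood]
    return a_m * math.exp((-e_a + K * n_unpaired) / (R * temperature))

isolated = h_bond_breakage_rate("GC", "B1", 0, 300.0)
stacked = h_bond_breakage_rate("GC", "BN", 0, 300.0)
mismatch = h_bond_breakage_rate("mismatch", "BN", 0, 300.0)
```

Under these assumptions an isolated h-bond (B1) breaks faster than a fully stacked one (BN), and a mismatched pair breaks faster than a GC pair in the same neighbourhood.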
Figure 4. The effect of dangling end entropy on melting temperature. X-axis shows 1/Length, Y-axis shows
1/Tm. Circles show lengths at which Tm was measured. The Entropy term, Kn, is introduced to remedy a
deviation from the expected melting temperature relation as a function of strand length. After the correction
is introduced, strands with bigger dangling ends tend to melt more easily.
K is empirically set to 0.01, n is the number of nucleotides not attached by h-bonds to
another nucleotide, therefore it approximates the number of nucleotides at the dangling
ends. Note that the h-bond breakage rates do not depend on the state of p-bonds, i.e. an
h-bond surrounded by 4 h-bonds formed due to the presence of 4 unlinked monomers has
the same probability of breaking as an h-bond surrounded by 4 h-bonds formed between
nucleotides linked by p-bonds. This assumes that stacking interactions are the same,
irrespective of whether nucleotides are joined by p-bonds or not. The full model refines
these assumptions as described in the paper.
If an h-bond break results in breakage of a polymer into two parts, then action is taken to
update the box representations. If the polymer breaks into two polymers, then one of the
polymers is moved to a new box. If the polymer breaks into one polymer and a monomer,
then the monomer is removed, and the number-of-monomers variable is incremented by
one.
Rule 2: Catalysed h-bond formation. This rule implements the effect of double-strand
stacking, whereby h-bonds have a greater probability of forming next to adjacent h-bonds
than at random positions in a partially annealed duplex. In the simple model one can
assume that the influence of stacking is not a function of the identity (C or G) of the
adjacent nucleotide, i.e. all dinucleotide cores are assumed to have equal stacking
energies. The NN model shows that this is not in fact the case and so in the model
described in the paper, NN and RY effects are considered. The rate of catalysed h-bond
formation = $10^{6}$ sec$^{-1}$ per possible h-bond. The total propensity of a catalysed h-bond
formation is this rate times the number of possible h-bonds that can form in the system.
We assume this is a temperature independent process. The shifting of hydrogen bond
formation to breakage equilibrium as a function of temperature is due to changes in the
rate of h-bond breakage.
Rule 3: Zipper h-bond formation. This rule implements the fact that h-bonds have a
certain chance of forming anywhere along a partially annealed duplex with rate = $10^{6}$ sec$^{-1}$
per possible h-bond, irrespective of the distance from the potential h-bond to the closest
nucleation site (a nucleation site is a set of 3 adjacent h-bonds). The simultaneous
application of rule 2 and rule 3 results in an h-bond being twice as likely to form next to
another h-bond than anywhere else along a duplex. In reality, the probability of zipper
h-bond formation drops off as a function of distance from the nucleation site; however, this
happens sufficiently slowly that, for polymers of up to 100 nucleotides in length, it can
safely be ignored. Again we assume this is a temperature independent process, and allow
the h-bond breakage rule to model the effect of temperature on the h-bond equilibrium.
Rule 4: Template Directed p-bond formation. p-bonds form when two nucleotides are
brought into close contact on a template strand. Such a configuration is shown in rule 4.
The rate of p-bond formation, assuming a legitimate configuration, is given by the
Arrhenius equation, $P_f = 0.6\, e^{-E_a/RT}$, where $E_a$ = 25 kJ is the activation energy of p-bond
formation and 0.6 is the Arrhenius constant. The units of $P_f$ are sec$^{-1}$ per potential p-bond
forming configuration. Note that the rate for p-bond formation appears high, but that in
practice, because collisions between strands often do not result in a legitimate
configuration for p-bond formation, the overall rate is much less.
Rule 5: p-bond breakage. p-bonds break as described by the Arrhenius equation,
$P_b = 1.32 \times 10^{12}\, e^{-E_a/RT}$, where $E_a$ = 110 kJ is the activation energy of p-bond breakage
and $1.32 \times 10^{12}$ is the Arrhenius constant. The units of $P_b$ are sec$^{-1}$ per p-bond. Figure 5
shows the rates of p-bond formation and breakage as a function of temperature.
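The temperature at which the per-bond breakage rate overtakes the per-configuration formation rate follows directly from rules 4 and 5. The sketch below uses only the constants quoted in the text; the function names are illustrative. Setting the two Arrhenius expressions equal gives T = (110 - 25) / (R ln(1.32 × 10^12 / 0.6)).

```python
import math

R = 8.314e-3  # gas constant in kJ K^-1 mol^-1

def p_bond_formation_rate(temperature):
    """Rule 4: P_f = 0.6 * exp(-25 / (R * T)), per p-bond forming configuration."""
    return 0.6 * math.exp(-25.0 / (R * temperature))

def p_bond_breakage_rate(temperature):
    """Rule 5: P_b = 1.32e12 * exp(-110 / (R * T)), per p-bond."""
    return 1.32e12 * math.exp(-110.0 / (R * temperature))

def crossover_temperature():
    """Temperature (in Kelvin) at which P_f equals P_b."""
    return (110.0 - 25.0) / (R * math.log(1.32e12 / 0.6))

t_cross = crossover_temperature()
```

With these constants the crossover falls a little below 360 K, in the same region as the figure of roughly 350 K quoted in the text. These are rates per bond or per configuration, not total propensities.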
Figure 5. p-bond breakage rate exceeds p-bond formation rate, at 350K. However to obtain the reaction
propensity for p-bond formation, the p-bond formation rate is multiplied by the number of appropriate
configurations in which p-bonds can be formed, whereas the p-bond breakage propensity is independent of
configuration. In practice, p-bond breakage propensity will exceed p-bond formation propensity at a much
lower temperature than 350K because the tri-molecular complex required for p-bond formation becomes
very unlikely at higher temperatures.
Rule 6: Monomer attachment. The rate of h-bond formation by monomer attachment depends on whether the monomer is attaching to form a complementary or a non-complementary h-bond. The complementary rate = $10^6$, in units of sec$^{-1}$ per mole of monomer, per potential complementary attachment site. The propensity of monomer attachment for a given potential complementary attachment site is therefore obtained by multiplying this number by the complementary monomer concentration. The non-complementary rate = $10^3$, in the same units, except that the non-complementary monomers and sites must be considered. Again, stacking interactions are simplified: a nucleotide will attach next to another nucleotide at a much greater rate than elsewhere on a template. Stacked monomer attachment is assumed to occur at a rate 100x greater than the values above, for both complementary and non-complementary base pairs, irrespective of the identity of the dinucleotide core.
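As a minimal sketch of Rule 6, the propensity for a single attachment site can be computed as below. The constant and function names are hypothetical, and the concentration is assumed to be given in mol per litre.

```python
# Rate constants from Rule 6 (sec^-1 per mole of monomer, per potential site).
COMPLEMENTARY_RATE = 1e6
NON_COMPLEMENTARY_RATE = 1e3
# Stacked attachment is assumed to be 100x faster in both cases.
STACKING_FACTOR = 100.0


def monomer_attachment_propensity(monomer_conc, complementary, stacked):
    """Propensity of monomer attachment at one potential attachment site.

    monomer_conc  -- concentration of the relevant monomer (assumed mol/L)
    complementary -- True if the attaching monomer is complementary to the site
    stacked       -- True if the site is adjacent to an attached nucleotide
    """
    rate = COMPLEMENTARY_RATE if complementary else NON_COMPLEMENTARY_RATE
    if stacked:
        rate *= STACKING_FACTOR
    return rate * monomer_conc
```

For example, a stacked complementary site in a 1 mM monomer pool has a propensity 100 times that of an unstacked complementary site, and 10^5 times that of an unstacked non-complementary site.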
Rule 7: Collision and association between polymers. This rule implements a well-mixed reactor. The concentration-dependent collision rate between two polymers A and B is assumed to be $10^6$ [Liter M$^{-1}$ sec$^{-1}$] x (free binding sites on Polymer A)/(AV) x (free binding sites on Polymer B)/(AV), where A = Avogadro’s constant and V = volume. $10^6$ is the unimolecular rate of formation of an h-bond between a monomer and a polymer, per potential site on the polymer, in units of sec$^{-1}$. The total propensity of collision between any two polymers is calculated by summing this value over all possible collision pairs. After collision, two potential h-bond sites are chosen at random, and the polymers are joined at this site. However, if there is any overlap between monomers then the collision is considered a failure, and the polymers are returned to the population unchanged. If there is no overlap between monomers, then the h-bond is formed and the polymers are joined, i.e. one polymer is moved to the grid of the other polymer and attached at the correct place. This is a simplification of the collision process, but results in the correct order of magnitude of association rate.
It was necessary to introduce this approximation, whereby the polymer-polymer event propensity, PPP, represents collision probability and not association probability, because calculation of PPP would otherwise be too slow: one would have to calculate all possible collision configurations for all possible polymer-polymer pairs. Instead we calculate the collision probability simply as a function of the number of free binding sites, and then determine whether an association takes place by randomly choosing an attachment site and checking whether it is legitimate. This results in a lower overall association rate than collision rate. The collision rate is near the diffusion-limited maximum. In the model presented in the paper, collision frequency is independent of polymer length. This significantly increases the speed of simulation and allows the correct melting temperature curves to be obtained.
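The collide-then-check scheme of Rule 7 might be sketched as follows. All names are hypothetical, and the overlap test is passed in as a caller-supplied function because the grid geometry is not reproduced here; this is an illustration of the approximation, not the authors' implementation.

```python
import itertools
import random

AVOGADRO = 6.022e23
COLLISION_RATE = 1e6  # rate constant quoted in Rule 7


def pair_collision_propensity(free_sites_a, free_sites_b, volume_l):
    """Collision propensity for one polymer pair, from free binding site counts."""
    scale = AVOGADRO * volume_l
    return COLLISION_RATE * (free_sites_a / scale) * (free_sites_b / scale)


def total_collision_propensity(free_site_counts, volume_l):
    """Sum the pair propensity over all unordered polymer pairs."""
    return sum(
        pair_collision_propensity(a, b, volume_l)
        for a, b in itertools.combinations(free_site_counts, 2)
    )


def try_association(sites_a, sites_b, overlaps, rng=random):
    """Pick one free site on each polymer at random and test legitimacy.

    overlaps(site_a, site_b) is a caller-supplied predicate (hypothetical)
    that returns True if joining at these sites would superimpose monomers.
    Returns the chosen site pair, or None if the collision fails.
    """
    site_a = rng.choice(sites_a)
    site_b = rng.choice(sites_b)
    if overlaps(site_a, site_b):
        return None  # failed collision: polymers are returned unchanged
    return (site_a, site_b)
```

The cost of computing the total propensity depends only on the free-site counts, which is the speed-up the approximation is meant to provide; the expensive geometric check runs once per sampled collision.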
‘Rule’ X: Spontaneous p-bond formation. In the simple simulation, this reaction was forbidden because it was assumed to occur at such a low rate that it could be ignored. Unfortunately this is not the case in real chemical systems, where spontaneous formation of phosphoramidate bonds results in non-templated elongation. The model now incorporates spontaneous p-bond formation.
RY Stacking Rules
Figure 6 below describes how some more elaborate calculations of inter-strand and intra-strand stacking forces were implemented (SantaLucia, 1998; Cruz et al, 1982). These were included because stacking was known to influence the tendency for elongation and to contribute to ‘phenotypic’ diversity (Sinclair et al, 1984; Zielinski and Orgel, 1987a, 1987b, 1989), for example allowing GCGC to replicate but tending to cause CGCG to elongate.
Figure 6. According to the nearest-neighbour (NN) and purine-pyrimidine (RY) stacking models, the local configuration of intra- and inter-strand nucleotides influences the stability of h-bonds. To calculate the propensity of breakage of an h-bond, one has to check which instances of the above template configurations apply. The figure shows all the possible stacking rules that may be applied to the central h-bond between a 5’ G (blue square) and a 3’ C (red circle) nucleotide, labelled with a green arrow, given the appropriate nucleotide configuration. The 5’-3’ strand is shown at the top of each duplex. To see how to apply the rules to an h-bond between a 3’ C and a 5’ G, simply rotate the picture by 180° about the plane of the page, i.e. look at it from the back. From top left to bottom right: the GC dinucleotide core is assumed to result from two intra-strand stacks having total energy 2 x iCG. The GG and CC cores result from two inter-strand stacks between C and G (with total energy 2 x eCG). The CC core results from one inter-strand stack between G & G (with energy eGG). The additional effects of stacking at longer distances are modelled by the simultaneous application of four further rules, contributing stacking stabilization of the central h-bond at strengths +CGC, +CGG and +CGGC, applied according to the satisfaction of the templates above, i.e. if the h-bond falls within the red highlighted region then that h-bond receives the stacking stabilization associated with that region. With the inclusion of A and T nucleotides, additional configurations must be considered. The stabilities have numerical values as follows (in kJ): iGC = 1.12, eCG = 0.92, eGG = 2.17, iAT = 0.44, eTA = 0.5, eAA = 0.58, iGT = 0.72, iAC = 0.72, eTG = 1.28, eCA = 1.30, eAG = 1.45, CGC = 1/5, CGG = -4/5, CGGC = 2/5. A scaling factor of +1.8 was typically used to multiply all these stacking effects, and was adjusted to allow approximation of experimentally observed Tm values. The effectiveness of these adjustments can be seen by comparing the observed and modelled Tm values for various oligonucleotide sequences in figure 8.
Introducing Stacking bonds (s-bonds)
Finally, transient ‘stacking’ bonds were introduced, capable of mediating blunt-end ligation of templates, see figure 7. This was necessary to match recent experimental results from real chemical systems of pure dimers, which were capable of elongation despite the fact that templated p-bond formation could not have occurred at 30°C due to the extremely low melting temperatures of dimers.
Figure 7. Stacking s-bond formation (shown as a grey bond) occurs between blunt ends, with a certain probability whenever two polymers collide. An s-bond mediated collision occurs if, upon collision, a random number between 0 and 1 is greater than 0.3 + 0.25 x Log(number of free h-bond sites on Polymer A x number of free h-bond sites on Polymer B) and both strands have fewer than 8 p-bonds each. The mean rate of s-bond breakage has the same profile as the rate of h-bond breakage in the B3 configuration, and is adjusted by stacking effects as follows: S-break-rate = $A_{B3}\,e^{-(E_s + \mathrm{stack})}$. $A_{B3} = 3.6 \times 10^{11}$ is just the Arrhenius constant for the intermediate h-bond breakage configuration, $E_s$ = 15.5 kJ (an overestimate necessary to observe s-bonds in the small number of templates simulated) and stack = 3.0 x the NN stacking energy contribution due to application of the templates in figure 6. There is a certain rate of p-bond formation on s-bonds, which is assumed to be the same as the rate of templated p-bond formation, i.e. p_form_rate = $0.6\,e^{-25/RT}$. In practice the net ratio of spontaneous to template p-bond formation will depend on the specific chemical system.
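The acceptance test for an s-bond mediated collision in the Figure 7 caption can be illustrated as below. The caption does not state the base of the logarithm, so base 10 is an assumption here, and the function names are hypothetical.

```python
import math
import random

MAX_P_BONDS = 8  # both strands must have fewer than this many p-bonds


def s_bond_threshold(free_sites_a, free_sites_b):
    """Threshold 0.3 + 0.25 * Log(sites_A * sites_B); log base 10 is assumed."""
    return 0.3 + 0.25 * math.log10(free_sites_a * free_sites_b)


def is_s_bond_collision(free_sites_a, free_sites_b, p_bonds_a, p_bonds_b, u=None):
    """Return True if this collision is s-bond mediated.

    u is a uniform random number in [0, 1); it is drawn here if not supplied.
    """
    if u is None:
        u = random.random()
    if free_sites_a < 1 or free_sites_b < 1:
        return False  # no free h-bond sites, so the logarithm is undefined
    if p_bonds_a >= MAX_P_BONDS or p_bonds_b >= MAX_P_BONDS:
        return False
    return u > s_bond_threshold(free_sites_a, free_sites_b)
```

Under this reading the threshold rises with the number of free sites, so s-bond mediated collisions are most likely between short strands such as dimers and become impossible once the threshold reaches 1.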

Validation of Melting Temperatures
To check that the above parameters produced the correct behaviour, control experiments
were conducted to calculate the melting temperature curve for AnTn and GnCn oligomers,
see figure 8.
Figure 8. Melting temperatures (Tm) for AnTn and GnCn oligomers at concentration 20mM. N = Length. The x-axis shows 1/N and the y-axis shows 1/Tm. Circles show the points at which Tm was measured. The outlier is a 6-mer of sequence AAGCTT, which as expected shows an intermediate Tm between A6T6 and G6C6.
Validation of h-bond Kinetics
To check h-bond kinetics, dissociation behaviour from double stranded to single stranded
states was measured as a function of strand length and temperature, figure 9.
Figure 9. The rate of dissociation of ds-GCGC 4-mers at different temperatures is as expected,
denaturation being more rapid at higher temperatures. Initial concentration = 10mM double-strands. Note
that GCGCG takes longer to dissociate than GCGC at the same temperature, and does not dissociate to as
great an extent.
Validation of p-bond Kinetics
To check p-bond kinetics, an experiment was conducted in a reactor closed to mass and
initialized with only GC dimers, at 275K, figure 10.
Figure 10. A batch reactor was initialized with 20mM GC dimers at 275K. The continuous lines show
expected rates in a simple minimal replicator model for GCGC formation (increasing curve) from GC
(decreasing curve) with the same p-bond formation kinetics (Von Kiedrowski, 1993) as in the stochastic
model. The ODE model assumes no elongation reactions beyond GCGC, whereas the stochastic model shows that such reactions are inevitable unless special blocking of ends is undertaken. Note that Zielinski and Orgel had to block template ends in their non-enzymatic replication experiments to prevent elongation, see supplementary material.
Validation Comparing CGCG and GCGC Dynamics
In the original set of experiments conducted to produce the algorithm, we had initially
used a different method of calculating polymer collision propensity. Originally we
assumed that the propensity was an increasing function of the size of polymers. However
the following erroneous behaviour was observed. GCGC templates and GC dimers were
placed into a reactor, and the system was allowed to reach equilibrium. [GC] = 0.001M
and [GCGC] was varied from 0mM to 1mM. Experiments were carried out at low
temperatures, e.g. 280 - 300K. However, with all the above parameter settings, and even
after increasing iGC to 2.8kJ, GCGC tended to be lost to elongation side reactions
resulting in long strands, see figures below.
Figure 11. Length distribution of oligomers produced in an experiment at 300K, initial [GC] = 1mM, initial [GCGC] = 0.1mM. Length of polymers is shown from left to right.
Figure 12. Top: Representation of dimers in computer graphical form. Bottom: Initially p-bonds form in
the perfect configuration. Above is shown a 4-mer joined to 2 dimers from a simulation screenshot.
Figure 13. But eventually a staggered 4-mer duplex forms, which can attach two dimers. It is for this reason that Zielinski and Orgel had to block the tetramer ends, so that such configurations would not lead to elongation. Once a single GCGCGC has been produced, the large number of free binding sites causes a runaway condensation effect, with larger polymers acting as seeds.
This was unexpected because Zielinski and Orgel had demonstrated the replication of
GCGC tetramers, albeit with blocked ends. We modified the simulation to calculate the
probability of polymer collision independent of polymer length. This resulted in the
following correct behaviour.
Figure 14. GCGC length is shown increasing from left to right. However, there is an early period of growth of GCGC, as can be more clearly seen from the screen snapshots shown in figures 15, 16 and 17.
Figure 15. Shortly into an experiment with a reactor consisting initially of [GCGC] = 0.16 mM = 5
polymers, and [GC] = 1mM = 30 polymers, 2 complexes awaiting p-bond formation can be seen. Already,
2 new GCGC tetramers must have been formed, because we started with 5 GCGC double strands.
Temperature = 300K. The picture shows the polymers in the system. Circled in blue are the complexes. No
staggered complexes of GCGC are observed at this stage.
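The correspondence between concentrations and polymer counts quoted in the Figure 15 caption can be checked with a short calculation. The sketch below infers the reactor volume from the statement that 1 mM corresponds to 30 polymers; the function names are hypothetical.

```python
AVOGADRO = 6.022e23


def reactor_volume_litres(n_molecules, conc_molar):
    """Volume implied by a molecule count at a given molar concentration."""
    return n_molecules / (AVOGADRO * conc_molar)


def molecule_count(conc_molar, volume_l):
    """Expected (non-integer) number of molecules at a given concentration."""
    return conc_molar * AVOGADRO * volume_l


# 1 mM GC dimers = 30 polymers fixes the volume of the simulated reactor.
volume = reactor_volume_litres(30, 1e-3)
# 0.16 mM GCGC in the same volume gives 4.8 molecules, i.e. about 5 polymers.
gcgc_count = molecule_count(0.16e-3, volume)
```

The implied volume is of the order of 5 x 10^-20 litres, which illustrates how few molecules are being simulated and why stochastic effects matter.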
Figure 16. Later in the experiment, after 5 new GCGC tetramers have been formed, a staggered complex is
observed, marked in red with a cross.
Figure 17. Eventually elongation reactions predominate, forming longer and longer complexes. At this
stage a total of 22 ssGCGC tetramers exist compared to an initial value of 10 ssGCGCs. This is a doubling
of GCGC numbers.
Therefore autocatalytic growth of GCGC does not continue because eventually GC
duplexes are consumed. At this stage, there is a higher concentration of GCGC tetramer
and so there is a greater probability of forming staggered GCGC duplexes. Once the GC
dimers are consumed, only further elongation reactions are possible. This elongation was
avoided in Zielinski and Orgel’s experiments because they blocked the ends of the 4mers.
Zielinski and Orgel obtained negative results for the autocatalysis of CGCG tetramers with CG dimers, using an identical protocol to the above experiment (Zielinski & Orgel, 1989). Our stochastic model confirms that this is also the case for CGCG with unprotected ends, and explains why. An experiment identical to that shown in the above
figures was conducted but with CGCG tetramers at 290K. Staggered complexes were
formed right from the beginning of the trial, and all p-bond formation occurred within the
elongating complexes.
Figure 18. [CGCG] = 0. mM = 5 polymers, and [CG] = 1mM = 30 polymers. Temperature = 290K. There was no replication of CGCG, rather elongation occurred. The experiment lasted $7.0 \times 10^7$ seconds.
There was a rapid rate of utilization of CG. No transient increase in CGCG 4-mers was
observed, because all tetramers were ligated to form longer strands straight away, before
any tetramer had the opportunity to ligate two dimers into a copy of itself. This is quite
different from the behaviour of GCGC tetramers, which although undergoing elongation
eventually as dimers are exhausted, initially showed autocatalytic growth. Comparison of
GCGC with CGCG behaviour can be made by comparing the screenshots above with
those below.
Figure 19. [CGCG] = 0. mM = 5 polymers, and [CG] = 1mM = 30 polymers. Temperature = 290K. No
replication of CGCG tetramers is observed. This is opposite to the GCGC case.
Figure 20. After a further time period, more elongation has occurred, with no replication of CGCG.
In conclusion, the dynamics of replication in GCGC vs. elongation in CGCG is
confirmed. Because we had no blocked ends, even GCGC showed elongation after an
initial period of replication.
Results
Different reactions were observed at high and low temperatures
At low temperatures (2°C) elongation reactions were predominant, whereas at higher temperatures, although elongation eventually also took place, short strand replication and incomplete template replication were also observed, along with a higher incidence of strand breakage, see figure 21 below.
Figure 21. A representation of reaction mechanisms directly observed at high and low temperatures. Left:
Staggered association (A) and ligation at splint junctions (B) is the mechanism observed to produce
elongation at low temperature. Right: At high temperatures, elongation does not occur due to premature
strand separation (C), short strands out-competing long strand replication (D), and p-bond breakage (E).
Algorithms used to detect replication
Figure 22. A ‘strong’ (Top) and a ‘weak’ (Bottom) algorithm for detecting replication. The strong definition of a replicated sequence requires that nucleotides on the ‘grandparent’ strand be contiguous and identically ordered compared to nucleotides on the ‘child’ strand. The historical path shown by the yellow arrows can be traversed in order to determine the length of ‘strongly replicated’ sections. The weak definition of a replicated sequence allows multiple ‘grandparent’ strands to contribute matter to the ‘child’ strand, so that identical motifs present in high copy number can contribute oligomers to form the ‘child’ strand. Whenever a p-bond forms, a new ‘motif’ data structure is created (a motif is defined as the overlapping sequence of nucleotides between copy and template ‘responsible’ for the p-bond formation event). Also, at each p-bond formation event, any motifs (or motif fragments) present in the template are passed to the copy. After an experiment we obtain a store of how many motifs and motif fragments were formed and copied. If many copies of a motif of length N exist, then for large N this is good evidence that the motif is being replicated.
Two algorithms were used to detect replicated sequences. These are described in figure
22. Results from the first algorithm are shown below. Since the phylogenetic histories of
all strands are known, it is possible to trace the ancestry of each p-bond, i.e. to know the
identity of the ‘parent’ p-bond and the ‘grandparent’ p-bond on which any p-bond was
template replicated (see figure 22a). An algorithm compares the contiguity and order of
p-bonds on the ‘child’ strand with that of the ‘parent’ and ‘grandparent’ strands. If these
are preserved between adjacent p-bonds on all three polymers, then sequence replication
is defined as having occurred in that section. This algorithm is oblivious to mis-pairings.
The algorithm is described below.
Algorithm to Check for Replication.
For each polymer
    For each p-bond on the polymer, from left to right
        If the p-bond to the right of this p-bond’s grandparent is the same as the
        grandparent of the p-bond to the right of this p-bond, then this is a
        replicated sequence of 3 nucleotides.
            Store the identity of the child and grandparent nucleotides.
        Else this is not a replicated sequence.
    End For
End For.
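The pseudocode above could be realised along the following lines. This is a minimal sketch under assumed data structures: each polymer is a left-to-right list of p-bond identifiers, `grandparent` maps a p-bond to the p-bond it was ultimately templated on, and `right_neighbour` maps a p-bond to the p-bond on its right on its own strand. None of these names come from the original implementation.

```python
from typing import Dict, Hashable, List, Tuple

Bond = Hashable


def find_replicated_pairs(
    polymers: Dict[str, List[Bond]],
    grandparent: Dict[Bond, Bond],
    right_neighbour: Dict[Bond, Bond],
) -> List[Tuple[str, Bond, Bond]]:
    """Return (polymer, p-bond, next p-bond) for each strongly replicated pair.

    Two adjacent p-bonds span 3 nucleotides, so each hit marks a replicated
    sequence of 3 nucleotides in the sense of the 'strong' definition.
    """
    hits = []
    for name, bonds in polymers.items():
        # Walk adjacent p-bond pairs from left to right.
        for this_bond, next_bond in zip(bonds, bonds[1:]):
            gp_this = grandparent.get(this_bond)
            gp_next = grandparent.get(next_bond)
            if gp_this is None or gp_next is None:
                continue  # no recorded ancestry, so not a replicated sequence
            # Contiguity and order must be preserved on the grandparent strand.
            if right_neighbour.get(gp_this) == gp_next:
                hits.append((name, this_bond, next_bond))
    return hits
```

Longer strongly replicated sections can then be found by merging runs of consecutive hits on the same polymer.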
In a typical run initialized with random G and C dimers, only the following sequences were replicated: 101, 000, 011, 011 111, 0100, i.e. there existed examples only of 3 and 4 contiguously replicated nucleotides, and one case of 2 such sequences on one polymer.
Direct observation shows that it is common that strands consist of small sections of
sequences copied from many parents, and even more grandparents. Two examples are
given. Figure 23 shows the phylogeny of a short strand with sequence 0110011100.
Many different templates are used to make the strand. The strand grows by addition of
monomers, not through ligation of oligomers (except for the ligation 00 + 01100111).
Figure 24 shows the immediate phylogenetic tree of a slightly longer strand that has been
constructed by ligation of two oligomers. 6 p-bonds in the final strand were templated on
the strand 000110100101 shown in bold. The complementary sequence of this strand is
present in the copy (with one error). Again, this does not demonstrate a full replication
cycle since a further round of replication is required to construct the original sequence.
The absence of replication of sequences longer than 3 suggests that the intermediate strand (parent) is not able to undergo this second phase of replication. This is because it becomes incorporated into an elongating polymer and is itself unable to undergo replication except at the breathing ends. Elongation continually displaces the sequence from the breathing end, preventing its continued replication.
Figure 23. The strand is formed by the ligation reactions shown. Starting at the bottom and moving up, each reaction is labelled with the sequence of the template used to catalyse the template-directed p-bond. A star (*) on this labelling template indicates the particular p-bond on which the new p-bond was formed. The sequence 0110011100 is produced not by ligation of oligomers but by the template-directed incorporation of monomers onto the ends of a growing oligomer. The bold labelled template represents the same template molecule, whereas the first 3 templates and the last template are different molecules. This suggests that 1100, 11001, 011001, 0110011 and 01100111 grew on 11111011000 without being detached. N.B. this phylogenetic tree is not sufficient to demonstrate replication, since we do not know whether 11111011000 was itself a faithful copy of a grandparent. To establish this, we would also have to observe the history of each labelled template.
Figure 24. The copied strand is complementary to the parent, except for one mis-pair, and an additional
sequence to the left. Such additional sequences shield the central sequence from replication, removing it
from the breathing ends of the strand.
Elongation is a Side-Reaction of Autocatalytic Template Replication
Figure 25. Three OR-coupled (Gánti, 2003) autocatalytic cycles potentially capable of mediating template
replication are shown in solid arrows. Side-reactions are partially due to p-bond breakage (thin dashed
arrows), but are primarily due to elongation. Only a sample of the possible elongation reactions is shown
(thick dashed arrows). The leftmost cycle must turn 4 times to supply the matter for one turn of the
rightmost cycle.
Figure 25 shows a hypothesised mechanism for template replication in which sequential ligation of oligomers results in replication of long strands. The shortest oligomer, a, is assumed to be produced at a constant rate from monomers. Oligomer aa dissociates and two copies of a re-anneal upon b, producing aab, which is ligated to produce bb. bb occasionally denatures to b, and two copies of b re-nature upon c forming bbc, which is ligated to cc, and so on. The following differential equations describe the system.
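\begin{align}
\frac{d[a]}{dt} &= s - \alpha[a][b] + \beta_a[ab] - \alpha[a][ab] + \beta_a[aab] + d\,[b]/2 \tag{1}\\
\frac{d[ab]}{dt} &= \alpha[a][b] - \beta_a[ab] - \alpha[a][ab] + \beta_a[aab] - d\,[ab] \tag{2}\\
\frac{d[bb]}{dt} &= k_b[aab] + \alpha[b]^2 - \beta_b[bb] - d\,[bb]/10 + d_d[cc]/20 \tag{3}\\
\frac{d[aab]}{dt} &= \alpha[a][ab] - k_b[aab] - \beta_a[aab] - d\,[aab]/10 \tag{4}\\
\frac{d[b]}{dt} &= -\alpha[b][a] + \beta_a[ab] - \alpha[b][c] + \beta_b[bc] - \alpha[b][bc] + \beta_b[bbc] - 2\alpha[b]^2 + 2\beta_b[bb] - d\,[b] + d_d[c]/2 \tag{5}\\
\frac{d[bc]}{dt} &= \alpha[b][c] - \beta_b[bc] - \alpha[b][bc] + \beta_b[bbc] - d_d[bc] \tag{6}\\
\frac{d[bbc]}{dt} &= \alpha[b][bc] - k_c[bbc] - \beta_b[bbc] - d_d[bbc]/10 \tag{7}\\
\frac{d[cc]}{dt} &= k_c[bbc] + \alpha[c]^2 - \beta_c[cc] - d_d[cc]/10 + d_g[gg]/20 \tag{8}\\
\frac{d[c]}{dt} &= -\alpha[b][c] + \beta_b[bc] - 2\alpha[c]^2 + 2\beta_c[cc] - d_d[c] - \alpha[c][g] + \beta_c[cg] - \alpha[c][cg] + \beta_c[ccg] + d_g[g]/2 \tag{9}\\
\frac{d[cg]}{dt} &= \alpha[c][g] - \beta_c[cg] - \alpha[c][cg] + \beta_c[ccg] - d_g[cg] \tag{10}\\
\frac{d[ccg]}{dt} &= \alpha[c][cg] - k_g[ccg] - \beta_c[ccg] - d_g[ccg]/10 \tag{11}\\
\frac{d[gg]}{dt} &= k_g[ccg] + \alpha[g]^2 - \beta_g[gg] - d_g[gg]/10 \tag{12}\\
\frac{d[g]}{dt} &= -\alpha[c][g] + \beta_c[cg] - 2\alpha[g]^2 + 2\beta_g[gg] - d_g[g] \tag{13}
\end{align}

Here $\alpha$ is the association rate of polymers, $\beta_x$ is the denaturation rate of strand x, $k_b = k_c = k_g$ is the p-bond formation rate, $s$ is the rate of monomer incorporation into a, and $d$, $d_d$ and $d_g$ are the p-bond breakage rates in b, c and g respectively.
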
Values of constants used are shown in Table 1 below.
Table 1. Constants used in the ODE model.

| Quantity | Symbol | Value |
|---|---|---|
| Temperature (K) | temp | 280 |
| Melting temperature of a | melta | 280 |
| Melting temperature of b | meltb | 300 |
| Melting temperature of c | meltc | 320 |
| Melting temperature of g | meltg | 340 |
| Denaturation rate of a | $\beta_a$ | $10 + 80000\,e^{temp}/(e^{melta} + e^{temp})$ |
| Denaturation rate of b | $\beta_b$ | $10 + 80000\,e^{temp}/(e^{meltb} + e^{temp})$ |
| Denaturation rate of c | $\beta_c$ | $10 + 80000\,e^{temp}/(e^{meltc} + e^{temp})$ |
| Denaturation rate of g | $\beta_g$ | $10 + 80000\,e^{temp}/(e^{meltg} + e^{temp})$ |
| p-bond formation rate | $k_b = k_c = k_g$ | $0.6 \times 10^{2}\,e^{-25/(8.31 \times 10^{-3}\,temp)}$ |
| p-bond breakage rate | deg | $0.2 \times 10^{17}\,e^{-122/(8.31 \times 10^{-3}\,temp)}$ |
| p-bond breakage rate in b | $d$ | 0.1 deg |
| p-bond breakage rate in c | $d_d$ | 0.5 deg |
| p-bond breakage rate in g | $d_g$ | deg |
| Rate of monomer incorporation into a | $s$ | $10^{-3}$ to $10^{-9}$ |
| Association rate of polymers | $\alpha$ | 200000 |

The dissociation rates were calculated so as to reproduce the sigmoid melting curve of
strands at a given temperature (temp). 4 strand types (a,b,c and g) were modelled, each
was double the length of the previous strand. Figure 26 shows that under a very wide
range of monomer concentrations, long strands actually outnumber short strands at low
temperatures (280K). This is because at low temperatures, although only a few long
strands dissociate, when they do, they quickly associate with dissociated shorter strands,
forming complexes aab, bbc, ccg etc…, which then undergo ligation. At higher
temperatures, although long strands dissociate more readily, the complexes aab, bbc, ccg
are not as stable. Since the rate-limiting step in replication is ligation, a system capable of
increasing the concentration of ligation reactants will increase the rate of replication. This
occurs best at low temperatures.
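The temperature dependence of the Table 1 rate constants can be sketched as below. The operators in the denaturation-rate expression are taken to be additions, giving the sigmoid form 10 + 80000 e^temp / (e^melt + e^temp); that reading is an assumption, as are the function names. The expression is rewritten in an algebraically equivalent form to avoid very large exponentials.

```python
import math

R_KJ = 8.31e-3  # gas constant in kJ mol^-1 K^-1, as used in Table 1


def denaturation_rate(temp, melt):
    """Sigmoid denaturation rate, assumed to be 10 + 80000*e^temp/(e^melt + e^temp).

    Dividing numerator and denominator by e^temp gives the form used here.
    """
    return 10.0 + 80000.0 / (1.0 + math.exp(melt - temp))


def p_bond_formation_rate(temp):
    """k_b = k_c = k_g from Table 1."""
    return 0.6e2 * math.exp(-25.0 / (R_KJ * temp))


def p_bond_breakage_rate(temp):
    """'deg' from Table 1; breakage in b, c and g is 0.1, 0.5 and 1.0 times this."""
    return 0.2e17 * math.exp(-122.0 / (R_KJ * temp))


if __name__ == "__main__":
    melting = {"a": 280, "b": 300, "c": 320, "g": 340}
    for temp in (280, 300):
        rates = {name: denaturation_rate(temp, melt) for name, melt in melting.items()}
        print(temp, rates, p_bond_formation_rate(temp), p_bond_breakage_rate(temp))
```

At a strand's own melting temperature the rate sits halfway up the sigmoid, and far below it the rate approaches the floor of 10, which is why long strands (high melting temperature) rarely dissociate at 280K in this model.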
Figure 26. (Top) The concentrations (y-axis) of strands at very high monomer incorporation rate ($10^{-3}$) at 280K, over time (x-axis). (Purple = a, Red = b, Green = c, Blue = g). After an initial transient explosion of b followed by c, the longest strands, g, begin to increase in concentration. (Bottom) The same axes, but at low monomer incorporation rate ($10^{-9}$).
Figure 27. As in figure S7 except that temperature = 300K. With both high and low monomer concentrations, long strands are lost. Short strands (a and b) predominate in the reactor.
The stochastic model confirms the finding of the ODE model that long strands are not produced at high temperature. If this ODE model were correct, we would expect replication of long strands to occur very effectively at low temperatures. However, the stochastic model shows that although elongation of long strands does indeed occur at low temperatures, replication does not occur for sequences longer than 5 nucleotides. The reason for this can be understood when we see that the replication mechanism in the ODE model is a set of OR-coupled autocatalytic cycles, Figure 25. As with any autocatalytic cycle, side-reactions can deplete the constituents of the cycle if they are able to remove these constituents at a rate greater than the influx of matter into the cycle. The set of OR-coupled autocatalytic cycles, although possible in principle, is seen to be entirely fictional when we consider the presence of staggered associations resulting in elongation. Figure 25 shows merely a sample of the possible staggered associations, e.g. a can associate with a to produce not aa but aas. b can associate with b to produce bbs, which can then react with aas or a, etc. In fact there is a combinatorial explosion of possible elongation reactions by staggered association, so that single-stranded oligomers (a, b, c, g) are removed from the OR-coupled autocatalytic replication pathway most effectively. The advantage of the discrete stochastic model is that these dynamics emerge from the underlying ‘unbiased’ chemical assumptions.
The elongation reactions can be incorporated into the ODE model by adding a depletion term to each single strand, and adding a new variable e that represents the activity of elongating staggered elements. We let ke be the rate at which strands enter into the elongation pathway (ke = 0.0001), and assume that depletion of single strands into the elongation pathways occurs with flux ke[x][e]. We let the elongating elements decay into inactive substances at a rate de. The modified equations for this system are shown below, and their behaviour is shown in figure 28.
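\begin{align}
\frac{d[a]}{dt} &= s - \alpha[a][b] + \beta_a[ab] - \alpha[a][ab] + \beta_a[aab] + d\,[b]/2 - k_e[a][e] \tag{1}\\
\frac{d[ab]}{dt} &= \alpha[a][b] - \beta_a[ab] - \alpha[a][ab] + \beta_a[aab] - d\,[ab] \tag{2}\\
\frac{d[bb]}{dt} &= k_b[aab] + \alpha[b]^2 - \beta_b[bb] - d\,[bb]/10 + d_d[cc]/20 \tag{3}\\
\frac{d[aab]}{dt} &= \alpha[a][ab] - k_b[aab] - \beta_a[aab] - d\,[aab]/10 \tag{4}\\
\frac{d[b]}{dt} &= -\alpha[b][a] + \beta_a[ab] - \alpha[b][c] + \beta_b[bc] - \alpha[b][bc] + \beta_b[bbc] - 2\alpha[b]^2 + 2\beta_b[bb] - d\,[b] + d_d[c]/2 - k_e[b][e] \tag{5}\\
\frac{d[bc]}{dt} &= \alpha[b][c] - \beta_b[bc] - \alpha[b][bc] + \beta_b[bbc] - d_d[bc] \tag{6}\\
\frac{d[bbc]}{dt} &= \alpha[b][bc] - k_c[bbc] - \beta_b[bbc] - d_d[bbc]/10 \tag{7}\\
\frac{d[cc]}{dt} &= k_c[bbc] + \alpha[c]^2 - \beta_c[cc] - d_d[cc]/10 + d_g[gg]/20 \tag{8}\\
\frac{d[c]}{dt} &= -\alpha[b][c] + \beta_b[bc] - 2\alpha[c]^2 + 2\beta_c[cc] - d_d[c] - \alpha[c][g] + \beta_c[cg] - \alpha[c][cg] + \beta_c[ccg] + d_g[g]/2 - k_e[c][e] \tag{9}\\
\frac{d[cg]}{dt} &= \alpha[c][g] - \beta_c[cg] - \alpha[c][cg] + \beta_c[ccg] - d_g[cg] \tag{10}\\
\frac{d[ccg]}{dt} &= \alpha[c][cg] - k_g[ccg] - \beta_c[ccg] - d_g[ccg]/10 \tag{11}\\
\frac{d[gg]}{dt} &= k_g[ccg] + \alpha[g]^2 - \beta_g[gg] - d_g[gg]/10 \tag{12}\\
\frac{d[g]}{dt} &= -\alpha[c][g] + \beta_c[cg] - 2\alpha[g]^2 + 2\beta_g[gg] - d_g[g] - k_e[g][e] \tag{13}\\
\frac{d[e]}{dt} &= k_e[a][e] + k_e[b][e] + k_e[c][e] + k_e[g][e] - d_e \tag{14}
\end{align}

The symbols are those of the unmodified system ($\alpha$ is the association rate of polymers and $\beta_x$ the denaturation rate of strand x).
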
Figure 28. (Top) High monomer activity ($10^{-3}$), 280K, ke = 0.0001, de = 0. Elongation predominates and eventually depletes the replication pathway. (Middle) Elongation also depletes the replication pathway at low monomer activity ($10^{-9}$), ke = 0.0001, de = 0. (Bottom) Even if elongating elements decay at a rate equal to the rate of decay of g oligomers, g oligomers still cannot be maintained, ke = 0.0001, de = dg. The rate of decay of e must be much higher than the rate of decay of g to allow maintenance of g.
Figure 28 shows that e takes over the population of oligomers, eventually totally
depleting the autocatalytic cycles. Even if decay of e is introduced, if e decays only at the
same rate as g, then e still out-competes g. The rate of decay of e must be much greater
than the rate of decay of g, for g to be maintained. It is not known how this could be
achieved. In conclusion, the elongation pathways are actually detrimental to replication.
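One way to see why e takes over when de = 0 is to factor the equation for e in the modified system. The short derivation below is a sketch that assumes the four ke terms enter the equation for e as gains, mirroring the depletion fluxes ke[x][e] removed from the single strands:

\begin{align}
\frac{d[e]}{dt} &= k_e[e]\left([a] + [b] + [c] + [g]\right) - d_e \\
&= k_e[e]\left([a] + [b] + [c] + [g]\right) \ge 0 \quad \text{when } d_e = 0 .
\end{align}

Under this assumption [e] cannot decrease while any single strands remain, and because each single strand x is drained with flux ke[x][e], the drain on the replication pathway strengthens as [e] grows.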
Partial References (see main paper for other references)
Zielinski, W. & Orgel, L. Autocatalytic synthesis of a tetranucleotide analogue.
Nature, 327, 346-347. (1987).
Zielinski, W. & Orgel, L. Oligoaminonucleoside phosphoramidates. Oligomerization
of dimers of 3’-amino-3-deoxynucleotides (GC and CG) in aqueous solution. Nucleic
Acids Research, 15, 1699- (1987)
Zielinski, W. & Orgel, L. The template properties of triphosphoramidates having CG
residues. J. Mol. Evol., 29, 281-283. (1989).
Fernando, C.T. & Di Paolo, E. A Model for the Origin of Long RNA Templates.
Proceedings of the Ninth International Conference of Artificial Life, Boston, USA.
(2004).
Kanavarioti, A. Template-directed chemistry and the origins of the RNA world.
Origins Life Evol. Biosph., 24, 479-495. (1994).
Kanavarioti, A. & Bernasconi, C. Computer Simulation in Template-Directed
Oligonucleotide Synthesis. J. Mol. Evol., 31, 470-477. (1990).
Wattis, J. & Coveney, P. The Origin of the RNA World: a kinetic model. J. Phys.
Chem. B, 103: 4231-4250. (1999).
Wills, P. & Kauffman, S. & Stadler, B. & Stadler, P. Selection dynamics in
Autocatalytic Systems: Templates Replicating Through Binary Ligation. Bulletin of
Mathematical Biology, 1, 1-26. (1998).
Breivik, J. Self-Organization of Template-Replicating Polymers and the Spontaneous
Rise of Genetic Information. Entropy. 3, 273-279. (2001).
Griffith, S. & Goldwater, D. & Jacobson, J.M. Robotics: Self-replication from
random parts. Nature 437, 638 (2005).