Stochastic modelling of memory effects on the Hunchback gene activation in the fruit fly embryo
Sigbjørn Bore∗
UPMC
(Dated: December 5, 2013)
In this report a possible memory mechanism in gene activation during the early
development of the fruit fly embryo is analysed. This is done by proposing a simple
stochastic model and simulating using the Gillespie algorithm. The results indicate
that there are statistical differences between models which do and do not incorporate
memory in the embryonic development.
I. INFORMATION ABOUT THE INSTITUTION AND THE GROUP
The work has been carried out at the Curie Institute in a collaboration between the group of Maxime Dahan and the group of Nathalie Dostatni. The Curie Institute is located in Rue de Pierre et Marie Curie and at Orsay. The institute is principally devoted to research on cancer through medicine, biology and biophysics. In addition, many groups at the Curie Institute work on more fundamental research. The AXOMORPH work group (funded by the ANR) is a research collaboration which focuses on the dynamic and quantitative understanding of axis formation in Drosophila. The group includes biologists (Nathalie Dostatni, UMR 218) who gather data and physicists (Mathieu Coppey, UMR 168, Teresa Ferraro, UMR 168 and ENS Paris, and Aleksandra Walczak, ENS Paris) who analyse the data. The supervisor for this internship has been Teresa Ferraro.
∗ Also at NTNU; sigbjorn.bore@curie.fr
II. INTRODUCTION
The embryogenesis of Drosophila melanogaster starts with the entry of a sperm cell into an egg cell. The egg nucleus and the sperm cell then fuse to form a new cell called the zygote, which carries half of its DNA from the father and half from the mother. This cell is pictured in Figure 1A. In the zygote, the initial fused nucleus undergoes rapid mitosis (division of the nucleus), forming sequentially 2, 4, 8 nuclei and so on, up to about 8000 nuclei after the 13th nuclear division. All these nuclei share the same cytoplasm; such an embryo is called a syncytium. The timing during the first 2 hours of early development is referenced by the 14 nuclear cycles. At cycle 7, the nuclei start migrating towards the plasma membrane at the cortex of the embryo (the shell). Before this, the nuclei have not decided what cell type to become (no differentiation). This process only starts when the nuclei have reached the cortex. In the classical picture of the "French flag" proposed by Wolpert [3], nuclei decide their fate by measuring local concentrations of morphogens. Such morphogens are proteins that are distributed as gradients throughout the embryo. Given these gradients, the cells can obtain positional information along the axes of the embryo (dorso-ventral and anterior-posterior). Cells can tell where they are and what to become (this process is called patterning). One of the best characterised morphogens is Bicoid. Bicoid mRNA is anchored at the anterior pole of the egg by the mother during oogenesis. After fertilisation, the bicoid mRNA starts being translated into protein. The Bicoid proteins are free to diffuse and form the pattern shown in Figure 1B. This pattern is very well approximated by the exponential derived in Appendix A. At cycle 8 the nuclei start zygotic transcription (production of non-maternal proteins). In the case of Bicoid, cells respond in a threshold manner. The best-known target of Bicoid is the zygotic gene Hunchback. Above a certain concentration, Bicoid activates the production of Hunchback; below it, it does not. As a result, the distribution of Hunchback is a steep gradient which divides the anterior and posterior sides of the embryo, as depicted in Figure 1C. At the anterior side (the side of the embryo that will eventually become the head) Hunchback is expressed (Hunchback is present) and at the posterior side (the opposite side) it is expressed very little. This sharp divide in spatial expression is crucial for the future formation of the head structures of the fruit fly.
FIG. 1. (A) The stages of embryonic development of Drosophila. (B) Pictorial representation of the Bicoid gradient within the embryo, from anterior to posterior. (C) Distribution of Hunchback protein at cycle 12. Black means high concentration of Hunchback. There is Hunchback at the posterior side, but this is caused by morphogens other than Bicoid.
To understand some of this behaviour we need to look at how gene regulation works. What follows is by no means the whole story, but merely what is necessary to build a simple physical model. Each nucleus of the embryo can be thought of as a chemical compartment which has its own DNA and its own amount of morphogen. The DNA encodes the information needed to produce proteins. The process by which this happens is called gene expression. The important elements in gene expression are depicted in Figure 2 A, B and C. The genes on the DNA are normally preceded by a regulatory region called the promoter region. When another compound, RNA polymerase, binds to this region, it causes production of the mRNA that corresponds to the coding region. This mRNA is in turn translated into a protein (the gene product). The probability of RNA polymerase binding depends on transcription factors (such as Bicoid). If the transcription factor is an activator, it increases the probability of RNA polymerase binding. If it is a repressor, it decreases that probability. An example of this is the Bicoid–Hunchback system, where Bicoid acts as an activator and controls the production of Hunchback mRNA. This report will mostly be concerned with the first step of gene activation: the binding and unbinding of the transcription factor Bicoid.
FIG. 2. (A) Depiction of important parts of the DNA: the promoter followed by gene X. (B) The steps of gene expression. RNA polymerase binds to the promoter and transcribes the coding region into mRNA. The mRNA is then translated into protein X. (C) Transcription factor Y (an activator) may bind to the binding site. The binding results in a higher probability of RNA polymerase binding, thereby increasing the production of X.
Gene expression depends on binding, unbinding and multiple chemical reactions. Processes involving many molecules and fast reactions are adequately described by deterministic differential equations. In cells, however, this is often not the case. For Bicoid, recent measurements [7] suggest that there are of the order of 700 molecules in each nucleus at the "on–off" border of Hunchback in the middle of the embryo. Expression of Hunchback is suspected to occur in long bursts (this is not yet known, as until now one has only had access to still images and not movies of gene expression). Situations like this call for a stochastic model. In this report we limit ourselves to the binding and unbinding of the morphogen (step 1), thereby only checking whether production of Hunchback mRNA is active or not. The activation of Hunchback can be modelled as a simple telegraph process. In a telegraph process the system is described by two states, on and off. The rate at which the system goes from the off state to the on state is $k_{on}$, and the rate of the opposite transition is $k_{off}$. When Bicoid is bound, Hunchback is produced and we say that the gene is on. When Bicoid is not bound, Hunchback is not produced and we say that the gene is off (for analytical results see Appendix B). In most cases there is not only one binding site on the DNA, but several. This is the case for Bicoid, where experiments seem to indicate about six binding sites. The way these binding sites work together is called cooperativity and is essential for the precise patterns of gene expression. If the binding of one morphogen protein increases the probability of binding of the next, the morphogens act with positive cooperativity; in the opposite case, with negative cooperativity. This kind of behaviour leads to step-like responses as a function of concentration, called Hill functions. The more step-like they are, the higher the Hill coefficient and the higher the degree of cooperativity. This gives a threshold behaviour typical of biological systems.
The simplest model for the activation of Hunchback assumes that the rate at which Hunchback is activated is proportional to the concentration of Bicoid. This corresponds to
$$k_{on} \to k_{on}\,[\mathrm{bcd}]. \tag{1}$$
In contrast, $k_{off}$ is related to particles knocking Bicoid off the binding site (thermal fluctuations) and is assumed to be independent of the Bicoid concentration.
Cells are thought to average the concentration over time in order to filter out the noise and give a precise expression. This noise comes from the low particle number and the inherent stochasticity of elementary chemical reactions. We know from experiments looking at the boundary that, for the cells to have this precise boundary, they need to know how many Bicoid molecules there are within the nuclei to a precision of 10%. This would mean that a promoter in a nucleus (a single molecule!) is able to distinguish 700 molecules from 770 molecules in a few minutes (a nuclear cycle takes about 10 minutes). It is believed that the cells use a time-averaging mechanism to achieve this precision. The physical limit on the time needed was calculated in [1] using
$$\frac{\delta c}{c} \sim \frac{1}{\sqrt{D a c T}}, \tag{2}$$
where D is the diffusion constant, a is the size of the promoter region, c is the concentration of Bicoid and T is the averaging time. The time required was calculated to be around 25 minutes. This is too long for the boundary to be established before the end of the earlier nuclear cycles, especially if one considers the new data obtained with an MS2 system, which is able to show movies of the production of Hunchback mRNA. These movies indicate a synchronous and precise expression a few minutes after mitosis. This problem has been one of the main focuses of the AXOMORPH group. It has been speculated by this group and other researchers that nuclei may memorise their ancestor's state (whether the mother nucleus was on or off before mitosis), so that the probability of being on in the next generation is higher if the mother was on. This would mean that at each cycle, nuclei do not necessarily have to perform a new average of the morphogen concentration in order to yield a precise expression. There are many possible ways the cells could achieve this. One possibility is that the two daughter nuclei have the same status as the mother nucleus. Another possibility is that the on rate gets higher for the next generation if the mother nucleus has been on. As the daughter nuclei that originate from the same mother form clusters [1], these clusters are expected to behave similarly. One might expect this to affect the shape and positioning of the boundary.
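As a worked rearrangement of Eq. (2), and assuming the scaling relation is read as an equality up to a factor of order one, the averaging time needed for a given relative precision follows directly. No numerical values for D, a or c are assumed here:

```latex
\frac{\delta c}{c} \sim \frac{1}{\sqrt{D a c T}}
\quad\Longrightarrow\quad
T \sim \frac{1}{D a c}\left(\frac{\delta c}{c}\right)^{-2},
\qquad
\left. T \right|_{\delta c / c \,=\, 0.1} \sim \frac{100}{D a c}.
```

The required time grows with the inverse square of the tolerated error, so halving the error quadruples the averaging time.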
After getting familiar with the field, it was thought most fruitful to focus on the establishment of the Hunchback boundary. What we wanted to study was how the incorporation of memory affects the shape and positioning of the boundary. Would the boundary move differently from cycle to cycle with memory than without? Is the boundary longer and more convoluted with or without memory? To be able to answer some of these questions, routines for stochastic simulation were developed. The aim of the simulations is to simulate the activation and deactivation of the nuclei in the embryo in 2-D and to study the pattern of active nuclei given a distribution of morphogen, in order to check whether or not there are statistical differences between memory and non-memory models.
III. MODEL AND NUMERICAL PROCEDURE
To study the effect of memory, a MATLAB program for stochastic simulation using the Gillespie algorithm was written (see Appendices C and E). The idea of the simulation is to mimic the activation of Hunchback on a grid of nuclei across cycles, where the positions and ancestry are determined by experimental data (see Figure 3). The algorithm consists of: 1) Run a simulation of the telegraph process on the ensemble of cells. 2) At the end of the cycle, check the status of each cell and then analyse the pattern with and without memory.
FIG. 3. Experimental data on the positioning of the cells, plotted against anterior-posterior position. Cells of the same colour originate from the same cell.
To carry out a simulation as described above, the Gillespie algorithm was used. It was proposed by Daniel Gillespie [2] as a means of simulating chemical reaction networks described by master equations. A key feature of the Gillespie algorithm is that it is exact: given a master equation, the algorithm produces a statistically correct time evolution. This is a consequence of its derivation from the master equation, which involves no approximations. The telegraph process for a system of cells can be described by a network of reactions (see Appendix C).
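To make the procedure concrete, here is a minimal Python sketch of the Gillespie steps for a set of independent telegraph genes. It is not the MATLAB code used for this report; the function name `gillespie_telegraph`, the rate values and the number of nuclei are illustrative assumptions.

```python
import math
import random


def gillespie_telegraph(k_on, k_off, t_end, rng):
    """Simulate independent two-state genes up to time t_end.

    k_on  : list of on rates, one per nucleus (hypothetical values)
    k_off : off rate shared by all nuclei
    Returns the list of 0/1 states at time t_end, starting from all off.
    """
    n = len(k_on)
    x = [0] * n
    t = 0.0
    while True:
        # One reaction is available per nucleus: switch on if off, else switch off.
        a = [k_on[i] if x[i] == 0 else k_off for i in range(n)]
        a0 = sum(a)
        if a0 == 0.0:
            break  # no reaction can occur any more
        # Exponential waiting time; 1 - random() lies in (0, 1], so the log is finite.
        dt = math.log(1.0 / (1.0 - rng.random())) / a0
        if t + dt > t_end:
            break
        t += dt
        # Choose reaction j with probability a[j] / a0.
        target = a0 * rng.random()
        cumulative = 0.0
        j = n - 1  # fallback in case of floating-point round-off
        for i in range(n):
            cumulative += a[i]
            if target < cumulative:
                j = i
                break
        x[j] = 1 - x[j]
    return x


# Illustrative run: 400 identical nuclei with k_on = 2 and k_off = 1.
rng = random.Random(1)
final_states = gillespie_telegraph([2.0] * 400, 1.0, t_end=5.0, rng=rng)
fraction_on = sum(final_states) / len(final_states)
steady_state = 2.0 / (2.0 + 1.0)  # k_on / (k_on + k_off)
```

For a run much longer than 1/(k_on + k_off), `fraction_on` should fluctuate around `steady_state`, which is the kind of check against the analytical telegraph result that the report describes.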
From experiments we know that the activation of Hunchback follows a sigmoidal Hill function with a Hill coefficient of 5. To get this kind of response it is not adequate to assume that the on rate is linearly dependent on Bicoid. What one ought to do is to develop a model with multiple binding sites and cooperativity. However, how cooperativity in gene expression works is poorly known. There are sophisticated models, such as the MWC models [6], that can achieve cooperativity, but at the cost of introducing at least six parameters which are not experimentally known instead of two ($k_{on}$ and $k_{off}$), which is not desirable. There is, however, a simple way of generating a pattern of gene expression similar to what we observe in experiments without modelling the cooperativity. This is done by assuming a Hill function for the Hunchback response to Bicoid:
$$k_{on} \to \frac{k_{on}\,[\mathrm{bcd}]^{h}}{[\mathrm{bcd}]^{h} + K^{h}}, \tag{3}$$
where h is the Hill coefficient and K is a constant related to the position of the boundary. This relation gives a sigmoidal response to Bicoid.
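A short Python sketch of Eq. (3) may help to see the threshold behaviour. The function name `hill_on_rate` and all numerical values are illustrative assumptions, not settings from the simulations.

```python
def hill_on_rate(k_on, bcd, K, h):
    """On rate from Eq. (3): k_on * [bcd]^h / ([bcd]^h + K^h)."""
    return k_on * bcd**h / (bcd**h + K**h)


# At [bcd] = K the on rate is half of its maximum, whatever the value of h.
half_max = hill_on_rate(k_on=1.0, bcd=2.0, K=2.0, h=5)

# Ten percent above the threshold, a larger h gives a rate closer to the maximum.
steep = hill_on_rate(k_on=1.0, bcd=2.2, K=2.0, h=5)
shallow = hill_on_rate(k_on=1.0, bcd=2.2, K=2.0, h=1)
```

The constant K therefore sets where along the Bicoid gradient the response crosses half maximum, and h sets how sharply it does so.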
The way of modelling the memory mechanism is somewhat arbitrary, as there are many different ways of doing it. In this report we consider only the state of the mother cell at the end of the cell cycle. If the mother was on, the on rate of the daughter is changed as follows:
$$k_{on\{i\}} \to k_{on}\,\alpha + k_{on\{i\}}, \tag{4}$$
where α is positive and corresponds to the degree of memory. The rate is thus the sum of the on rate caused by Bicoid activation and a memory constant from the mother being active. Note that in the simulation nuclei can only have one α constant added.
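The memory rule of Eq. (4) can be sketched in a few lines of Python. The name `daughter_on_rate` and the numbers are hypothetical; the sketch assumes that the memory term is always added to the Bicoid-driven rate, so that it never accumulates over several cycles.

```python
def daughter_on_rate(k_on_bicoid, k_on, alpha, mother_was_on):
    """On rate of a daughter nucleus following Eq. (4).

    k_on_bicoid   : Bicoid-driven on rate at the daughter's position
    k_on          : base on rate
    alpha         : positive memory strength
    mother_was_on : state of the mother at the end of the previous cycle
    """
    if mother_was_on:
        return k_on_bicoid + k_on * alpha  # the memory term is added once
    return k_on_bicoid


with_memory = daughter_on_rate(0.2, k_on=1.0, alpha=0.5, mother_was_on=True)
without_memory = daughter_on_rate(0.2, k_on=1.0, alpha=0.5, mother_was_on=False)
```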
An important part of the numerical procedure is how the boundary is characterised and what types of algorithms are
used. A description is found in Appendix
D.
IV. RESULTS AND ANALYSIS
Before doing any analysis of the border, it needs to be established that the numerical routines work. A first check of the Gillespie algorithm is to compare the ensemble average it produces to the theoretical steady state solution in Appendix B. As seen in Figure 4, these curves overlap, indicating that the algorithm has been implemented correctly. It is not only the Gillespie algorithm that needs to work correctly: it is essential for the analysis of the boundary that the algorithms used for tracking it yield sensible results. The tracking of the boundary must reflect how the boundary is shaped. Figure 4 B and C show an example of how the tracking works in cycle 13. The position marked in green coincides with the boundary. The boundary tracked in blue reflects the transition from many active nuclei to few (see Appendix D). It should be kept in mind that the algorithm for tracking the boundary does not work optimally. Often the boundary is highly distorted and it is very hard to define a clear transition.
Having established that the algorithms work to some degree, we are now ready to analyse statistically the difference between the cases with and without memory, starting with the ensemble average of activity at the end of the cycles in each case. As seen in Figure 5, at cycle 10 the curves overlap since no memory has been introduced yet. At cycle 11 the curves start deviating. The curve with memory starts to move towards the posterior side. This is likely to be caused by two factors.
At the border, nuclei still have a probability of being active. Thus nuclei that were active by chance at the end of the cycle will get daughters with a high on rate, pushing the border towards the posterior side. Additionally, the nuclei move during mitosis, which can cause movement of the border. Another interesting feature is that the green curve seems a bit steeper than the blue one. One may thus expect that memory can contribute to the precision of the boundary. This would then be reflected in a higher Hill coefficient. The previous simulation was done with a Hill coefficient of 5, which is high (so high that the effect of memory may be hindered). The question then is whether memory could have a significant effect on the precision. To check this, a simulation with a Hill coefficient of 3 was done to see if the memory changes the Hill coefficient. The new Hill coefficient was computed using a fitting function. As seen in Figure 6, the fitted Hill function overlaps the simulated curve perfectly. However, the Hill coefficient produced by memory is only about 10% higher than the Hill coefficient used. This indicates that memory can increase the precision, but the effect is not strong.

FIG. 4. (A) Simulated ensemble average (probability of being active against anterior-posterior position) compared to the theoretical steady state solution. (B) The blue line shows a smoothed line of the average activity of the histogram bins and the green line shows the computed middle position. (C) The activity at the end of cycle 13 (red nuclei are active and black nuclei are inactive); the blue line is the tracked boundary between the high rate of expression and the low rate of expression.

FIG. 5. Average activity at cycles 10–13. The blue line is without memory and the green line is with memory.

FIG. 6. Average activity at cycle 13 against anterior-posterior position (ensemble average with memory, h = 3), fitted with a Hill function (fitted h = 3.2835).

The previous results have indicated that the border moves and that precision increases. It is also interesting to look at
how the width and position of the boundary are distributed. This can be done with histograms. We looked at the boundary position by measuring how much the midpoint varies from simulation to simulation. A boundary that varies much in position is not good for the embryo. If memory increased the variability of the boundary, it might not be a good hypothesis. The width is also an indication of how precise the boundary is. Figure 7A shows a histogram of the boundary position. The histograms show that the average boundary position moves towards the posterior by 5%. The variance is 10% for the non-memory case. Figure 7B shows the width of the boundary. The average width of the boundary is 6% (in embryo length) larger without memory than with memory. Notice that there is a secondary, smaller peak in the distribution in Figure 7B. This is caused by the failure of the criterion for boundary determination in the case of high noise. The algorithm checks when the activity goes below a specific value. In the anterior part there are few nuclei and therefore few nuclei in each bin. These bins are very susceptible to noise, and by chance the width then gets overestimated.
It has been speculated that the form of the boundary might change with memory. In [1] it was observed that the pattern got more and more convoluted across cell cycles and that this could be caused by memory. To check if there are differences in the length of the boundary with and without memory, a thousand simulations of the boundary were done with and without memory. A measure of how straight the boundary is, is the total length of the boundary divided by the distance between its top and bottom points. A ratio of 1 indicates a perfectly straight boundary and a higher number indicates a boundary which is not straight. Figure 8A shows a histogram of this ratio. Unlike some of the predictions in [1], the boundary with memory is on average shorter than the boundary without memory: the boundary without memory is 4% longer than the boundary with memory. An interesting artifact is the high peak that appears in the memory case. This peak probably appears because a particular boundary has the tendency of being repeated exactly, and memory helps to facilitate this. It is also possible that this is caused by a bug.

FIG. 7. (A) Histogram of the boundary position with and without memory. (B) Histogram of the boundary width with and without memory.
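The straightness measure described above (total boundary length divided by the distance between the top and bottom points) can be sketched as follows. The function name `straightness_ratio` is hypothetical, and the sketch assumes the boundary points are already ordered from one end of the boundary to the other.

```python
import math


def straightness_ratio(points):
    """Path length along ordered (x, y) points divided by the end-to-end distance."""
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    return path / chord


# A straight boundary gives exactly 1; a zigzag gives a larger value.
straight = straightness_ratio([(0.5, 0.0), (0.5, 0.1), (0.5, 0.3)])
zigzag = straightness_ratio([(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)])
```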
FIG. 8. (A) Histogram of boundary length with and without memory. (B) Histogram of the average step length in the x direction.

Another measure of how irregular the boundary is, is given by the average x-projection of the distance travelled between two steps. This is shown in the histogram of Figure 8B. It also shows the trend of the boundary being straighter with memory. On average the steps without memory are 6% longer than with memory. However, to get a better picture of this behaviour, one needs more configurations of points.

We expect the clusters of clones to behave similarly, meaning that their states should be much the same. This is expected since they are placed close to each other and are thus spatially correlated. However, near the border the small differences in position, coupled with an even probability of being on and off, should lead to a lower degree of correlation. In the case of memory one might expect the nuclei to be correlated even at the boundary. If nuclei that are on have status 1 and nuclei that are off have status -1, then a good measure of how well correlated the clusters are is given by the absolute value of the average state of the cluster. A correlation of 1 means that all nuclei in the cluster have the same state, and a correlation of 0 means that they are poorly correlated. Figure 9A shows a graph of this measure for 10000 simulations on the experimental positioning data. The two cases show qualitatively the same behaviour: a high degree of correlation around the poles and a low degree around the boundary. By shifting one of the graphs (Figure 9B) one can make a comparison. Somewhat unexpectedly, the graphs behave almost identically. The memory case has a slightly higher correlation, but not by much. From this analysis we get some trends for the memory model with respect to the purely stochastic one. However, only one nuclei configuration has been considered here. It is thus too early to draw any final conclusions.

FIG. 9. (A) Degree of correlation with memory. (B) Degree of correlation of state for clusters without memory as a function of the average position of the cluster. The curves are labelled n = 1 to n = 8.
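The cluster correlation measure used above (absolute value of the average ±1 state of a cluster) can be written as a short Python sketch. The function name `cluster_correlation` is a hypothetical choice for illustration.

```python
def cluster_correlation(states):
    """Absolute value of the mean state, with on = +1 and off = -1.

    states : non-empty list of 0/1 (or False/True) values for one cluster
    """
    spins = [1 if s else -1 for s in states]
    return abs(sum(spins)) / len(spins)


all_on = cluster_correlation([1, 1, 1, 1])      # every nucleus in the same state
half_half = cluster_correlation([1, 0, 1, 0])   # evenly split cluster
three_one = cluster_correlation([1, 1, 1, 0])
```

A cluster that is entirely off gives the same value as one that is entirely on, since only the agreement between the nuclei matters.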
V. CONCLUDING REMARKS AND FUTURE PROSPECTS
During this internship, a method for simulating gene activation with memory has been successfully implemented. This has been done using a simple stochastic model of gene expression simulated with the Gillespie algorithm. By running simulation routines for the cases with and without memory, statistical differences between the two models have been explored.
It was found that in the presence of memory, the border moves towards the posterior. The results also indicate that memory can slightly increase the precision of the boundary. It should be noted that the width obtained is still larger than in the experimental data [1]: in the experimental data it is about 10% and in the present work it is around 20%. This means that the stochastic simulation is not able to achieve the precision of the experimental data either with or without memory. Interestingly, the correlation of the clones behaves very similarly with and without memory. To further establish these results, the simulations should be performed on more nuclei configurations.
Gene expression is a complex process. The model used in this report has simplified gene expression to a simple binding and unbinding of Bicoid. A natural continuation of this work would be to model the multiple binding sites of Bicoid and to introduce the self-activation of Hunchback, as experiments indicate that it might have an effect on the memory [1]. A goal for such an implementation should be to base the activation of Hunchback on data on how each of the binding sites behaves. By doing so, one would be much better equipped to make actual comparisons with experiments and to see how memory changes the evolution of the pattern during the cycle. A simulation like this would be able to say to a greater extent whether memory is needed.
This internship with the Curie Institute has certainly been a valuable experience. It has been truly inspiring to be part of multiple research teams, to take part in discussions of physical problems with professional scientists and to become familiar with the way researchers work in France. My background has mainly been oriented towards mathematical and theoretical physics. However, by working on this project I have learned a lot about biology, a field which was previously almost unknown to me scientifically. I also feel that I have grown as a computational physicist, especially by getting to know the Gillespie and Monte Carlo algorithms, which is something I know I will benefit from in the future. Throughout the entire internship I have felt very welcome and enjoyed participating in the group meetings. It has been very motivating to go from not understanding anything in lab meetings to understanding a lot. I would like to end this report by thanking Mathieu Coppey, Aleksandra Walczak and especially Teresa Ferraro for all their help during the internship. Without their help I would have been lost.
[1] Porcher, A., Abu-Arish, A., Huart, S., Roelens, B., Fradin, C. and Dostatni, N. Time to measure positional information. Development, 2009.
[2] Gillespie, D. T. Exact simulation of coupled chemical reactions. Naval Weapons Center, China Lake, California, 1977.
[3] Wolpert, L. Positional information and the spatial pattern of cellular differentiation. Journal of Theoretical Biology, 1969.
[4] Alon, U. An introduction to systems biology. Chapman & Hall, 2007.
[5] Porcher, A. and Dostatni, N. The bicoid morphogen system. Current Biology, 2010.
[6] Marzen, S., Garcia, H. G. and Phillips, R. The statistical mechanics of Monod–Wyman–Changeux (MWC) models. Journal of Molecular Biology, 2013.
[7] Gregor, T., Tank, D. W., Wieschaus, E. F. and Bialek, W. Probing the limits to positional information. Cell, 2007.

Appendix A: Derivation of exponential distribution of Bicoid
A simple model for the diffusion and degradation is to assume that the dynamics of the concentration is governed by
$$\frac{d[\mathrm{bcd}]}{dt} = D\,\frac{d^{2}[\mathrm{bcd}]}{dx^{2}} - \alpha\,[\mathrm{bcd}],$$
where D is the diffusion constant and α is the degradation constant. The system reaches equilibrium before transcription and one can thus assume a steady state solution. This solution is governed by
$$0 = D\,\frac{d^{2}[\mathrm{bcd}]}{dx^{2}} - \alpha\,[\mathrm{bcd}].$$
The boundary condition is a constant concentration $b_0$ at the anterior side. The solution is then
$$[\mathrm{bcd}] = b_0\,e^{-x/\lambda}, \tag{A1}$$
where $\lambda = \sqrt{D/\alpha}$.
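As a quick numerical sanity check of the exponential Bicoid profile $b_0 e^{-x/\lambda}$ with $\lambda = \sqrt{D/\alpha}$, the sketch below evaluates the profile and inserts it into the steady state equation using a central finite difference. The function names and the parameter values (D = 1, α = 4, b0 = 1) are arbitrary assumptions chosen only for illustration.

```python
import math


def bicoid_profile(x, b0, diffusion, degradation):
    """Steady state concentration b0 * exp(-x / lambda), lambda = sqrt(D / alpha)."""
    lam = math.sqrt(diffusion / degradation)
    return b0 * math.exp(-x / lam)


def steady_state_residual(x, b0, diffusion, degradation, step=1e-3):
    """D * c''(x) - alpha * c(x), with c'' from a central finite difference."""
    def c(y):
        return bicoid_profile(y, b0, diffusion, degradation)

    second_derivative = (c(x + step) - 2.0 * c(x) + c(x - step)) / step**2
    return diffusion * second_derivative - degradation * c(x)


# The residual should vanish up to discretisation error if the profile is a solution.
residual = steady_state_residual(0.3, b0=1.0, diffusion=1.0, degradation=4.0)
```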
Appendix B: Analytical results for the telegraph process
We only consider two states, x = 1 (on state) and x = 0 (off state). The master equations for this process are then given by
$$\frac{dP(1,t)}{dt} = k_{on}\,P(0,t) - k_{off}\,P(1,t),$$
$$\frac{dP(0,t)}{dt} = k_{off}\,P(1,t) - k_{on}\,P(0,t).$$
At steady state we have that
$$\frac{dP(1,t)}{dt} = 0, \qquad \frac{dP(0,t)}{dt} = 0.$$
Solving for the probabilities we find that
$$P(1,t)_{st} = \frac{k_{on}}{k_{on} + k_{off}}, \tag{B1a}$$
$$P(0,t)_{st} = \frac{k_{off}}{k_{on} + k_{off}}. \tag{B1b}$$
One then easily obtains the mean
$$\langle x \rangle_{st} = 1 \times P(1,t)_{st} + 0 \times P(0,t)_{st} = \frac{k_{on}}{k_{on} + k_{off}} \tag{B2}$$
and the variance
$$\mathrm{Var}[x]_{st} = \langle x^{2} \rangle_{st} - \langle x \rangle_{st}^{2} = \frac{k_{on}\,k_{off}}{(k_{on} + k_{off})^{2}}. \tag{B3}$$

Appendix C: The Gillespie algorithm
In situations with few molecules and slow processes, chemical reactions are poorly described by deterministic differential equations. Reactions in situations like this are better described by chemical master equations. The problem is that these master equations are very hard to solve analytically. To simulate these chemical master equations, Gillespie proposed in his paper [2] an algorithm, now called the Gillespie algorithm, as a way to simulate coupled chemical reactions exactly. The telegraph process for one cell is described by the following reaction:
$$\emptyset \underset{k_{off} X}{\overset{k_{on}(1-X)}{\rightleftharpoons}} X, \tag{C1}$$
where X = 0 means off and X = 1 means on. In the simulation there are multiple cells. The state of each of these n cells is described by $X_i$, which gives the following reaction network:
$$\emptyset \underset{k_{off} X_1}{\overset{k_{on\{1\}}(1-X_1)}{\rightleftharpoons}} X_1, \quad \ldots, \quad \emptyset \underset{k_{off} X_i}{\overset{k_{on\{i\}}(1-X_i)}{\rightleftharpoons}} X_i, \quad \ldots, \quad \emptyset \underset{k_{off} X_n}{\overset{k_{on\{n\}}(1-X_n)}{\rightleftharpoons}} X_n.$$
Note that the on rates have indices to account for the variation in on rates with the position of the cells. This is described mathematically in matrix form by
$$\mathbf{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}, \tag{C2}$$
$$\mathbf{a} = \begin{pmatrix} k_{on\{1\}}(1 - X_1) - k_{off} X_1 \\ \vdots \\ k_{on\{n\}}(1 - X_n) - k_{off} X_n \end{pmatrix}, \tag{C3}$$
and gives the following equation
$$\frac{d\mathbf{X}}{dt} = \mathbf{a}. \tag{C4}$$
The rate of any reaction happening, $a_0$, is given by the sum of the rates $a_i$ of all reactions,
$$a_0 = \sum_i a_i. \tag{C5}$$
The probabilities of any reaction and of no reaction during Δt are then given by $\Delta t\,a_0$ and $1 - \Delta t\,a_0$. The probability of no reaction occurring within N time steps is given by
$$p(T > N\Delta t) = (1 - a_0 \Delta t)^{N}. \tag{C6}$$
Let N → ∞ and Δt → 0 so that NΔt → t. Using these limits one gets that
$$p(T > t) = \lim_{\substack{N \to \infty \\ \Delta t \to 0}} \left(1 - \frac{a_0 N \Delta t}{N}\right)^{N} = \lim_{N \to \infty} \left(1 - \frac{a_0 t}{N}\right)^{N} = \exp(-a_0 t). \tag{C7}$$
One can then generate the time step to the next reaction from this cumulative distribution by
$$\delta t = \frac{\ln(1/r_1)}{a_0}, \tag{C8}$$
where $r_1$ is a uniformly distributed random number between 0 and 1. Which of the reactions happens is determined by the probabilities
$$p_i = \frac{a_i}{a_0}. \tag{C9}$$
This is equivalent to choosing the reaction j by the following criterion:
$$\sum_{i=1}^{j-1} a_i < a_0 r_2 < \sum_{i=1}^{j} a_i, \tag{C10}$$
where $r_2$ is a uniform random number between 0 and 1. Using this scheme for the evolution one obtains statistically correct paths for the time evolution of the system. The algorithm is thus implemented as follows:
1. Initialise the starting amounts of chemicals, the rates of reactions and the time t = 0.
2. Generate two random numbers $r_1$ and $r_2$, uniformly distributed between 0 and 1.
3. Calculate the time to the next reaction by $\delta t = (1/a_0) \ln(1/r_1)$.
4. Choose reaction j so that $\sum_{i=1}^{j-1} a_i < a_0 r_2 < \sum_{i=1}^{j} a_i$.
5. Put t = t + δt.
6. Adjust states and rates according to the reaction.
7. Repeat steps 2–6 until the desired time is reached.

Appendix D: Procedure for tracking the boundary
The boundary position is found by using histograms of the cells and checking where, from one bin to the next, a criterion for a boundary point is passed. The width is found in a similar way, by using two criteria and finding the distance between the points where these criteria are no longer met.
A good tracking of the boundary should reflect the transition from a high degree of expression to a low degree of expression. To obtain this, the embryo was divided into two parts: an anterior part of high expression and a posterior part of low expression. The procedure for dividing the embryo is as follows:
1. Divide the embryo from anterior to posterior into bins whose size is the average distance to the nearest neighbour.
2. Decide the criteria for a bin to be considered anterior or posterior.
3. If the average activity of a bin is neither above the criterion for being anterior nor below the criterion for being posterior, the nuclei inside that bin that are on are considered anterior, and those that are off are considered posterior.
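The three steps above can be sketched in a few lines of Python. This is only an illustration, not the Matlab routine used for the report: the function name `classify_nuclei`, the threshold values 0.8 and 0.2, and the convention that a bin above (below) the threshold labels all of its nuclei anterior (posterior) are assumptions made for the example.

```python
def classify_nuclei(x_positions, states, bin_width,
                    anterior_min=0.8, posterior_max=0.2):
    """Label every nucleus as 'anterior' or 'posterior'.

    x_positions: anterior-posterior coordinate of each nucleus.
    states: 1 if the nucleus is on, 0 if it is off.
    bin_width: bin size, e.g. the average nearest-neighbour distance.
    anterior_min, posterior_max: hypothetical criteria on the bin activity.
    """
    # Step 1: group nuclei into bins along the anterior-posterior axis.
    bins = {}
    for idx, x in enumerate(x_positions):
        bins.setdefault(int(x // bin_width), []).append(idx)

    labels = [None] * len(x_positions)
    for members in bins.values():
        # Average activity of the bin (fraction of nuclei that are on).
        activity = sum(states[k] for k in members) / len(members)
        for k in members:
            if activity > anterior_min:        # step 2: clearly anterior bin
                labels[k] = "anterior"
            elif activity < posterior_max:     # step 2: clearly posterior bin
                labels[k] = "posterior"
            else:                              # step 3: intermediate bin
                labels[k] = "anterior" if states[k] == 1 else "posterior"
    return labels
```

In an intermediate bin the label follows the state of the individual nucleus, which is what lets the subsequent neighbour search pick out a boundary that is not a straight line.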
Once anterior and posterior nuclei are established one can run the routine for finding the boundary. The routine looks up a specified number of nearest neighbours for every anterior nucleus and counts how many of them are posterior nuclei. Nuclei near the boundary have many nearest neighbours that are posterior, so by enforcing a minimum and a maximum number of posterior neighbours one can find the boundary points. These points are not in an order that reflects the boundary. To get a fairly good ordering of the points between the lowest and the highest point one can sort the boundary points by y-position. This is often sufficient, but in some cases it can give a very erratic boundary. A simple way to get a good sequence is to treat the problem as a travelling salesman problem, that is, to find the order of the points for which the total distance is the smallest. A simple way of solving the travelling salesman problem is by means of a Monte Carlo algorithm.
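As an illustration of the last step, the sketch below orders a set of boundary points with a simple Monte Carlo search. It is not the report's Matlab implementation: the segment-reversal move, the Metropolis acceptance rule, the cooling schedule and all names (`order_boundary`, `t_start`, `t_end`, `n_steps`) are assumptions chosen for the example, and the temperature scale would have to be adapted to the units of the coordinates.

```python
import math
import random


def path_length(points, order):
    """Total length of the open path visiting `points` in the given order."""
    return sum(math.dist(points[a], points[b])
               for a, b in zip(order, order[1:]))


def order_boundary(points, n_steps=20000, t_start=1.0, t_end=1e-3, rng=None):
    """Return an ordering of `points` with a short total path length."""
    rng = rng or random.Random()
    n = len(points)
    # Start from the ordering by y-position described in the text.
    order = sorted(range(n), key=lambda k: points[k][1])
    if n < 3:
        return order
    length = path_length(points, order)
    best, best_length = order[:], length
    for step in range(n_steps):
        # Geometric cooling from t_start to t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / n_steps)
        # Propose reversing a randomly chosen segment of the path.
        i, j = sorted(rng.sample(range(n), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        delta = path_length(points, candidate) - length
        # Metropolis rule: always accept improvements, sometimes accept
        # a longer path so the search can leave local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            order, length = candidate, length + delta
            if length < best_length:
                best, best_length = order[:], length
    return best
```

Because the best ordering seen so far is kept, the result is never longer than the y-sorted starting path.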
Appendix E: Matlab script and settings used in general
The Matlab code can be found at https://www.dropbox.com/sh/jiptv7znldryf0l/2blJiSO8sY. Note that it is added for completeness and is only meant for the very interested reader who wants to see how the actual implementation is done. If not otherwise specified, the settings for the simulation are as follows:
• Cycle 13 is used for the plots.
• The constant rates are kon = 1 and koff = 0.1.
• The Bicoid concentration is normalized to b0 = 1 with λ = 0.2.
• Hill coefficient h = 5.
• The lengths of the cycles in minutes are T10 = 9, T11 = 10, T12 = 12 and T13 = 21.
• Degree of memory α = 1.
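To make the scheme of Appendix C concrete, the following Python sketch applies it to a single nucleus switching on and off according to reaction (C1), using the constant rates listed above. It is a minimal illustration under stated assumptions, not the Matlab code linked above: the rates are taken to be constant and expressed per minute, so that the length of cycle 13 can serve as the default run time, and the function names are made up for the example.

```python
import math
import random


def simulate_two_state(k_on=1.0, k_off=0.1, t_end=21.0, x0=0, rng=None):
    """Gillespie simulation of one nucleus with state X = 0 (off) or 1 (on).

    Returns a list of (time, state) pairs, one per state change,
    starting with (0.0, x0).
    """
    rng = rng or random.Random()
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        # Propensities of switching on and switching off, as in (C1).
        a = [k_on * (1 - x), k_off * x]
        a0 = sum(a)                        # total rate, as in (C5)
        if a0 == 0.0:
            break
        r1 = 1.0 - rng.random()            # uniform in (0, 1]
        r2 = rng.random()                  # uniform in [0, 1)
        dt = math.log(1.0 / r1) / a0       # waiting time, as in (C8)
        if t + dt > t_end:
            break
        # Choose reaction j by the cumulative-sum criterion (C10).
        threshold, cumulative = a0 * r2, 0.0
        for j, aj in enumerate(a):
            cumulative += aj
            if threshold < cumulative:
                break
        t += dt
        x = 1 if j == 0 else 0             # reaction 0 turns on, 1 turns off
        path.append((t, x))
    return path


def fraction_on(path, t_end):
    """Fraction of the interval [0, t_end] spent in the on state."""
    on_time = 0.0
    for (t0, x), (t1, _) in zip(path, path[1:] + [(t_end, None)]):
        on_time += x * (t1 - t0)
    return on_time / t_end
```

For long runs the fraction of time spent in the on state should approach kon/(kon + koff), which gives a quick consistency check of the implementation.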
Appendix F: Simulation of time averaging
The cell fate is decided by which of the genes are expressed. Which genes are expressed depends on an accurate counting of the number of Bicoid molecules. In order to achieve the observed precision at the boundary it is necessary to count the number of Bicoid molecules to a precision of 10%. The counting is done by measuring how much of the time the promoter is bound by the transcription factor. As the concentration of Bicoid is not uniform inside the nucleus, the promoter is subject to a different concentration at different times. Instantaneous counting will therefore give a poor estimate of the number of Bicoid molecules inside the nucleus. To achieve high precision the counting is done by time averaging over a long period.
FIG. 10. Picture of the DNA strand and Bicoid molecules inside the nucleus.
In order to show how this process works and which variables are important for it, a stochastic simulation of the system was made. The nucleus is modelled as a cube in which there are $N_{bcd}$ molecules of Bicoid and a single strand of DNA with the promoter region placed in the middle of the cell. The Bicoid molecules diffuse throughout the cube, and when they are close enough they bind to and unbind from the promoter. When the promoter is bound the nucleus is on; when it is not bound the nucleus is off.
As there are few molecules inside the nucleus, the diffusion in 3D needs to be simulated stochastically. This is done by dividing the cube into a number of smaller cubes, say $N = n \times n \times n$. Each of these cubes can hold Bicoid molecules. Diffusion can be looked upon as reactions in these cubes where Bicoid is moved from one cube to one of the 6 nearest cubes (see Figure 11). The rate at which a molecule goes from chamber $i$ to its neighbour $j$ is given by
$$k_{\mathrm{diff}\{i \to j\}} = \frac{D}{h^2} n_i, \tag{F1}$$
where $D$ is the diffusivity, $h$ is the size of the compartment and $n_i$ is the number of Bicoid molecules within the compartment. Each of the $N$ compartments has potentially 6 reactions of this type. The diffusion reaction rates are given by the following matrix
$$A = (\vec{a}_1 \ldots \vec{a}_i \ldots \vec{a}_N), \quad \text{where} \quad \vec{a}_i = \begin{pmatrix} a_{ix+} \\ a_{ix-} \\ a_{iy+} \\ a_{iy-} \\ a_{iz+} \\ a_{iz-} \end{pmatrix}. \tag{F2}$$
Here $i$ is the chamber index, $x$, $y$ and $z$ give the direction of the neighbour, and $+$ and $-$ indicate which of the two neighbours in that direction is meant. For example, reaction $ix+$ means that chamber #$i$ loses one Bicoid molecule and its neighbour in the positive $x$ direction gains one molecule of Bicoid. Using the scheme presented in Appendix C one can now simulate the diffusive movement of the molecules inside the cell. To implement the binding and unbinding of the transcription factor, further reactions need to be added. It is assumed that the rate of binding to the promoter depends only on the amount of Bicoid in chamber #$u$, through
$$a_{\mathrm{on}} = n_u k_{\mathrm{on}}. \tag{F3}$$
The off rate is assumed to be a constant $k_{\mathrm{off}}$. When a molecule binds to the promoter $n_u \to n_u - 1$, and when it unbinds $n_u \to n_u + 1$. These reactions can be added to the model by extending $A$ with the column
$$\vec{a}_{N+1} = \begin{pmatrix} N \times n_u k_{\mathrm{on}} \\ k_{\mathrm{off}} \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{F4}$$
FIG. 12. Molecules inside a box diffusing.
Using the scheme presented in Appendix
C one can simulate this system.
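The diffusion part of this system, with the jump rates of (F1), can be sketched in Python as follows. This is a hedged illustration rather than the report's Matlab code: the binding reactions are left out, the walls of the cube are assumed to be reflecting (so compartments at the wall have fewer than 6 possible moves), and the names `neighbours` and `diffuse` are invented for the example.

```python
import random


def neighbours(index, n):
    """Flat indices of the neighbours of compartment `index` in an n*n*n grid."""
    x, y, z = index // (n * n), (index // n) % n, index % n
    result = []
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        u, v, w = x + dx, y + dy, z + dz
        # Assumed reflecting walls: moves out of the cube do not exist.
        if 0 <= u < n and 0 <= v < n and 0 <= w < n:
            result.append((u * n + v) * n + w)
    return result


def diffuse(counts, n, diffusivity, h, t_end, rng=None):
    """Stochastic diffusion of molecules between compartments up to t_end.

    counts: number of molecules in each of the n**3 compartments.
    Each possible jump i -> j has rate (D / h**2) * n_i, as in (F1).
    """
    rng = rng or random.Random()
    counts = list(counts)
    rate = diffusivity / h ** 2
    nbrs = [neighbours(i, n) for i in range(n ** 3)]
    t = 0.0
    while True:
        # Summed rate of all jumps out of each compartment.
        weights = [rate * counts[i] * len(nbrs[i]) for i in range(n ** 3)]
        a0 = sum(weights)
        if a0 == 0.0:
            break
        t += rng.expovariate(a0)           # exponential waiting time
        if t > t_end:
            break
        # Pick the source compartment in proportion to its total jump rate,
        # then one of its neighbours with equal probability.
        source = rng.choices(range(n ** 3), weights=weights)[0]
        target = rng.choice(nbrs[source])
        counts[source] -= 1
        counts[target] += 1
    return counts
```

Starting with all molecules in the central compartment and running for a sufficiently long time should give counts that are roughly equal across compartments, which is the homogeneity check described below.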
FIG. 11. Possible diffusion moves of Bicoid in the x-direction, between compartment i and its neighbours i−1 and i+1.
Firstly it should be established that the diffusion is implemented correctly. One check is that if all the molecules start in the middle of the cell, they should diffuse outwards spherically. A correctly implemented simulation should converge to a homogeneous solution (if there is a sufficient number of Bicoid molecules). Figure 12 shows just this.
The time averaging mechanism is visualised in Figure 13.
FIG. 13. Time averaging of promoter state with 700 molecules (activity versus time in minutes).
Appendix G: Correlation of clusters
In [1] it was shown that clusters of clones are located close to each other. If there is memory one might expect that these clusters have coherent states, meaning all on or all off. Let $\sigma_i$ be the state of clone $i$, with the possible values
$\sigma_i = 1$ (on state),
$\sigma_i = -1$ (off state).
A measure of how correlated the cluster is, is then given by
$$C = |\langle \sigma_i \rangle|. \tag{G1}$$
A correlation of 1 means that all nuclei are either on or off, while a correlation of zero means that half are on and half are off. Assume the cluster to be situated at position $\langle x \rangle$. The probability of being active is then $p(x)$ for each of the nuclei. The expected value of $C$ is
$$\langle C \rangle = \frac{1}{N} \sum_{i=0}^{N} \binom{N}{i} p^i (1-p)^{N-i} \, |N - 2i|, \tag{G2}$$
where $i$ is the number of nuclei that are on in the cluster, $p$ is the probability of being on and $N$ is the total number of nuclei within the cluster. The average correlation and its standard deviation are shown in Figures 14 and 15.
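The expected correlation (G2) and the corresponding standard deviation can be evaluated directly from the binomial distribution. The Python sketch below does this for a given cluster size and on-probability; the function name `correlation_stats` is an assumption made for the example, and the mapping from cluster position to the probability p is not included.

```python
from math import comb, sqrt


def correlation_stats(n_nuclei, p):
    """Mean and standard deviation of C = |N - 2i| / N.

    n_nuclei: number of nuclei N in the cluster.
    p: probability that a single nucleus is on.
    The number of nuclei that are on, i, is binomially distributed.
    """
    mean = 0.0
    second_moment = 0.0
    for i in range(n_nuclei + 1):
        # Binomial probability of exactly i nuclei being on.
        weight = comb(n_nuclei, i) * p ** i * (1 - p) ** (n_nuclei - i)
        c = abs(n_nuclei - 2 * i) / n_nuclei
        mean += weight * c
        second_moment += weight * c * c
    # Guard against tiny negative values caused by rounding.
    variance = max(second_moment - mean * mean, 0.0)
    return mean, sqrt(variance)
```

For a single nucleus the correlation is always 1, and for two nuclei with p = 0.5 the mean and the standard deviation are both 0.5, which can be used as simple checks.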
FIG. 14. Theoretical correlation as a function of position (degree of correlation versus position of cluster, curves for n = 1 to 8).
FIG. 15. Theoretical standard deviation of the correlation as a function of position (standard deviation of correlation versus position of cluster, curves for n = 1 to 8).