Stochastic modelling of memory effects on the Hunchback gene activation in the fruit fly embryo

Sigbjørn Bore∗
UPMC
(Dated: December 5, 2013)

∗ Also at NTNU; sigbjorn.bore@curie.fr

In this report a possible memory mechanism in gene activation during the early development of the fruit fly embryo is analysed. This is done by proposing a simple stochastic model and simulating it using the Gillespie algorithm. The results indicate that there are statistical differences between models which do and do not incorporate memory in the embryonic development.

I. INFORMATION ABOUT THE INSTITUTION AND THE GROUP

The work has been carried out at the Curie Institute in a collaboration between the group of Maxime Dahan and the group of Nathalie Dostatni. The Curie Institute is located in Rue Pierre et Marie Curie and at Orsay. The institute is principally devoted to research on cancer through medicine, biology and biophysics. In addition, many groups at the Curie Institute are working on more fundamental research. The AXOMORPH work group (funded by the ANR) is a research collaboration which focuses on the dynamic and quantitative understanding of axis formation in Drosophila. The group includes biologists (Nathalie Dostatni, UMR 218) who gather data and physicists (Mathieu Coppey, UMR 168, Teresa Ferraro, UMR 168 and ENS Paris, and Aleksandra Walczak, ENS Paris) who analyse the data. The supervisor for this internship has been Teresa Ferraro.

II. INTRODUCTION

The embryogenesis of Drosophila melanogaster starts with the entry of a sperm cell into an egg cell. The egg nucleus and the sperm cell then fuse to form a new cell called the zygote, which carries half of the DNA of each parent. This cell is pictured in Figure 1A. In the zygote, the initial fused nucleus undergoes rapid mitosis (cell division), forming sequentially 2, 4, 8, ... nuclei, up to about 8000 nuclei after the 13th nuclear division. All these nuclei share the same cytoplasm; such an embryo is called a syncytium.
The timing during the first 2 hours of early development is referenced by the 14 nuclear cycles. At cycle 7, the nuclei start migrating towards the plasma membrane at the cortex of the embryo (the shell). Before this, the nuclei have not decided what cell type to become (no differentiation). This process only starts when the nuclei have reached the cortex. In the classical "French flag" picture proposed by Wolpert [3], nuclei decide their fate by measuring local concentrations of morphogens. Such morphogens are proteins that are distributed as gradients throughout the embryo. Given these gradients, the cells can obtain positional information along the axes of the embryo (dorso-ventral and anterior-posterior). Cells can thus tell where they are and what to become (this process is called patterning).

One of the best characterised morphogens is Bicoid. Bicoid mRNA is anchored at the anterior pole of the egg by the mother during oogenesis. After fertilisation, the bicoid mRNA starts being translated into protein. The Bicoid proteins are free to diffuse and form the pattern shown in Figure 1B. This pattern is very well approximated by an exponential, which is derived in Appendix A. At cycle 8 the nuclei start zygotic transcription (production of non-maternal proteins). In the case of Bicoid, cells respond in a threshold manner. The best-known target of Bicoid is the zygotic gene Hunchback. Above a certain concentration, Bicoid activates the production of Hunchback, and below it, it does not. As a result, the distribution of Hunchback is a steep gradient which divides the anterior and posterior sides of the embryo, as depicted in Figure 1C. At the anterior side (the side of the embryo that will eventually become the head) Hunchback is expressed (Hunchback is present) and at the posterior side (the opposite side) it is expressed very little. This sharp divide in spatial expression is crucial for the future formation of the head structures of the fruit fly.
[Figure 1: panels A, B and C; axis from anterior to posterior]
FIG. 1. (A) The stages of embryonic development of Drosophila. (B) Pictorial representation of the Bicoid gradient within the embryo. (C) Distribution of Hunchback protein at cycle 12. Black means high concentration of Hunchback. There is Hunchback at the posterior side, but this is caused by morphogens other than Bicoid.

To understand some of this behaviour we need to go into how gene regulation works. What follows is by no means the whole story, but merely what is necessary to build a simple physical model. Each nucleus of the embryo can be thought of as a chemical compartment which has its own DNA and amount of morphogen. The DNA encodes the information needed to produce proteins. The process by which this happens is called gene expression. The important elements in gene expression are depicted in Figure 2 A, B and C. The genes on the DNA are normally preceded by a regulatory region called the promoter region. When another compound, RNA polymerase, binds to this region it causes production of mRNA that corresponds to the coding region. This in turn is translated into a protein (gene product). The probability of RNA polymerase binding is dependent on transcription factors (such as Bicoid). The presence of transcription factors changes the binding probability of RNA polymerase. If the transcription factor is an activator, it increases the probability of RNA polymerase binding. If it is a repressor, it decreases the probability of RNA polymerase binding. An example of this is the Bicoid-Hunchback system, where Bicoid acts as an activator and controls the production of Hunchback mRNA. This report will mostly be concerned with the first step of gene activation: binding and unbinding of the transcription factor Bicoid.

[Figure 2: (A) promoter and gene X; (B) RNA polymerase, transcription of gene X into mRNA, translation into protein X; (C) activators Y at the binding site, increased transcription of gene X]
FIG. 2. (A) Depiction of important parts of the DNA.
(B) The steps of gene expression. RNA polymerase binds to the promoter and transcribes the coding region into mRNA. The mRNA is then translated into protein X. (C) Transcription factor Y (activator) may bind to the binding site. The binding results in a higher probability of RNA polymerase binding, thereby increasing the production of X.

Gene expression is dependent on binding, unbinding and multiple chemical reactions. Processes involving many molecules and fast reactions are adequately described by deterministic differential equations. However, in cells this is often not the case. In the case of Bicoid, recent measurements [7] suggest that there are of the order of 700 molecules in each nucleus at the "on-off" border of Hunchback in the middle of the embryo. Expression of Hunchback is suspected to be distributed in long bursts (it is not yet known, as until now one has only had access to still images and not movies of gene expression). Situations like this call for a stochastic model. In this report we limit ourselves to looking at the binding and unbinding of the morphogen (step 1), thereby only checking whether production of Hunchback mRNA is active or not.

The activation of Hunchback can be modelled as a simple telegraph process. In a telegraph process the system is described by two states, on and off. The rate at which the system goes from the off state to the on state is given by $k_\mathrm{on}$ and the rate of the opposite transition by $k_\mathrm{off}$. When Bicoid is bound, Hunchback is produced and we say that the gene is on. When Bicoid is not bound, Hunchback is not produced and we say the gene is off (for analytical results see Appendix B). In most cases there is not only one binding site on the DNA, but several. This is the case for Bicoid, where experiments seem to indicate about six binding sites. The way these binding sites work together is called cooperativity and is essential for the precise patterns of gene expression.
One says that if the binding of one morphogen protein increases the probability of binding of the next morphogen proteins, the morphogens act with positive cooperativity, and in the opposite case with negative cooperativity. From this kind of behaviour one ends up with step-like responses as a function of concentration, called Hill functions. The more step-like they are, the higher the Hill coefficient and the higher the degree of cooperativity. This gives a threshold behaviour typical for biological systems.

The simplest model for the activation of Hunchback assumes that the rate at which Hunchback is activated is proportional to the concentration of Bicoid. This corresponds to

$$ k_\mathrm{on} \to k_\mathrm{on}\,[bcd]. \tag{1} $$

In contrast, $k_\mathrm{off}$ is related to particles knocking Bicoid off the binding site (thermal fluctuations) and is assumed to be independent of Bicoid.

Cells are thought to average the concentrations in time in order to factor out the noise and give precise expression. This noise comes from the low particle number and the inherent stochasticity of elementary chemical reactions. We know from experiments looking at the boundary that, in order for the cells to have this precise boundary, they need to know how many Bicoid molecules there are within the nuclei to a precision of 10%. This would mean that a promoter in a nucleus (a single molecule!) is able to distinguish 700 molecules from 770 molecules in a few minutes (a nuclear cycle takes about 10 minutes). It is believed that the cells use a time averaging mechanism to achieve this precision. The physical limit of the time needed was calculated in [1] using

$$ \frac{\delta c}{c} \sim \frac{1}{\sqrt{D a c T}}, \tag{2} $$

where $D$ is the diffusion constant, $a$ is the size of the promoter region, $c$ is the concentration of Bicoid and $T$ is the time. The time required was calculated to be around 25 minutes.
This time is too long for the boundary to be established before the end of the earlier nuclear cycles, especially if one considers the new data obtained with an MS2 system, which is able to show movies of the production of Hunchback mRNA. These movies indicate a synchronous and precise expression a few minutes after mitosis. This problem has been one of the main focuses of the AXOMORPH group. It has been speculated by this group and other researchers that nuclei may memorise their ancestor states (whether the mother nucleus was on or off before mitosis) and that the probability of being on in the next generation changes if the mother was on. This would mean that at each cycle, nuclei do not necessarily have to do a new average of the morphogen concentration in order to yield a precise expression. There are many possible ways the cells can achieve this. One possibility is that the two daughter nuclei have the same status as the mother nucleus. Another possibility is that the rate of turning on gets higher for the next generation if the mother nucleus has been on. As the daughter nuclei that originate from the same mother form clusters [1], these clusters are expected to behave similarly. One might expect that this has effects on the shape and positioning of the boundary.

After getting familiar with the field, it was thought to be most fruitful to focus on the establishment of the Hunchback boundary. What we wanted to study was how the incorporation of memory affects the boundary in terms of shape and positioning. Would the boundary move differently from cycle to cycle with memory than without memory? Is the boundary longer and more convoluted with or without memory? To be able to answer some of these questions, routines for stochastic simulation were developed.
The aim of the simulations is to simulate the activation and deactivation of the nuclei in the embryo in 2-D and to study the pattern of active nuclei given a distribution of morphogen, in order to check whether or not there are statistical differences between memory and non-memory models.

III. MODEL AND NUMERICAL PROCEDURE

To study the effect of memory, a MATLAB program for stochastic simulation using the Gillespie algorithm was written (see Appendices C and E). The idea of the simulation is to mimic the activation of Hunchback on a grid of nuclei across cycles, where the position and ancestry are determined by experimental data (see Figure 3). The algorithm consists of:

1) Run a simulation of the telegraph process on the ensemble of cells.
2) Check at the end of the cycle the status of each cell and then analyse the pattern with and without memory.

[Figure 3: positions of the nuclei; horizontal axis: position, anterior to posterior]
FIG. 3. Experimental data of the positioning of the cells. Cells of the same colour originate from the same cell.

To carry out a simulation as described above, the Gillespie algorithm was used. The Gillespie algorithm was proposed by Daniel Gillespie as a means of simulating chemical reaction networks described by master equations. A key feature of the Gillespie algorithm is that it is exact: given a master equation, the algorithm produces the statistically correct time evolution. This is a consequence of its derivation from the master equation, which involves no approximations. The telegraph process for a system of cells can be described by a network of reactions (see Appendix C).

From experiments we know that the activation of Hunchback follows a sigmoidal Hill function with a Hill coefficient of 5. To get this kind of response it is not adequate to assume that the on rate is linearly dependent on Bicoid. What one ought to do is to develop a model with multiple binding sites and cooperativity.
Nevertheless, how cooperativity in gene expression works is poorly known. There are sophisticated models, like the MWC models [6], that can achieve cooperativity, but at the cost of introducing at least six parameters which are not experimentally known instead of two parameters ($k_\mathrm{on}$ and $k_\mathrm{off}$), which is not desirable. However, there is a simple way of generating a pattern of gene expression that is similar to what we observe in experiments without modelling the cooperativity. This is done by assuming a Hill function for the Hunchback response to Bicoid:

$$ k_\mathrm{on} \to k_\mathrm{on}\,\frac{[bcd]^h}{[bcd]^h + K^h}, \tag{3} $$

where $h$ is the Hill coefficient and $K$ is a constant related to the position of the boundary. This relation gives a sigmoidal response to Bicoid.

The way of modelling the memory mechanism is somewhat arbitrary, as there are many different ways of doing it. In this report we consider only the state of the mother cell at the end of the cell cycle. If the mother was on, the rate of turning on is changed as follows:

$$ k_{\mathrm{on}\{i\}} \to k_\mathrm{on}\,\alpha + k_{\mathrm{on}\{i\}}, \tag{4} $$

where $\alpha$ is positive and corresponds to the degree of memory. The rate is thus the sum of the rate of being active caused by Bicoid activation and a memory constant from the mother being active. Note that in the simulation nuclei can only have one $\alpha$ constant added.

An important part of the numerical procedure is how the boundary is characterised and what types of algorithms are used. A description is found in Appendix D.

IV. RESULTS AND ANALYSIS

Before doing any analysis of the border it needs to be established that the numerical routines work. A first check that the Gillespie algorithm works is to compare the ensemble average obtained with the Gillespie algorithm to the theoretical steady state solution (Appendix B, evaluated with the Bicoid profile of Appendix A). As is seen in Figure 4, these curves overlap, indicating that the algorithm has been implemented correctly.
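A check of this kind can be sketched in a few lines of Python. The snippet below is not the MATLAB program used for this report; it is a minimal illustration under the assumption that the nuclei are independent two-state systems, and the function names (`hill_on_rate`, `memory_on_rate`, `gillespie_telegraph`) are hypothetical. It draws waiting times and reactions as in the Gillespie algorithm and compares the simulated fraction of active nuclei with the steady state value $k_\mathrm{on}/(k_\mathrm{on}+k_\mathrm{off})$.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def hill_on_rate(bcd, k_on, K, h):
    """On rate with a Hill response to the Bicoid concentration, as in Eq. (3)."""
    return k_on * bcd**h / (bcd**h + K**h)


def memory_on_rate(rate, mother_on, k_on, alpha):
    """Add the memory term k_on * alpha of Eq. (4) if the mother nucleus was on."""
    return rate + k_on * alpha if mother_on else rate


def gillespie_telegraph(on_rates, k_off, t_end, rng):
    """Simulate independent on/off nuclei until t_end and return their final states."""
    if not on_rates:
        return []
    states = [0] * len(on_rates)  # assumption: all nuclei start in the off state
    t = 0.0
    while True:
        # Each nucleus has exactly one possible reaction: turn on or turn off.
        props = [k_off if s else k for s, k in zip(states, on_rates)]
        cumulative = list(accumulate(props))
        a0 = cumulative[-1]
        if a0 <= 0.0:
            break  # no reaction can happen any more
        # Waiting time to the next reaction; 1 - r lies in (0, 1], so the log is finite.
        t += math.log(1.0 / (1.0 - rng.random())) / a0
        if t > t_end:
            break
        # Pick reaction j with probability a_j / a0 and flip that nucleus.
        j = min(bisect_right(cumulative, a0 * rng.random()), len(states) - 1)
        states[j] = 1 - states[j]
    return states


if __name__ == "__main__":
    rng = random.Random(0)
    k_on, k_off = 1.0, 0.1  # constant rates listed in Appendix E
    states = gillespie_telegraph([k_on] * 500, k_off, t_end=30.0, rng=rng)
    print("simulated fraction on:", sum(states) / len(states))
    print("steady state value:   ", k_on / (k_on + k_off))
```

To obtain a position-dependent profile, one would pass `hill_on_rate(...)` evaluated at each nucleus position as `on_rates`; the value of $K$ is not stated in this report and would have to be chosen by the user.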
Not only does the Gillespie algorithm need to work correctly; it is also essential for the analysis of the boundary that the algorithms used for it yield sensible results. The tracking of the boundary must reflect how the boundary is shaped. Figure 4 B and C show an example of how the tracking works in cycle 13. The boundary marked in green is placed at a position that corresponds to the boundary. The boundary tracked in blue reflects the transition from many active nuclei to few (see Appendix D). It should be kept in mind that the algorithm for tracking the boundary does not work optimally. Often the boundary is highly distorted and it is very hard to define a clear transition.

[Figure 4: panels A, B and C. Legend: simulated path, steady state solution. Axis labels: probability of being active, average activity of bin, position anterior to posterior]
FIG. 4. (A) Graph of the simulated ensemble average compared to the theoretical steady state solution. (B) The blue line shows a smoothed line of the average activity of the histogram and the green line shows the computed middle position. (C) The picture shows the activity at the end of cycle 13 (red are active and black are inactive nuclei); the blue line is the tracked boundary between the high rate of expression and the low rate of expression.

Having established that the algorithms work to some degree, we are now ready to statistically analyse the difference between the cases with and without memory, starting off by looking at the ensemble average of activity at the end of the cycles in each case. As seen in Figure 5, at cycle 10 the curves overlap since no memory has been introduced yet. At cycle 11 the curves start deviating. The curve with memory starts to move towards the posterior side. This is likely to be caused by two factors. At the border, nuclei still have a probability of being active. Thus nuclei that were active by chance at the end of the cycle will get daughters with a high on rate, pushing the border towards the posterior side. Additionally, the nuclei move during mitosis, which can cause movement of the border.

[Figure 5: four panels, cycles 10 to 13]
FIG. 5. Average activity at cycle 10–13. Blue line is without memory and green line is with memory.

Another interesting feature is that the green curve seems a bit steeper than the blue one. One may thus expect that memory can contribute to the precision of the boundary. This would then be reflected in a higher Hill coefficient. The previous simulation was done with a Hill coefficient of 5, which is high (so high that the effect of memory may be hindered). The question then is whether memory could have a significant effect on the precision. To check this, a simulation with a Hill coefficient of 3 was done to see if the memory changes the Hill coefficient. The new Hill coefficient was computed using a fitting function. As seen in Figure 6, the fitted Hill function overlaps perfectly. However, the Hill coefficient produced by memory is only 10% higher than the Hill coefficient used. This indicates that memory can increase the precision, but the effect is not too strong.

[Figure 6: activity versus anterior-posterior position. Legend: ensemble average with memory, h = 3; fitted, h = 3.2835]
FIG. 6. Average activity at cycle 13 fitted with Hill function.

The previous results have indicated that the border moves and that precision increases. It is also interesting to look at how the width and position of the boundary are distributed. This can be done with histograms. We looked at the boundary position by measuring how much the midpoint varies from simulation to simulation. A boundary that varies much in position is not good for the embryo.
If memory increased the variability of the boundary, it might not be a good hypothesis. The width is also an indication of how precise the boundary is. In Figure 7A a histogram of the boundary position is shown. The histograms show that with memory the average boundary position moves towards the posterior by 5%. The variance is 10% for the non-memory case. Figure 7B shows the width of the boundary. The average width of the boundary is 6% (of embryo length) larger without memory than with memory. Notice that there is a secondary, smaller peak in the distribution of Figure 7B. This is caused by the failure of the criterion for boundary determination in the case of high noise. The algorithm checks when the activity goes below a specific value. In the anterior part there are few nuclei and thereby few nuclei per bin. These bins are very susceptible to noise, and by chance the width then gets overestimated.

[Figure 7: panels A and B]
FIG. 7. (A) Histogram of the boundary position with and without memory. (B) Histogram of the boundary width with and without memory.

It has been speculated that the form of the boundary might change with memory. In [1] it was observed that the pattern got more and more convoluted across cell cycles and that this could be caused by memory. To check whether there are differences in the length of the boundary with and without memory, a thousand simulations of the boundary were done with and without memory. A measure of how straight the boundary is, is the total length of the boundary divided by the distance between the top and bottom points. A ratio of 1 indicates a very straight boundary and a higher number indicates a boundary which is not straight. Figure 8A shows a histogram of the total distance divided by the distance between the top and bottom points. Unlike some of the predictions in [1], the boundary with memory is on average shorter than the boundary without memory: the boundary without memory is 4% longer than the boundary with memory.
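The straightness measure described above can be written compactly. The following Python sketch is only an illustration and not the routine used for Figure 8A; it assumes that the boundary is given as a list of (x, y) points already ordered from the bottom point to the top point, and the names `boundary_length` and `straightness_ratio` are hypothetical.

```python
import math


def boundary_length(points):
    """Sum of the straight-line distances between consecutive boundary points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def straightness_ratio(points):
    """Boundary length divided by the distance between its first and last point."""
    end_to_end = math.dist(points[0], points[-1])
    if end_to_end == 0.0:
        raise ValueError("first and last boundary points coincide")
    return boundary_length(points) / end_to_end


if __name__ == "__main__":
    straight = [(0, 0), (0, 1), (0, 2)]   # perfectly straight boundary
    zigzag = [(0, 0), (1, 1), (0, 2)]     # same end points, longer path
    print(straightness_ratio(straight), straightness_ratio(zigzag))
```

A perfectly straight boundary gives a ratio of 1, and any detour between the same end points gives a larger value, which is the property used to compare the memory and non-memory cases.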
An interesting artifact is the high peak that appears in the memory case. This peak probably appears because a particular boundary has the tendency of being repeated perfectly, and memory helps to facilitate this. It is also possible that this is caused by a bug.

Another measure of how irregular the boundary is, is given by the average x-projection of the distance travelled between two steps. This is shown in the histogram of Figure 8B. It also shows the trend of the boundary being straighter with memory. On average the steps without memory are 6% longer than with memory. However, to get a better picture of this behaviour, one needs more configurations of points.

[Figure 8: panels A and B]
FIG. 8. (A) Histogram of boundary length with and without memory. (B) Histogram of the average step length in x direction.

We expect the clusters of clones to behave similarly, meaning that their states should be much the same. This is expected since they are placed close to each other and are thus spatially correlated. However, near the border the small differences in position, coupled with an even probability of being on and off, should lead to a lower degree of correlation. In the case of memory one might expect the nuclei to be correlated even at the boundary. If nuclei being on have status 1 and nuclei being off have status -1, then a good measure of how well correlated the clusters are is given by the absolute value of the average state of the cluster. This means that if a cluster has correlation 1, all nuclei have the same state, and correlation 0 means that they are poorly correlated. Figure 9A shows a graph like this from 10000 simulations on the experimental data on positioning. The two cases show qualitatively the same behaviour: a high degree of correlation around the poles and a low degree around the boundary. By shifting one of the graphs (Figure 9B) one can make a comparison. Somewhat unexpectedly, the graphs behave almost identically. The memory case has a slightly higher correlation, but not by much. From this analysis we get some trends for the memory model with respect to the purely stochastic one. However, only one nuclei configuration has been considered here. It is thus too early to draw any final conclusions.

[Figure 9: panels A and B; degree of correlation versus position of cluster, curves for n = 1 to n = 8]
FIG. 9. (A) Correlation with memory. (B) Correlation of state for clusters without memory as a function of average position.

V. CONCLUDING REMARKS AND FUTURE PROSPECTS

During this internship, a method for simulating gene activation with memory has been successfully implemented. This has been done using a simple stochastic model of gene expression simulated with the Gillespie algorithm. By running simulation routines for the cases with and without memory, statistical differences between the two models have been explored. It was found that in the presence of memory, the border moves towards the posterior. The results also indicate that memory can slightly increase the precision of the boundary. It should be noted that the width obtained is still higher than in the experimental data [1]. In the experimental data it is about 10% and in the present work it is around 20%. This means that neither with nor without memory is the stochastic simulation able to achieve the precision of the experimental data. Interestingly, the correlation of the clones behaves very similarly with and without memory. To further establish these results the simulations should be performed on more nuclei configurations.

Gene expression is a complex process. The model used in this report has simplified gene expression to a simple binding and unbinding of Bicoid.
A natural continuation of this work would be to model the multiple binding sites of Bicoid and to introduce the self-activation of Hunchback, as experiments indicate that it might have an effect on the memory [1]. A goal for this implementation should be to base the activation of Hunchback on data on how each of the binding sites behaves. By doing so, one will be much better equipped to make actual comparisons with experiments and to see how memory changes the evolution of the pattern during the cycle. A simulation like this would be able to say, to a greater extent, whether memory is needed.

This internship with the Curie Institute has certainly been a valuable experience. It has been truly inspiring to be part of multiple research teams, to take part in discussions of physical problems with professional scientists and to become familiar with the way researchers work in France. My background has mainly been oriented towards mathematical and theoretical physics. However, by working on this project I have learned a lot about biology, a field which was, scientifically, previously almost unknown to me. I also feel that I have grown as a computational physicist, especially by getting to know the Gillespie and Monte Carlo algorithms, which is something I know I will benefit from in the future. Throughout the entire internship I have felt very welcome and enjoyed participating in the group meetings. It has been very motivating to go from not understanding anything in lab meetings to understanding a lot.

I would like to end this report by thanking Mathieu Coppey, Aleksandra Walczak and especially Teresa Ferraro for all their help during the internship. Without their help I would have been lost.

[1] Porcher, A., Abu-Arish, A., Huart, S., Roelens, B., Fradin, C. and Dostatni, N. Time to measure positional information. Development, 2009.
[2] Gillespie, D. T. Exact simulation of coupled chemical reactions. Naval Weapons Center, China Lake, California, 1977.
[3] Wolpert, L. Positional information and the spatial pattern of cellular differentiation. Journal of Theoretical Biology, 1969.
[4] Alon, U. An introduction to systems biology. Chapman & Hall, 2007.
[5] Porcher, A. and Dostatni, N. The bicoid morphogen system. Current Biology, 2010.
[6] Marzen, S., Garcia, H.G. and Phillips, R. The statistical mechanics of Monod-Wyman-Changeux (MWC) models. Journal of Molecular Biology, 2013.
[7] Gregor, T., Tank, D.W., Wieschaus, E.F. and Bialek, W. Probing the limits to positional information. Cell, 2007.

Appendix A: Derivation of exponential distribution of Bicoid

A simple model for the diffusion and degradation is to assume that the dynamics of the concentration is governed by

$$ \frac{d[bcd]}{dt} = D\,\frac{d^2[bcd]}{dx^2} - \alpha\,[bcd], $$

where $D$ is the diffusion constant and $\alpha$ is the degradation constant. The system reaches equilibrium before transcription and one can thus assume a steady state solution. This solution is governed by

$$ 0 = D\,\frac{d^2[bcd]}{dx^2} - \alpha\,[bcd]. $$

The boundary condition is to have a constant concentration $b_0$ at the anterior side. The solution is then

$$ [bcd] = b_0\, e^{-x/\lambda}, \tag{A1} $$

where $\lambda = \sqrt{D/\alpha}$.

Appendix B: Analytical results for the telegraph process

We only consider two states, $x = 1$ (on state) and $x = 0$ (off state). The master equations for this process are then given by

$$ \frac{dP(1,t)}{dt} = k_\mathrm{on} P(0,t) - k_\mathrm{off} P(1,t), $$

$$ \frac{dP(0,t)}{dt} = k_\mathrm{off} P(1,t) - k_\mathrm{on} P(0,t). $$

At steady state we have that

$$ \frac{dP(1,t)}{dt} = 0, \qquad \frac{dP(0,t)}{dt} = 0. $$

Solving for the probabilities we find that

$$ P(1,t)_\mathrm{st} = \frac{k_\mathrm{on}}{k_\mathrm{on} + k_\mathrm{off}}, \tag{B1a} $$

$$ P(0,t)_\mathrm{st} = \frac{k_\mathrm{off}}{k_\mathrm{on} + k_\mathrm{off}}. \tag{B1b} $$

One then easily obtains

$$ \langle x \rangle_\mathrm{st} = 1 \times P(1,t)_\mathrm{st} + 0 \times P(0,t)_\mathrm{st} = \frac{k_\mathrm{on}}{k_\mathrm{on} + k_\mathrm{off}} \tag{B2} $$

and the variance

$$ \mathrm{Var}[x]_\mathrm{st} = \langle x^2 \rangle_\mathrm{st} - \langle x \rangle_\mathrm{st}^2 = \frac{k_\mathrm{on} k_\mathrm{off}}{(k_\mathrm{on} + k_\mathrm{off})^2}. \tag{B3} $$

Appendix C: The Gillespie algorithm

In situations with few molecules and slow processes, chemical reactions are poorly described by deterministic differential equations. In situations like this, reactions are better described by chemical master equations. The problem is that these master equations are very hard to solve analytically. To deal with these chemical master equations, Gillespie proposed in his paper [2] an algorithm, now known as the Gillespie algorithm, as a way to simulate coupled chemical reactions exactly. The telegraph process for one cell is described by the following reaction

$$ \emptyset \;\underset{k_\mathrm{off} X}{\overset{k_\mathrm{on}(1-X)}{\rightleftharpoons}}\; X, \tag{C1} $$

where $X = 0$ means off and $X = 1$ means on. In the simulation there are multiple cells. The state of each of these $n$ cells is described by $X_i$, which means the following reaction network:

$$ \emptyset \;\underset{k_\mathrm{off} X_1}{\overset{k_{\mathrm{on}\{1\}}(1-X_1)}{\rightleftharpoons}}\; X_1, \quad \ldots, \quad \emptyset \;\underset{k_\mathrm{off} X_i}{\overset{k_{\mathrm{on}\{i\}}(1-X_i)}{\rightleftharpoons}}\; X_i, \quad \ldots, \quad \emptyset \;\underset{k_\mathrm{off} X_n}{\overset{k_{\mathrm{on}\{n\}}(1-X_n)}{\rightleftharpoons}}\; X_n. $$

Note that the on rates have indices to account for the variation in on rates that depends on the positioning of the cells. This is described mathematically in matrix form by

$$ X = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}, \tag{C2} $$

$$ a = \begin{pmatrix} k_{\mathrm{on}\{1\}}(1 - X_1) - k_\mathrm{off} X_1 \\ \vdots \\ k_{\mathrm{on}\{n\}}(1 - X_n) - k_\mathrm{off} X_n \end{pmatrix}, \tag{C3} $$

and gives the following equation

$$ \frac{dX}{dt} = {}^{t}a\,X. \tag{C4} $$

The rate of any reaction happening, $a_0$, is given by the sum over all reactions

$$ a_0 = \sum_i a_i. \tag{C5} $$

The probabilities of any reaction and of no reaction during $\Delta t$ are then given by $\Delta t\, a_0$ and $1 - \Delta t\, a_0$. The probability of no reaction occurring within $N$ time steps is given by

$$ p(T > N\Delta t) = (1 - a_0 \Delta t)^N. \tag{C6} $$

Let $N \to \infty$ and $\Delta t \to 0$ so that $N\Delta t \to t$. Using these limits one gets

$$ p(T > t) = \lim_{\substack{N \to \infty \\ \Delta t \to 0}} \left(1 - \frac{a_0 N \Delta t}{N}\right)^N = \lim_{N \to \infty} \left(1 - \frac{a_0 t}{N}\right)^N = \exp(-a_0 t). \tag{C7} $$

One can then generate the time step to the next reaction from this cumulative distribution by

$$ \delta t = \frac{\ln(1/r_1)}{a_0}, \tag{C8} $$

where $r_1$ is a uniformly distributed number between 0 and 1. Which of the reactions happens is determined by

$$ p_i = \frac{a_i}{a_0}, \tag{C9} $$

which is equivalent to choosing reaction $j$ by the following criterion

$$ \sum_{i=1}^{j-1} a_i < a_0 r_2 < \sum_{i=1}^{j} a_i, \tag{C10} $$

where $r_2$ is a uniform random number between 0 and 1. Using this scheme for the evolution one obtains statistically correct paths for the time evolution of the system. The algorithm is thus implemented as follows:

1. Initialise starting amounts of chemicals and rates of reactions and set time $t = 0$.
2. Generate two random numbers $r_1$ and $r_2$, uniformly distributed.
3. Calculate the time to the next reaction by $\delta t = (1/a_0) \ln(1/r_1)$.
4. Choose reaction $j$ so that $\sum_{i=1}^{j-1} a_i < a_0 r_2 < \sum_{i=1}^{j} a_i$.
5. Put $t = t + \delta t$.
6. Adjust states and rates according to the reaction.
7. Repeat steps 2–6 until the desired time is reached.

Appendix D: Procedure for tracking the boundary

The boundary position is found by using histograms of the cells and checking when, from one bin to the next, a criterion for a boundary point is passed. The width is found in a similar way by having two criteria and finding the distance between the points where these criteria are broken. A good tracking of the boundary should reflect the transition from a high degree of expression to a low degree of expression. To obtain this the embryo was divided into two parts: an anterior part of high expression and a posterior part of low expression. The procedure for dividing the embryo is as follows:

1. Divide the embryo anterior→posterior into bins whose size is the average distance to the nearest neighbour.
2. Decide criteria for a bin to be considered anterior and posterior.
3. If the average activity of the bin is not above the criterion for being anterior and not below the criterion for being considered posterior, then nuclei inside this type of bin that are on are considered anterior, and posterior otherwise.

Once anterior and posterior nuclei are established one can run the routine for finding the boundary. The routine for finding the boundary is based on looking at a specified number of nearest neighbours for every anterior nucleus and counting how many of them are posterior nuclei.
Nuclei near the boundary will have many nearest neighbours that are posterior; thus, by enforcing a minimum and a maximum number of posterior neighbours, one can find the boundary points. These points are not in an order that reflects the boundary. To get a fairly good order of the points between the lowest and the highest point, one can sort the boundary points by y-position. This is often sufficient, but in some cases it gives a very erratic boundary. A simple way to get a good sequence is to treat the problem as a travelling salesman problem, that is, to find the order of points for which the total distance is the smallest. A simple way of solving the travelling salesman problem is by means of a Monte Carlo algorithm.

Appendix E: Matlab script and settings used in general

The Matlab code can be found at https://www.dropbox.com/sh/jiptv7znldryf0l/2blJiSO8sY. It is included for completeness and is intended only for the reader who wants to see how the actual implementation is done. If not otherwise specified, the settings for the simulation are as follows:

• Cycle 13 is used for the plots.
• The constant rates are $k_{on} = 1$ and $k_{off} = 0.1$.
• The Bicoid concentration is normalized to $b_0 = 1$ with $\lambda = 0.2$.
• Hill coefficient $h = 5$.
• The lengths of the cycles in minutes are $T_{10} = 9$, $T_{11} = 10$, $T_{12} = 12$ and $T_{13} = 21$.
• Degree of memory $\alpha = 1$.

Appendix F: Simulation of time averaging

The cell fate is decided by which genes are expressed. Which genes are expressed depends on an accurate counting of the number of Bicoid molecules. In order to achieve the observed precision at the boundary it is necessary to count the number of Bicoid molecules to a precision of 10%. The counting is done by measuring how much of the time the promoter is bound by a transcription factor. As the concentration of Bicoid is not uniform inside the nucleus, the promoter will be subject to different concentrations at different times.
Instantaneous counting will result in a poor estimate of the number of Bicoid molecules inside the nucleus. To achieve high precision the counting is done by time averaging over a long period.

FIG. 10. Picture of the DNA strand and Bicoid molecules inside the nucleus.

In order to show how this process works and which variables are important for it, a stochastic simulation of the system was made. The nucleus is modelled as a cube in which there are $N_{bcd}$ molecules of Bicoid and a single strand of DNA with the promoter region placed in the middle of the cell. The Bicoid molecules diffuse throughout the cube, and when they are close enough they bind to and unbind from the promoter. When the promoter is bound the nucleus is on; when it is not bound the nucleus is off.

As there are few molecules inside the nucleus, the diffusion in 3-D needs to be simulated stochastically. This is done by dividing the cube into a number of smaller cubes, say $N = n \times n \times n$. Each of these cubes can hold Bicoid. Diffusion can be looked upon as reactions in these cubes where Bicoid is moved from one cube to one of the 6 nearest cubes (see Figure 11). The rate at which one molecule goes from chamber $i$ to its neighbour $j$ is given by

$$k_{diff\{i \to j\}} = \frac{D\, n_i}{h^2}, \tag{F1}$$

where $D$ is the diffusivity, $h$ is the size of the compartment and $n_i$ is the number of Bicoid molecules within the compartment. Each of the $N$ compartments has potentially 6 types of reactions like this. The diffusion reaction rates are given by the following matrix

$$A = \left(\vec{a}_1 \ldots \vec{a}_i \ldots \vec{a}_N\right), \quad \text{where} \quad \vec{a}_i = \begin{pmatrix} a_{ix+} \\ a_{ix-} \\ a_{iy+} \\ a_{iy-} \\ a_{iz+} \\ a_{iz-} \end{pmatrix}. \tag{F2}$$

Here $i$ is the chamber index, $x$, $y$ and $z$ are the orientation of the neighbour, and $+$ and $-$ indicate which of the two neighbours along that axis is meant. For example, reaction $ix+$ means that chamber #i loses one Bicoid molecule and the neighbour in the positive x-direction gains one. Using the scheme presented in Appendix C one can now simulate the diffusion of the molecules inside the cell.
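How the scheme of Appendix C applies to the hopping reactions of (F1) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the Matlab code used for the report: the function names are hypothetical, the walls of the cube are assumed to be reflecting (hops out of the cube are simply not allowed), and picking a source compartment followed by a uniformly chosen neighbour is used as an equivalent way of selecting one of the up to 6 equally likely hopping reactions of that compartment.

```python
import math
import random

# Offsets to the 6 face-sharing neighbours (x+, x-, y+, y-, z+, z-).
DIRECTIONS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def neighbours(idx, n):
    """Indices of the neighbours of compartment idx inside the n*n*n cube."""
    x, rest = divmod(idx, n * n)
    y, z = divmod(rest, n)
    result = []
    for dx, dy, dz in DIRECTIONS:
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < n and 0 <= ny < n and 0 <= nz < n:
            result.append((nx * n + ny) * n + nz)
    return result


def simulate_diffusion(counts, n, D, h, t_end, rng=None):
    """Gillespie simulation of molecules hopping between compartments.

    counts: molecules per compartment, length n**3 (a copy is returned).
    D, h: diffusivity and compartment size, as in (F1).
    """
    rng = rng or random.Random()
    counts = list(counts)
    k_hop = D / h**2  # hop rate per molecule and per direction
    nbrs = [neighbours(i, n) for i in range(n**3)]
    t = 0.0
    while True:
        # Total hopping rate out of each compartment.
        props = [k_hop * c * len(nb) for c, nb in zip(counts, nbrs)]
        a0 = sum(props)
        if a0 <= 0.0:
            break  # no reaction is possible
        r1 = 1.0 - rng.random()  # uniform in (0, 1], avoids division by zero
        dt = math.log(1.0 / r1) / a0  # waiting time, as in (C8)
        if t + dt > t_end:
            break
        t += dt
        # Select the source compartment, as in (C10).
        threshold = a0 * rng.random()
        src = max(i for i, a in enumerate(props) if a > 0.0)  # rounding fallback
        acc = 0.0
        for i, a in enumerate(props):
            acc += a
            if threshold < acc:
                src = i
                break
        dst = rng.choice(nbrs[src])  # each allowed direction is equally likely
        counts[src] -= 1
        counts[dst] += 1
    return counts
```

Starting with all molecules in the central compartment and running for a time much longer than $h^2/D$, the counts should spread out over the cube while the total number of molecules is conserved.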
To implement the binding and unbinding of the transcription factor, further reactions need to be added. It is assumed that the rate of binding to the promoter depends only on the amount of Bicoid in chamber #u through

$$a_{on} = n_u k_{on}. \tag{F3}$$

The off rate is assumed to be a constant $k_{off}$. When a molecule binds to the promoter $n_u \to n_u - 1$, and when it unbinds $n_u \to n_u + 1$. These reactions can be added to the model by extending $A$ by

$$\vec{a}_{N+1} = \begin{pmatrix} N \times n_u k_{on} \\ k_{off} \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{F4}$$

Using the scheme presented in Appendix C one can simulate this system.

FIG. 11. Bicoid diffusing possibility in x-direction (compartments i-1, i and i+1).

FIG. 12. Molecules inside a box diffusing.

Firstly it should be established that the diffusion is implemented correctly. One check is that if all the molecules start in the middle of the cell, they diffuse outwards spherically. A correctly implemented simulation should converge to a homogeneous solution (if there is a sufficient number of Bicoid molecules). Figure 12 shows just this. The time averaging mechanism is visualised in Figure 13.

FIG. 13. Time averaging of promoter state with 700 molecules (activity against time in minutes).

Appendix G: Correlation of clusters

In [1] it was shown that clusters of clones are located close to each other. If there is memory one might expect that these clusters have coherent states, meaning all on or all off. Let $\sigma_i$ be the state of clone $i$, with possible values $\sigma_i = 1$ (on state) and $\sigma_i = -1$ (off state). A measure of how correlated the cluster is is then given by

$$C = \left|\langle \sigma_i \rangle\right|. \tag{G1}$$

A correlation of 1 means that all nuclei are either on or off, while a correlation of zero means that half are on and half are off. Assume the cluster to be situated at point $\langle x \rangle$. The probability of being active for each of the nuclei is then $p(x)$. The expected value of $C$ is

$$\langle C \rangle = \frac{1}{N} \sum_{i=0}^{N} \binom{N}{i} p^i (1-p)^{N-i} \left|N - 2i\right|, \tag{G2}$$

where $i$ is the number of nuclei that are on in the cluster, $p$ is the probability of being on and $N$ is the total number of nuclei within the cluster. The average correlation and the standard deviation are shown in Figures 14 and 15.

FIG. 14. Theoretical correlation as a function of position (degree of correlation against position of cluster, for n = 1 to 8).

FIG. 15. Theoretical standard deviation of the correlation as a function of position (std of correlation against position of cluster, for n = 1 to 8).
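Equation (G2) is straightforward to evaluate numerically. The following is a minimal Python sketch and not the Matlab code used for the report; the function name `correlation_moments` is hypothetical, and computing the standard deviation from the same binomial distribution is an assumption about how a curve such as the one in Figure 15 could be obtained.

```python
from math import comb, sqrt


def correlation_moments(p, N):
    """Mean and standard deviation of C = |N - 2i| / N, where the number
    of nuclei that are on, i, follows a binomial distribution with
    parameters N and p."""
    mean = 0.0
    second_moment = 0.0
    for i in range(N + 1):
        # Binomial probability of exactly i nuclei being on.
        weight = comb(N, i) * p**i * (1 - p) ** (N - i)
        c = abs(N - 2 * i) / N
        mean += weight * c
        second_moment += weight * c * c
    # Guard against a tiny negative value caused by rounding.
    variance = max(second_moment - mean * mean, 0.0)
    return mean, sqrt(variance)
```

For a cluster of two nuclei with $p = 0.5$ the states "both on" and "both off" each occur with probability 1/4 and give $C = 1$, while the mixed state gives $C = 0$, so the sketch returns a mean of 0.5 and a standard deviation of 0.5.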