Cell simulation tools for systems biology

Rajveer Seyan

Abstract— With the immense amount of data that is becoming available for cellular biological systems, cell simulation software is becoming increasingly popular as a tool to aid in the construction of cell biological models. In particular, two projects known as E-Cell and Virtual Cell have been developed. These software environments have been used successfully to construct various models, including the virtual self-surviving cell, the human erythrocyte, nucleocytoplasmic transport and calcium dynamics in neuronal cells. E-Cell and Virtual Cell are becoming increasingly important for in silico modeling of cellular phenomena alongside laboratory experiments. They are also being continuously updated and improved to overcome current limitations, and are increasingly proving indispensable in the cell biology laboratory.

Index Terms—Cell Simulation, E-Cell, Systems Biology, Virtual Cell

I. INTRODUCTION

The main focus of molecular biology and genetic analysis has been the identification of the essential elements and fundamental mechanisms that enable cell function. Major efforts have gone into identifying the cellular components (genes and proteins) that are responsible for specific phenomena, creating extensive knowledge of these genes and proteins and how they are involved. Although interactions have been investigated to understand the function of different cellular phenomena, studies have only been done on a small scale and in an ad hoc manner. This situation is changing with the emergence of new experimental methods that make it possible to measure large numbers of components simultaneously, which opens up the possibility of system-level studies [1]. This information explosion in biology has provided and will continue to provide unprecedented opportunities for the biomedical research community.
However, to make full use of these opportunities, it is necessary to have tools that facilitate the analysis of such immense amounts of data. With respect to cell biology, such tools should allow the construction of quantitative models of cellular processes, thus enabling researchers to test (by simulation) whether a set of interacting molecules or structures in the cell can produce a particular behaviour [2]. The complex behaviour of the cell cannot be determined or predicted without the aid of a computer model and simulations [3]. Many attempts have been made to simulate molecular processes in cellular systems, and several software packages have been developed for quantitative simulation of biochemical pathways [4]. This paper will demonstrate the importance of cell simulations as a tool in systems biology. It will discuss the abilities, successes, limitations and future prospects of current cell simulation software, with a focus on two particular projects, E-Cell and Virtual Cell.

II. E-CELL

The E-Cell project was initiated in 1996 at the Shonan Fujisawa Campus of Keio University in Fujisawa, Japan. The aim was to directly address the challenging task of whole-cell modeling [3]. E-Cell is essentially a computer software environment for modeling and simulation of the cell. It is a generic object-oriented environment for simulating molecular processes in user-definable models. It is equipped with graphical interfaces that permit observation and interaction, thus making the simulation more user friendly [5].

A. How E-Cell Works

The system is, in essence, a rule-based simulation system. It is written entirely in C++, an object-oriented programming language, and runs on the Linux operating system [5]. The model is composed of three lists loaded at runtime: the substance list, the rule list and the system list. The substance list defines all objects that make up the cell and the culture medium.
The rule list defines all of the reactions that can occur in the cell. The system list defines the functional and/or spatial structure of the cell and its surrounding environment. At every time frame, the state of the cell is expressed as a list of concentration values of all substances that are within the cell, along with global values for cell volume, pH and temperature. The simulator engine computes all of the functions that are defined in the reaction rule list, thus generating the next state in time. A number of sample models are provided with the system. The user can, however, create user-defined models in addition to these by writing original substance and rule lists. The graphical interfaces make the program user friendly by allowing observation and interaction throughout the simulation process [4].

B. Current Models Using E-Cell

The first virtual self-surviving cell (SSC) model has been developed at Keio University using E-Cell. This was done using the genome sequence of the micro-organism Mycoplasma genitalium. M. genitalium has the smallest number of genes (approximately 480) of all organisms currently known, and its genome sequence has been published. Its small genome makes it an ideal candidate for whole-cell modeling. Extensive gene knock-out studies by the Institute for Genomic Research (TIGR) have demonstrated that many of the 480 genes are not always necessary for the survival of the organism. Therefore, in collaboration with TIGR, Keio University constructed the first hypothetical virtual cell using 127 genes which are necessary for the cell’s survival and maintenance of homeostasis [3]. Fig. 1 shows a diagram of the metabolism overview of the virtual SSC model discussed in [3] and [4]. This model takes up glucose into the cytoplasm and metabolizes it through the glycolysis pathway. This produces ATP as an energy source, which is consumed mainly for protein synthesis.
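As an illustration only, the rule-based scheme of Section II-A can be applied to a toy version of the glucose-to-ATP-to-protein chain just described. The sketch below is written in Python rather than in E-Cell's own C++ and model file formats. The substance names, stoichiometries, rate constants, mass-action kinetics and fixed-step Euler update are all assumptions made for this example and are not taken from the E-Cell model in [3] and [4].

```python
# Toy rule-based simulator in the spirit of a substance list plus a rule list.
# Every name, rate constant and stoichiometry here is an illustrative assumption.

# Substance list: name -> initial quantity (arbitrary units)
substances = {"glucose_out": 100.0, "glucose": 0.0, "ATP": 0.0, "protein": 0.0}

# Rule list: (rule name, reactants, products, rate constant).
# Reactants and products map a substance name to its stoichiometric coefficient.
rules = [
    ("uptake", {"glucose_out": 1}, {"glucose": 1}, 0.05),
    ("glycolysis", {"glucose": 1}, {"ATP": 2}, 0.10),
    ("protein_synthesis", {"ATP": 1}, {"protein": 1}, 0.02),
]


def step(state, rules, dt):
    """Evaluate every rule on the current state and return the next state."""
    delta = {name: 0.0 for name in state}
    for _name, reactants, products, k in rules:
        # Assumed mass-action kinetics: rate = k * product of reactant quantities
        rate = k
        for substance, coeff in reactants.items():
            rate *= state[substance] ** coeff
        for substance, coeff in reactants.items():
            delta[substance] -= coeff * rate * dt
        for substance, coeff in products.items():
            delta[substance] += coeff * rate * dt
    return {name: state[name] + delta[name] for name in state}


def simulate(state, rules, dt=0.1, n_steps=1000):
    """Advance the state n_steps times, recording the state at each time frame."""
    history = [dict(state)]
    for _ in range(n_steps):
        state = step(state, rules, dt)
        history.append(state)
    return history


history = simulate(substances, rules)
final = history[-1]
print(final)
```

Because each rule only moves material between substances, the quantity glucose_out + glucose + (ATP + protein)/2 stays constant in this toy model, which gives a simple sanity check on the update step.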
The 127 genes are transcribed by RNA polymerase into mRNAs, which are then translated into proteins by the ribosome. The cell must constantly produce protein to sustain life, since proteins are modeled to degrade spontaneously over time. The cell’s membrane is also modeled to degrade; the cell therefore has a phospholipid biosynthesis pathway for biosynthesis of the cell membrane. A constant supply of ATP is needed for protein and membrane synthesis, and thus glucose is essential for the cell to survive [3].

[Figure: metabolism overview of the virtual self-surviving cell. Labels: Glucose, Glycolysis, Lactate, ATP, Glycerol, Fatty Acids, Lipid Biosynthesis, Phospholipids, Phospholipid Bilayer, 127 Genes, Transcription, mRNA, tRNA, rRNA, Translation, Proteins, Degradation.]

Fig. 1. The virtual self-surviving cell model. The minimal cell has 127 genes, sufficient to maintain protein and membrane structure.

Obviously, the SSC model is hypothetical and no such cell exists in nature. Thus, E-Cell has also been used to model living cells so that simulation results can be evaluated. Human erythrocytes (red blood cells) were chosen for the model because of their limited intracellular metabolism and because they do not replicate, transcribe or translate genes. This model can be compared with real red blood cells, since vast amounts of experimental data have already been collected about them [3]. The E-Cell model was developed for the human erythrocyte by defining reaction rules for all the different metabolic pathways based on previous erythrocyte models. All parameters and kinetic equations used in constructing the model were obtained from previously published experimental data. Simulations have shown that when the E-Cell erythrocyte model reaches a steady state, quantities of intermediate metabolites inside the virtual cell are comparable with experimental data obtained from living erythrocytes. The current erythrocyte model is being extended and improved for more accurate simulations that account for factors such as pressure, pH and variable cell volume [3].

C. Future Prospects

In addition to the ‘virtual self-surviving cell’ and the ‘human erythrocyte model’, E-Cell is currently being used to construct other models. These include a ‘mitochondria model’ and a ‘signal transduction’ model for the chemotaxis of the bacterium E. coli [3].

A major problem associated with constructing large-scale cell models is the lack of quantitative data. Most of the available biological knowledge is qualitative (functions of genes, pathways, protein interactions). For simulation, it is necessary to have quantitative data (concentrations of metabolites and enzymes, flux rates, kinetic parameters, dissociation constants). A major barrier to further progress in cell simulation is therefore the need for better high-throughput technologies for measuring intracellular metabolites. Large amounts of data for a variety of cell states can then be collected to construct quantitative models, and the models can be refined through an iterative process until the simulation results match the data [3].

Also, despite the developments in the E-Cell software itself, there are still several difficulties with regard to simulating realistic models. A newer version of E-Cell is being developed to address these issues. It will be capable of running several different algorithms simultaneously in a single simulation. Suitable algorithms will be used for different submodels of various cellular processes at different levels of abstraction and on different time-scales [5].

III. VIRTUAL CELL

The ‘Virtual Cell’ is another example of a generic software environment that is being developed at the National Resource for Cell Analysis and Modeling for cell biological research.
This software is freely accessible to all members of the scientific community and is designed to be used by a wide range of scientists, from experimental cell biologists to theoretical biophysicists. It provides an integrated framework within which models of cell biological processes can be created based on both experimental data and purely theoretical assumptions [6].

A. How Virtual Cell Works

First, experimentally determined data are used to construct the cell model. The data include the identity of molecules, their reactions and transport properties, where they are compartmentalized within the cell, and the topological organization of the compartments. This defines the physiology of the cellular process being modeled. For spatial simulations, the various compartments have to be mapped to the appropriate geometries. This scheme allows the same physiology to be reused with different geometries, thus facilitating ready adaptation and modification of the models. After the physiology is mapped to the geometry and initial and boundary conditions are specified, the model is fully defined. The framework automatically converts the biological mechanisms into a mathematical system. If desired, the mathematical system can be further refined and edited. Then, after choosing appropriate time steps for the simulation, the model is sent to the appropriate solver and a simulation is generated. Data visualization resources are provided for ease of use when navigating through the enormous simulation data sets [2].

B. Current Models Using Virtual Cell

Nucleocytoplasmic transport, and in particular the transport of the small GTPase Ran, serves as a good example of an application of the Virtual Cell to quantitative cell biology. GTPases are a large family of enzymes that can bind and hydrolyze GTP (guanosine triphosphate, a chemical compound essential to signal transduction in living cells).
Ran is a small GTPase that is required for protein import, mRNA export, and the maintenance of nuclear structures. The Virtual Cell model was able to simulate Ran transport both qualitatively and quantitatively over a range of different conditions. It presented the first estimates for the amount of steady-state flux of Ran across the nuclear envelope. This in turn provided a lower estimate for the value of the total nucleocytoplasmic flux (approximately 20 million macromolecules per cell per minute). It also predicted that a very high gradient of Ran-GTP exists between the nucleus and the cytoplasm. These predictions have been verified experimentally using a fluorescence resonance energy transfer (FRET) biosensor [7].

The Virtual Cell has also been used to simulate calcium dynamics in a neuronal cell. In certain neuronal cells, the neurotransmitter bradykinin (BK) triggers IP3 (inositol 1,4,5-trisphosphate) dependent calcium waves that consistently start in the neurite proximal to the soma and rapidly propagate in both directions. Using calcium imaging techniques, quantitative uncaging of microinjected IP3 and simulations from the Virtual Cell, it was found that IP3 levels build up in the neurite at a rate and to an extent much greater than in the soma. By coupling experiments and simulations involving BK, it was confirmed that the proximal segment of the neurite is the crucial region for response to a BK stimulus. This proximal segment is necessary and sufficient to initiate and propagate the calcium signal to other regions of the cell [2].

This study is a good representation of the interplay between experiment and modeling. The initial calcium increase was observed in the middle of the neurite, and spread bidirectionally to the soma and growth cone. This pattern was observed in all of the studied cells as long as they had the same characteristic neuronal morphology.
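Bidirectional spread from a point in the middle of the neurite is qualitatively what a reaction-diffusion process with a localized source would produce. The following Python sketch is a deliberately minimal illustration under assumed parameters: a single species is produced at one grid point, then diffuses and decays along a one-dimensional neurite, solved with explicit finite differences. It is not the Virtual Cell model of [2], which involves IP3 signalling and a realistic cell geometry, and all parameter values and names are hypothetical.

```python
# 1D reaction-diffusion toy: a signal produced in the middle of a "neurite"
# spreads toward both ends. All parameter values are illustrative assumptions.


def simulate_wave(n=101, source_index=50, d_coef=1.0, dx=1.0, dt=0.2,
                  production=1.0, decay=0.01, n_steps=500):
    """Return the concentration profile after n_steps explicit Euler steps."""
    # The explicit scheme is only stable if d_coef * dt / dx**2 <= 0.5
    assert d_coef * dt / dx ** 2 <= 0.5
    u = [0.0] * n
    for _ in range(n_steps):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]       # no-flux boundary
            right = u[i + 1] if i < n - 1 else u[i]  # no-flux boundary
            laplacian = (left - 2.0 * u[i] + right) / dx ** 2
            source = production if i == source_index else 0.0
            new[i] = u[i] + dt * (d_coef * laplacian + source - decay * u[i])
        u = new
    return u


profile = simulate_wave()
# Print every tenth grid point: the profile peaks at the source and falls off
# symmetrically toward both ends of the domain.
print([round(value, 3) for value in profile[::10]])
```

With the source placed at the centre of the grid, the resulting profile is symmetric about the source, which mirrors the two-way spread toward the soma and the growth cone only in the loosest qualitative sense.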
A quantitative model of this process was constructed using Virtual Cell software to determine how all the individual components could interact to form the observed calcium wave. Data for this model were obtained from both prior literature and laboratory experiments. In several instances, previously reported experiments had to be repeated when it was found that a model based on data from the literature was not correctly predicting the observed calcium dynamics. A very interesting finding that came out of this iterative modeling and experiment process was that the observed calcium wave could only be reproduced if the density of calcium in the soma was about twice as high as its density in the neurite [2].

C. Future Prospects

The version of Virtual Cell that is presently in use can handle a large range of modeling problems encompassing reaction-diffusion processes in arbitrary geometries. However, the system will need to be significantly enhanced for problems that require a changing geometry, such as cell migration or mitosis. Also, the current system can only treat some types of stochastic processes, such as Brownian motion, directed particle mobility along microtubule tracks, and the reaction of individual particles with continuously distributed molecules. In situations where the number of interacting particles is too low for a continuous description, there is a need to expand the stochastic formulations to include the treatment of reactions between discrete molecular species. There is also a need to develop a discrete state treatment for models of single ion-channel currents and locations. The system architecture and the user interface will be adapted to fully accommodate stochastic models. In addition, Virtual Cell hopes to incorporate more efficient numerical algorithms that take advantage of massively parallel computer architectures.
This feature, along with external database support, will allow the formation of larger scale biochemical network models within a high-resolution cellular geometry [2].

IV. CONCLUDING REMARKS

Cell simulations are an important tool for the advancement of systems biology. With the rapid accumulation of biological data, it is becoming increasingly clear that it is no longer practical to try to understand the dynamic behaviour of various cellular processes through experimentation alone. Software simulation tools such as Virtual Cell and E-Cell are necessary to realize the full potential of biological data for elucidating cellular mechanisms. Clearly, however, sophisticated cell simulations alone are not sufficient. It is also necessary to have continued improvement in laboratory techniques for obtaining accurate quantitative data, since such data are necessary to generate a reliable model in the first place. Therefore, the use of in silico modeling of cellular phenomena in combination with laboratory experiments will lead to more reliable predictions and results than could be obtained by either method alone. Thus, the addition of software modeling tools to a cell biology laboratory will become as important as the microscope [7].

REFERENCES

[1] H. Kitano, “Looking beyond the details: a rise in system-oriented approaches in genetics and molecular biology,” Current Genetics, vol. 41, 2002, pp. 1-10.
[2] L. Loew and J. Schaff, “The Virtual Cell: a software environment for computational cell biology,” TRENDS in Biotechnology, vol. 19, no. 10, Oct. 2001, pp. 401-406.
[3] M. Tomita, “Whole-cell simulation: a grand challenge of the 21st century,” TRENDS in Biotechnology, vol. 19, no. 6, June 2001, pp. 205-210.
[4] M. Tomita, K. Hashimoto, K. Takahashi, T. Shimizu, Y. Matsuzaki, F. Miyoshi, K. Saito, S. Tanida, K. Yugi, J. Venter, C. Hutchison, “E-Cell: software environment for whole cell simulation,” Bioinformatics, vol. 15, no. 1, 1999, pp. 72-84.
[5] K. Takahashi, N. Ishikawa, Y. Sadamoto, H. Sasamoto, S. Ohta, A. Shiozawa, F. Miyoshi, Y. Naito, Y. Nakayama and M. Tomita, “E-Cell 2: Multi-platform E-Cell simulation system,” Bioinformatics, vol. 19, no. 13, 2003, pp. 1727-1729.
[6] Moraru, J. Schaff, B. Slepchenko and L. Loew, “The Virtual Cell An Integrated Modeling Environment for Experimental and Computational Cell Biology,” Ann. N.Y. Acad. Sci., vol. 971, 2002, pp. 595-596.
[7] B. Slepchenko, J. Schaff, I. Macara and L. Loew, “Quantitative cell biology with the Virtual Cell,” TRENDS in Cell Biology, vol. 13, no. 11, Nov. 2003, pp. 570-576.