Realistic modeling of biological systems

Synopsis

Realistic modeling of biological systems is becoming feasible thanks to recent progress in experimental and computational methods. On May 1-5, 2005, a workshop was held in Mizpe Hayamim, Israel, to explore the motivation, meaning and methodologies for realistic modeling of biological systems. In addition, the goals and measures of success were discussed and debated.

An ad hoc definition for realistic models of biological systems (RMBSs) is "a comprehensive model of a complex biological system that can be interrogated to reproduce and predict the behavior of the system under realistic conditions". Most definitions of scientific models require them to be simpler than the system under study, offering abstractions that come from understanding. A debated requirement was that models, just like Popperian hypotheses, should produce falsifiable predictions that stand up to tests.

Some of the biologists who participated in the workshop proposed a special type of model, the communicative model. Such models capture the knowledge of different researchers in different fields in a concise representation, which may have little predictive power if that knowledge is still too partial. The goals of communicative realistic models are to describe existing knowledge in a unified structure, and to create a framework that allows one, through interaction with the model, to develop new insights, highlight contradictions, or point to missing information. It is important to note that this type of realistic model was considered acceptable or useful by only some of the participants.

The motivations for developing RMBSs that were discussed included:
- Integrating knowledge and data about a system, bridging disciplines and turfs.
- Testing the existing understanding of the underlying mechanisms by the ability of the model to reproduce the emergent properties of the system.
- Predicting the behaviors of the system, to suggest experiments that will distinguish between competing hypotheses.
- Explaining which features of the underlying system are most important in generating the emergent properties of the system.

Some of the parameters for evaluating the performance of a realistic model are:
- Completeness: the fraction of the known behaviors and structures of the system that are captured by the model.
- Usefulness: the ability to utilize the model to draw conclusions or make decisions (such as experimental design).
- Predictiveness: the ability of the model to assist in formulating predictions that are successfully validated.
- Depth: the ability of the model to describe emergent macroscopic properties of the system across biological organization levels (e.g. molecular, cellular, tissue) based on microscopic elements and rules.

A topic of major discussion was testing the realism of a model. It was largely fueled by David Harel's proposal for a "grand challenge" in the form of "The Extended Turing Test". In this test, a model is realistic if "one versed in the field" cannot tell the model from the real system. The discussion of this test pointed to the need to define the legalistic aspects of such a test, as well as the need to establish mechanisms that will prevent failure simply because of differences in the way answers are communicated (e.g. probing the state of the model being faster than running a test on the real system). With more immediately achievable goals in mind, Ronan Sleep presented a roadmap challenge based on the study of specific systems involving gastrulation and other developmental processes. Sleep's roadmap starts with…

Several models were discussed, demonstrating the place of realistic modeling along the traditional levels of organization in biology.
These included modeling metabolism with realistic modeling of membranes and the cytosol (Gordon Brodick), the lytic/lysogenic switch in lambda phage (Anastasia Yartseva), C. elegans vulval development (Jane Hubbard and Michael Stern), and different aspects of immune cell generation (Howard Petrie, Sol Efroni and Naaman Kam)…

[I included here some of the summaries I made during the meeting that ended up more mature. I will appreciate everyone sending me one or two paragraphs that capture the essence of their message for the summary – e.r.]

Sorin Solomon presented a model which, much like the famous Game of Life simulation, allows simple automata to reproduce or die on a grid. In Sorin's system, two types of creatures coexist on the grid. Breeders can reproduce if they meet catalyzers, or die with a given probability at any cycle, while catalyzers neither die nor reproduce. Both catalyzers and breeders diffuse randomly on the grid. This very simple system produced two very interesting observations about the nature of emergent properties in complex systems. First, modeling the fate of the system with differential equations gave false predictions. According to such solutions, which average out the behavior of all the elements in the system, if the average "death" rate is bigger than the average "birth" rate, the breeders will gradually go extinct. However, running the simulation shows that random nuclei of high breeder density around a catalyzer can breed faster than they die; as a result, for most starting conditions the breeders took over the entire grid. The second emergent property of this system is that a cloud of breeders seemed to follow the catalyzers: while none of the individual breeders had any directional interaction with the catalyzers, their density followed them.

Jane Hubbard and Michael Stern presented models of vulval development in the worm C. elegans. Prof. Hubbard explained the special challenge and opportunity this worm poses for modeling.
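Solomon's breeder/catalyzer automaton can be sketched in a few lines of code. The grid size, death probability, population sizes, and update order below are illustrative assumptions, not the values used in the talk:

```python
import random

# Minimal sketch of a two-species breeder/catalyzer automaton.
# All parameters here are illustrative assumptions.
SIZE = 20            # lattice is SIZE x SIZE with periodic boundaries
P_DEATH = 0.1        # probability that a breeder dies in a given cycle
N_BREED, N_CAT = 100, 10

def diffuse(cells, rng):
    """Each creature takes one random step (periodic boundaries)."""
    return [((x + rng.choice((-1, 0, 1))) % SIZE,
             (y + rng.choice((-1, 0, 1))) % SIZE) for x, y in cells]

def cycle(breeders, catalyzers, rng):
    cat_cells = set(catalyzers)
    next_breeders = []
    for cell in breeders:
        if rng.random() < P_DEATH:
            continue                  # this breeder dies
        next_breeders.append(cell)
        if cell in cat_cells:         # meeting a catalyzer: reproduce
            next_breeders.append(cell)
    # catalyzers neither die nor reproduce; both species diffuse
    return diffuse(next_breeders, rng), diffuse(catalyzers, rng)

rng = random.Random(0)
breeders = [(rng.randrange(SIZE), rng.randrange(SIZE)) for _ in range(N_BREED)]
catalyzers = [(rng.randrange(SIZE), rng.randrange(SIZE)) for _ in range(N_CAT)]
for _ in range(200):
    breeders, catalyzers = cycle(breeders, catalyzers, rng)
print(len(breeders), "breeders remain;", len(catalyzers), "catalyzers")
```

The mean-field (differential-equation) prediction compares the average birth rate, i.e. the chance of sharing a cell with a catalyzer, against P_DEATH; the spatial simulation can deviate from that prediction because breeders cluster around catalyzers. Whether the breeders survive a given run depends on the assumed parameters.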
Being transparent, fast to breed, and possessed of a relatively simple anatomy, this worm is one of the best understood models of development. Decades of research have provided clear documentation of cell fates, from the zygote all the way to the mature adult with its 1000 or so cells.

An interesting discussion surrounded the use of the term "fate". Prof. Irun Cohen argued that it conveys a sense of a predestined process, while in reality the process is driven by a series of "here and now" decisions. Irun expressed concern that the term may mislead because of its everyday English meaning. The developmental biologists acknowledged the concern, but explained that the term is too central to embryogenesis to be replaced.

Furthermore, it is rather simple to silence specific genes in the worm, providing an easy way to test the effects of perturbations. (Gene silencing in C. elegans can be achieved by creating transgenic E. coli bacteria that express a specific siRNA designed to target specific genes or groups of genes, and feeding the worms these bacteria; C. elegans naturally feeds on bacteria.) In addition, laser ablation can be used to explore the effect of destroying specific cells or structures during development.

Michael Stern presented a model of the processes that determine vulval positioning. In this process, a cascade of intercellular signals leads to the differentiation of a set of cells into two cell types: one that will later die to form the vulval cavity, and one that will form its walls. A simulation of this process was presented, based on Live Sequence Charts (LSCs), which captures the formation of the early vulva from simple rules that each of the cells follows.

Ohad Parnes discussed the nature of models. He argued that in biomedical research there is no real difference between experimental systems and model systems. All experimental systems are models: we do not study nature directly, but try to describe "agents", a concept first introduced by Muller, that can explain the behavior of the system in a cause-and-effect way. For example, Schwann showed that the process of stomach digestion required an agent, later found to be pepsin. Similarly, fermentation was shown to result from a specific agent; through this model, Schwann identified cells as the agent. In the 1970s and 80s, the agent concept started weakening. Examples: the clonal selection theory cannot explain the behavior of the system from the individual "agents" alone. In epidemiology, given the same bacteria, different societies will develop different disease patterns; the patterns arise from both the substrate (the people) and the agent (the bacteria).

In the 1940s and 1950s, system-theoretical approaches tried to identify high-level rules about the behavior of all biological systems. The idea was to define rules that all systems follow, and then drill down to understand how each system realizes these rules.

Agent-based modeling: the agents entering biological models from the field of computer modeling are different from the "old" agents. Each agent is a much more complex model: it has goals, it has bounded rationality, and so on. This was explicitly not allowed in 18th-century physiology. In modern modeling, goals, behavior and rationality are introduced into physiological systems. The Q&A discussion revolved around where we are heading, including the possibility of replacing the old agent concept with new agents; computational agents were mentioned as a possible emerging type. Yoram Luzon mentioned the difficulty of reasoning about systems with more than 4-5 elements.

Carl Schaefer presented the Pathway Interaction Database (PID) (http://cmap.nci.nih.gov/PW) as a prototype database of metabolic and signaling interactions extracted from the representations of pathways available from KEGG (http://www.genome.jp/kegg/) and BioCarta (http://www.biocarta.com). The database currently contains 4207 interactions from 85 human metabolic pathways and 3064 interactions from 259 human signaling pathways. In the PID data model, there are four interaction types (reaction, modification, transcription, and translocation), four molecule types (protein, complex, RNA, and compound), and four role types (input, output, agent, and inhibitor). Post-translational modifications and cellular locations are specified by labels on uses of molecules in interactions. This scheme models a few simple relations (cause/effect, producer/consumer) in a way that supports computation across the entire set of interactions.

One interesting exploratory use of the PID data is the construction of interaction profiles of phenotypes. An interaction profile is analogous to a gene expression profile. A gene expression profile is a set of pairs, each pair consisting of a gene id and a value of "up" or "down". Similarly, an interaction profile is a set of pairs, each pair containing an interaction id and a value of "on" or "off". In most cases, the proteomics data needed to construct a profile of signaling interactions is simply not available. However, one can use gene expression data to specify an initial state for the set of signaling interactions in the PID, and then apply a set of simple rules that interpret the cause/effect relation to infer which interactions are active for a gene expression dataset. Using this approach, one can infer that a given post-translationally modified form of a gene product is present in one sample but absent in another, even though the gene is equally expressed in both samples. We have applied this method to data from 18 brain tumor (glioblastoma multiforme) samples and 7 normal brain samples from NCI's REMBRANDT project (http://rembrandt.nci.nih.gov), computing, for each tumor sample, the set of interactions that are active in the tumor but not in normal samples.
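The inference step can be sketched as follows. The data model mirrors PID's interaction/molecule/role vocabulary, but the activity rule shown here (an interaction is "on" when all of its inputs and agents are present and none of its inhibitors are) and the toy cascade are illustrative assumptions, not PID's actual rule set:

```python
from dataclasses import dataclass, field

@dataclass
class Interaction:
    """PID-style interaction: kind is one of reaction, modification,
    transcription, translocation; molecules play input/output/agent/
    inhibitor roles."""
    id: str
    kind: str
    inputs: list
    outputs: list
    agents: list = field(default_factory=list)
    inhibitors: list = field(default_factory=list)

def infer_active(interactions, expressed):
    """Iteratively mark interactions 'on'. Outputs of active interactions
    become available, so activity propagates through cascades; this is how
    a modified gene product can be inferred present in one sample only."""
    present = set(expressed)
    active = set()
    changed = True
    while changed:
        changed = False
        for it in interactions:
            if it.id in active:
                continue
            if all(m in present for m in it.inputs + it.agents) and \
               not any(m in present for m in it.inhibitors):
                active.add(it.id)
                present.update(it.outputs)
                changed = True
    return active

# Toy two-step cascade (hypothetical interactions for illustration):
# a ligand-driven modification feeds a downstream transcription event.
cascade = [
    Interaction("i1", "modification", ["EGFR"], ["EGFR-p"], agents=["EGF"]),
    Interaction("i2", "transcription", ["EGFR-p"], ["MYC_mRNA"]),
]
print(sorted(infer_active(cascade, {"EGFR", "EGF"})))
```

In this toy run the modified form "EGFR-p" is never measured directly; it is inferred from the expression of "EGFR" and "EGF", which is the essence of the approach described above.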
The interactions unique to a given tumor sample typically aggregate into sets of connected graphs. The size of these graphs varies from 3 interactions (the minimum size for this analysis) to 35 interactions in the largest graph. While some of these graphs (39) are unique to a single tumor sample, other graphs are shared by several tumor samples; one graph is present in 12 of the 18 samples. Furthermore, in some cases one graph may include all the interactions in another graph. Using this relation of inclusion, we create a partial ordering of graphs. This ordering, in turn, implies an ordering of the tumor samples containing these graphs, which might reflect a progression in the activation of connected interactions that are not found to be active in normal samples.
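The inclusion relation that drives this partial ordering can be sketched directly. The graph and interaction ids below are hypothetical; the real analysis used connected subgraphs of PID interactions:

```python
# Sketch of the inclusion-based partial ordering of interaction graphs.
# Each graph is represented as a frozenset of interaction ids.

def inclusion_order(graphs):
    """Return pairs (a, b) where graph a's interactions are a strict
    subset of graph b's, i.e. a precedes b in the partial order."""
    return [(a, b) for a in graphs for b in graphs
            if graphs[a] < graphs[b]]   # '<' is strict subset on sets

graphs = {
    "g1": frozenset({"i1", "i2", "i3"}),
    "g2": frozenset({"i1", "i2", "i3", "i4", "i5"}),
    "g3": frozenset({"i6", "i7", "i8"}),
}
print(inclusion_order(graphs))   # g1 is included in g2; g3 is incomparable
```

Ordering the tumor samples by the graphs they contain then follows from these pairs: a sample whose graph is included in another sample's graph may represent an earlier point in the progression.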