Systems Biology and Anticancer Drug Discovery

Wen Jiang, Student Member, IEEE

Abstract - The traditional anticancer drug discovery process is inefficient because the models it relies on have little predictive value. Systems biology will provide a better understanding of the cellular activities and pathways related to a disease such as cancer. This understanding will enable researchers to create predictive computer models that simulate in vivo systems, for instance a human colon cancer cell. These models are based on the integration of genomic, proteomic, and metabolomic data. They can be used to optimize drug leads so as to maximize effectiveness against cancer cells with minimal or no side effects on normal cells.

Keywords - bioinformatics, in silico, systems biology.

I. INTRODUCTION

Historically, the discovery of anticancer drugs has faced many challenges. The most difficult problem is that the laboratory models used to predict drug response have little predictive value. Most of today’s models are laboratory animals that carry transplanted human tumor cells; these models are called xenografts [1]. However, humans and xenografts handle drugs differently. A drug that is effective in the animal might not be as effective in the human body; conversely, many effective cancer drugs have been discarded because they were ineffective in the animal model. Largely because of this problem, of the thousands of anticancer drugs found to be effective in cell culture or animal models, only 39 are approved by the Food and Drug Administration for exclusive use in chemotherapy. The main focus for better and more efficient anticancer drug discovery is therefore to obtain an accurate model that can predict the human response to a drug.

Extensive research is currently under way to better understand the genetics of cells. The Human Genome Project, for instance, succeeded in decoding the human genome sequence on 15 February 2001 [3]. Many other projects, involving studies of protein interactions and the determination of the three-dimensional structures of the proteins encoded by a genome, are also under way. The purpose of these studies is to understand all components and interactions of a complex functional system and their underlying dynamics [4]. With this knowledge, a systems biology model can be created. Such a model uses computer-aided modelling techniques that can represent all molecular interactions within a cell, or even a more complex system, as an integrated computational process. Creating the model is complicated, however: it requires the integration of genomics, proteomics and bioinformatics to create a whole-system view of complex biological entities [2]. At the same time, better information processing, database and high-speed computation technologies are needed to analyze systems as complex as the human body.

This report begins with a brief introduction to the concepts of systems biology: what systems biology is and how system models are created. A detailed discussion follows of the systems biology approach to cancer treatment, such as drug discovery and therapy. The main focus of that part is how systems biology can generate more accurate models that predict the effectiveness and side effects of drugs. The final part of the report covers current progress in the field, with examples such as the development of a human colon cancer cell model.

II. BACKGROUND

Although the concept of systems biology has been around for some time, only recently have researchers begun to look at biological entities at a systems level. This is mainly due to the availability of high-throughput measurements of DNA, RNA and proteins [5]. Modeling and simulation of biological systems, which is the core of systems biology, requires quantitative data at all hierarchical levels, so massive amounts of data are needed. Information is required on DNA, mRNA, proteins, protein interactions, information pathways, networks, cells, tissues, organisms and even entire populations [6]. This information should also include related parameters such as the concentration and absolute number of molecules within each cell. Without the high-throughput measurement techniques available today, capturing such information would be impossible.

Computer models of biological systems can be generated from the available information using two approaches: chemical kinetic models and discrete circuit models. Chemical kinetic models represent cellular processes as a system of chemical equations. The reactions can be expressed mathematically as differential equations in which the concentrations of reactants and products change according to the reaction rates. Other biological processes, such as transcription and translation, operate in a random fashion, so stochastic relations should be used to model them [6]. A kinetic model of the lambda phage is shown in Figure 1.

Figure 1: Kinetic model for the lambda phage circuit.

The discrete circuit model does not represent a biological process as a continuous event. Instead, it models discrete events as feedback loops. Such a model is a network consisting of nodes and directed edges between the nodes.
The nodes stand for the quantity of a certain molecule, and each edge represents the effect that the level of one node has on a neighboring node. In a simplified model, each node takes one of two discrete states, indicating the presence or absence of a certain molecule. Each node is given a start value in the initial state, and its subsequent values are determined by the function assigned to that node. The state of the whole network evolves over a series of discrete time steps, where each step is determined by the conditions at the current step. The discrete circuit model is greatly simplified compared with real life, especially in the binary value assigned to the state of each node. This is not very realistic, since real biological systems can often have more than two states [4].

III. SYSTEMS BIOLOGY APPROACH TO ANTICANCER DRUG DISCOVERY

The traditional drug discovery process is linear, based on the sequential approaches of biology and chemistry. Its limitation is that it relies on the vast screening of randomized chemical libraries against a small number of biological targets [7]. This approach works for a single-target, one-drug system. For multifactorial diseases, where multiple targets or pathways have to be affected for successful treatment, it has limited success. The systems biology approach, on the other hand, targets a broader range of biological structures, creating the potential to discover drugs that are effective against multiple diseases by targeting common pathways implicated in pathogenesis [8].

Cancer and other genetic or metabolic disorders are often caused by complex multi-molecular interactions that cannot be explained by an alteration in a single gene. To develop effective drugs, the first step is to identify all the molecular targets connected with the disease.
This can be done by using systems biology to identify novel molecular targets or new uses for existing molecular targets. Such targets include mutant proteins that have not previously been identified and therefore have no established connection with the disease. Next, it is necessary to decipher the complex inter- and intracellular signaling relationships, which provides information about entire signaling networks. With all these data known, a system model can be created to simulate the effectiveness and side effects of a given drug.

Novel molecular targets, or new uses for existing targets, can be identified by the integrative systems biology approach. In this approach, proteomic and metabolomic data acquisition and analysis are performed in parallel to link a particular protein to the disease of interest. An example of a new use for an existing target is the discovery and successful development of sildenafil (Viagra™; Pfizer) [9]. Once these targets are identified, a more accurate model can be generated to predict the effectiveness of a drug or lead compound.

Cellular signaling relationships are also important for drug discovery. By defining entire signaling networks, researchers can generate models that predict the toxicity and side effects of drugs. This will allow the development of efficacious and safe therapeutic agents that focus on the most appropriate region of a signaling cascade [8].

The computer models currently being generated rely extensively on information from bioinformatics databases. However, the available data are still not detailed enough to model a whole cell. The approach used in computer modeling today is to set constraints on cellular activities and so generate a solution space for predicting the behavior of cells, as shown in Figure 2.
If all the constraints on cellular processes were known, the solution space would reduce to a single solution. The goal of systems biology is therefore to obtain as much information as possible, which reduces the size of the solution space and makes the model more predictive [10].

Figure 2: Constraining possible behaviors. Because biological information is incomplete, it is necessary to take into account the fact that cells are subject to certain constraints that limit their possible behaviors. By imposing these constraints in a model, one can then determine what is possible and what is not, and determine how a cell is likely to behave [10].

Building computer models to simulate complex biological systems is an iterative process. In silico organism models are constructed to represent their in vivo counterparts and to provide interpretive and predictive capabilities. However, because knowledge of true cell behavior is incomplete, in silico models often lack features of the in vivo organisms. Experimentally testable hypotheses must therefore be formulated from the in silico analysis, and the experiments performed to update the model, as shown in Figure 3. Several iterations are needed to make the model more accurate and predictive.

For anticancer drug discovery, such models can be used to predict the effectiveness of a drug against its target or multiple targets, as well as potential undesired side effects. This allows high-throughput screening of new drugs without time-consuming laboratory animal experiments. The models can also be used for non-drug therapies such as gene therapy for cancer.

Figure 3: Iterative in silico model building in biology involves the formulation of experimentally testable hypotheses based on the in silico analysis, collection of experimental data, and subsequent refinement of the models based on these data.
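
The constraint-based idea behind Figure 2 can be illustrated with a minimal Python sketch. It is not taken from [10] or from any published model. The three-flux network, the integer grid of candidate values, and the three constraints (mass_balance, measured_uptake, fixed_split_ratio) are hypothetical, chosen only so that the solution space visibly shrinks as each constraint is imposed.

```python
from itertools import product

# Hypothetical toy network: one uptake flux v1 splits into fluxes v2 and v3.
# Each candidate "behavior" is a tuple (v1, v2, v3) of integer flux values.
GRID = range(0, 11)  # allowed flux values, arbitrary units (assumed capacity limit)


def mass_balance(v):
    """Steady state: everything taken up must leave through v2 or v3."""
    v1, v2, v3 = v
    return v1 == v2 + v3


def measured_uptake(v):
    """Assumed experimental measurement that fixes the uptake flux."""
    return v[0] == 6


def fixed_split_ratio(v):
    """Assumed regulatory constraint: v2 carries twice the flux of v3."""
    return v[1] == 2 * v[2]


CONSTRAINTS = [mass_balance, measured_uptake, fixed_split_ratio]


def shrink_solution_space(candidates, constraints):
    """Apply constraints one at a time and keep the surviving set after each."""
    history = [list(candidates)]
    for constraint in constraints:
        history.append([v for v in history[-1] if constraint(v)])
    return history


history = shrink_solution_space(product(GRID, repeat=3), CONSTRAINTS)
sizes = [len(step) for step in history]
print(sizes)        # [1331, 66, 7, 1]
print(history[-1])  # [(6, 4, 2)]
```

With no constraints there are 1331 candidate behaviors. Each added constraint removes candidates until a single flux distribution remains, mirroring the statement that a fully constrained model has a single solution. Real constraint-based models work with continuous fluxes and far larger networks, and typically rely on linear programming rather than enumeration.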

IV. CURRENT PROGRESS IN MODELING

During the Beyond Genome conference in Boston, 2002, Gene Network Sciences (GNS) Inc. unveiled the world’s first in silico model of a human colon cancer cell. The in silico colon cancer cell contains over 2,000 variables, representing the activities of more than 500 genes and proteins. The model details the connected signal transduction and gene expression networks involved in human cell growth, and contains about one-third of all the targets for current cancer drugs, such as BCL-2, Ras, IKK and p53 [11].

The GNS model can speed up drug discovery by identifying high-value drug targets, testing the efficacy of lead compounds, and running virtual clinical trials. So far, the model has predicted targets that sensitize cancer cells to apoptosis, and secondary targets that can be used in combination to cause significant cell death in cancer cells but not in normal cells. This will enable researchers to develop drugs aimed at specific targets, or combinations of targets, related to the disease, which can increase effectiveness and reduce side effects [12].

The model still has limitations because its database is limited. According to the company, by the end of next year the model will contain around 5000 genes and proteins and will incorporate all known regulatory pathway information about the cell and every known drug target for cancer. This will allow more accurate predictions for anticancer drug discovery. Any progress beyond this point will require advances in gathering cellular genomic, proteomic, and metabolomic data. As mentioned before, the systems biology approach is needed to integrate various disciplines and generate new knowledge that cannot be obtained by “isolated methods”.

GNS is not the only organization developing in silico models of biological systems; companies and organizations such as Physiome Sciences and the University of Connecticut are also developing novel tools to simulate cell behavior [7]. Pharmaceutical companies are also joining the field, using the modeling tools to validate targets, understand biological mechanisms, or optimize drug leads.

V. CONCLUSION

Much progress has been made in the in silico modeling of biological systems. Projects such as E-Cell, led by Prof. Masaru Tomita of Keio University, have created a virtual cell on the computer. E-Cell allows the simulation of the metabolism of a single-cell organism with 127 genes, the simplest genome [12]. Simulating much more complex systems, such as the human body, which consists of 60 trillion cells, is extremely difficult. Simulating such complex organisms requires not only a more detailed understanding of fundamental cellular activities but also the hardware and software to support the simulation.

Systems biology opens up new ways of high-throughput drug screening by building an understanding of the fundamentals of cellular activity and using predictive models based on that understanding. At present, however, systems biology research in anticancer drug discovery is still at an early stage of development. This is mainly due to the lack of knowledge about many of the potential targets that cause cancer. In addition, the development of predictive models is limited to single-cell organisms. For complex multicellular organisms such as humans, a better and more thorough understanding of all components, their interactions and related parameters is required, and the software and hardware that support such massive simulations need to be substantially upgraded. Systems biology is therefore truly a multidisciplinary field that integrates biology, engineering, computer science, and many other areas of study. Progress in every one of these areas is essential for the success of systems biology as a new and emerging field of study.

REFERENCES

[1] T. Gura, “Systems for Identifying New Drugs Are Often Faulty”, Science, Vol 278, No 5340, pp 1041-1042, Nov 1997.
[2] J. Fox, “What is Bioinformatics?”, Bioteach Online Journal. Available: http://www.bioteach.ubc.ca
[3] G. Venter, “The sequence of the human genome”, Science, Vol 291, pp 1304-1451, 2001.
[4] T. Reib, “Systems Biology”, Heidelberg, Germany: Druckerei Hornin Publishing, 2002.
[5] C. Henry, “Systems Biology”, Chemical and Engineering News, Vol 81, pp 45-55, 2003.
[6] Z. Oltvai and A. Barabasi, “Life’s Complexity Pyramid”, Science, Vol 298, pp 763-764, 2002.
[7] G. Wess, “How to escape the bottleneck of medicinal chemistry”, Drug Discov. Today, Vol 7, pp 533-535, 2002.
[8] E. Davidov, “Advancing drug discovery through systems biology”, Drug Discov. Today, Vol 8, pp 175-183, Feb 2003.
[9] N. K. Terrett, “Sildenafil (Viagra). A potent and selective inhibitor of type 5 cGMP phosphodiesterase with utility for the treatment of male erectile dysfunction”, Bioorg. Med. Chem. Lett., Vol 6, pp 1819, 1996.
[10] B. Palsson, “The Challenges of in silico Biology”, Nature Biotechnology, Vol 18, pp 1147-1150, 2000.
[11] C. Hill, “Data-Driven Computer Model of Human Colon Cancer Cell”, presented at the Beyond Genome Conference, San Diego, California, June 2002.
[12] Y. Miyamoto, “Genome Technology and Electronics”, OKI Technical Review, Vol 70, pp 82-85, 2003.