Models and Algorithmic Tools for Computational Processes in Cellular Biology
Bhaskar DasGupta
Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607-7053
dasgupta@cs.uic.edu
ISBRA 2012

What is “systems biology” in one sentence?
study to unravel and conceptualize dynamic processes, feedback control loops and signal processing mechanisms underlying life

Cellular Networks
• A single cell by itself is complex enough
• Various technologies have facilitated the monitoring of expression of genes and activities of proteins
• Difficult to find the causal relations and overall structure of the network
(Figure source: http://www.nyas.org/ebriefreps/ebrief/000534/images/mendes2.gif)

Cellular Networks
Genes and gene products interact on several levels, e.g.:
• Genes regulate each other’s expression as part of gene regulatory networks
  – transcription factors can activate or inhibit the transcription of genes to give mRNAs
  – these transcription factors are themselves products of genes
• Protein-protein interaction networks
  – proteins can participate in diverse post-translational interactions that lead to modified protein functions or to formation of protein complexes that have new roles
• Different levels of interactions are integrated
  – e.g., presence of an external signal triggers a cascade of interactions that involves biochemical reactions, protein-protein interactions and transcriptional regulation

Cellular networks
• cellular interaction maps only represent a network of possibilities, and not all edges are present and active in vivo in a given condition or in a given cellular location
• only an integration of time-dependent interaction and activity information will be able to give the correct dynamical picture of a cellular network

Modeling problem
• interaction data produced by the biologist in the form of a diagram (e.g., some type of labeled digraph)
• wish to pose questions about the behavior (dynamics) of such a network –
essential to provide a precise mathematical formulation of its dynamics, and specifically how the state of each node depends on the state of the nodes interacting with it

Models
• discrete, continuous and hybrid models
• their inter-relationships, powers and limitations
• computational complexity and algorithmic issues
• biological implications and validations
• fascinating interplay between several areas such as:
  – biology
  – control theory
  – discrete mathematics and computer science

System dynamics
• state variables
  – continuous
  – discrete (e.g., small number of quantitative states)
• time variables
  – continuous (e.g., partial differential equations, delay equations)
  – discrete (difference equations, quantized descriptions of continuous variables)
• deterministic or probabilistic nature of the model
• hybrid models
  – combine continuous and discrete time-scales and/or
  – combine continuous and discrete time variables

Continuous-state dynamics
• Differential equation (continuous-time)
• Difference equation (discrete-time)

Examples of other models
• Boolean: x1, x2, x3 ∈ {0, 1}
• Boolean feedforward
• Signal transduction

Reverse engineering of models
Given
  – partial knowledge about the process/network
  – access to suitable biological experiments
How to gain more knowledge about the model ?
  – effective use of resources (time, cost)

Reverse engineering
Process of backward reasoning, requiring careful observation of inputs and outputs, to elucidate the structure of the system
(Figure source: http://www.computerworld.com/computerworld/records/images/story/46Reverse-engineering.gif)

Ingredients for reverse engineering
• Mathematical model to be reverse engineered
  – e.g., differential equation model
• Biological experiments available, e.g.,
  – perturbation experiments
  – gene expression measurements

Many reverse engineering approaches are possible
I will discuss two types of approaches:
  – “hitting set” based combinatorial approaches
  – modular response analysis (MRA) approach

Reverse Engineering of Networks Via Modular Response Analysis Method

Ingredients for reverse engineering via modular response analysis approach
• Mathematical models
  – differential equation model
• Biological experiments available
  – perturbation experiments

Differential Equation Model
state variables evolve by (unknown) ordinary differential equations
$\frac{dx}{dt} = f(x, p)$, that is,
$\frac{dx_1}{dt} = f_1(x_1, x_2, \ldots, x_n, p_1, p_2, \ldots, p_m)$
$\vdots$
$\frac{dx_n}{dt} = f_n(x_1, x_2, \ldots, x_n, p_1, p_2, \ldots, p_m)$
• $x = (x_1(t), \ldots, x_n(t))$: state variables over time t, measurable (e.g., activity levels of proteins)
• $p = (p_1, \ldots, p_m)$: parameters that can be manipulated
• $f(x^*, p^*) = 0$, where $p^*$ is the “wild-type” (i.e., normal) condition of p and $x^*$ is the corresponding steady-state condition

Settings for modular response analysis method
  – do not know f
  – but, prior information of the following type is available
    • parameter $p_j$ does not affect variable $x_i$ (i.e., whether $\partial f_i / \partial p_j \equiv 0$ or not)
(Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002)

Experimental protocols (perturbation experiments)
• perturb one parameter, say $p_k$
• for perturbed p, measure steady state vector $x = \xi(p)$
  – let the system relax to steady state
  – measure $x_i$ (western blots, microarrays etc.)
• estimate n “sensitivities”:
  $b_{ij} = \frac{\partial \xi_i}{\partial p_j}(p^*) \approx \frac{1}{p_j - p_j^*} \left( \xi_i\big(p^* + (p_j - p_j^*)\, e_j\big) - \xi_i(p^*) \right)$ for i = 1, 2, …, n
  where $e_j$ is the jth canonical basis vector and $\xi(p)$ is the steady-state vector measured under parameter vector p

Modeling Goal
Modeling goal can be at different levels
1. Topology of connections only
2. Direction of the relationship
3. Information about stimulatory or inhibitory effects
4. Strength of relationship
(Figure: example network on nodes A, B, C, D with signed, weighted edges.)

Goal of MRA approach
Obtain information about the sign of $\partial f_i / \partial x_j (x, p)$
e.g., if $\partial f_i / \partial x_j > 0$, then $x_j$ has a positive (catalytic) effect on the formation of $x_i$

In a nutshell
after some combinatorics and linear algebra one can quantify the additional prior knowledge necessary to reach the goal
  – Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002
  – Berman, DasGupta and Sontag, Discrete Applied Math, 2007
  – Berman, DasGupta and Sontag, Annals of NYAS, 2007

But, assuming (near)-sufficient prior information
• how to determine a minimum or near-minimum number of perturbation experiments that will work?
This now becomes an algorithmic/complexity issue...

After some effort, one can see that designing minimal sets of experiments leads to the set multi-cover problem

In our biological application context, our set-multicover algorithm provides a set of suggested experiments such that
# of experiments ≈ minimum possible

Overall high-level picture
Modular Response Analysis for Differential Equations model → Linear Algebraic formulation → Combinatorial formulation → Combinatorial Algorithms (randomized) → Selection of appropriate perturbation experiments

Experimental validation of MRA Method
See the paper: S. D. M. Santos, P. J. Verveer, P. I. H.
Bastiaens, Growth factor-induced MAPK network topology shapes Erk response determining PC-12 cell fate, Nature Cell Biology 9, 324-330 (2007)
• MAPK pathway involving proteins Raf, Mek and Erk is activated through receptor tyrosine kinases TrkA and epidermal growth factor receptor (EGFR) by two different stimuli, NGF (nerve growth factor) or EGF (epidermal growth factor)
• MRA method was applied to determine the MAPK network architecture in the context of NGF and EGF stimulations

Reverse Engineering of Networks Via Hitting-set based (combinatorial) Method

“Hitting set” based combinatorial approaches
(Diagram) Input data of either kind:
  – steady state profiles of perturbations of the network
  – expression data representing state transition measurement for wildtype and perturbation data
is mapped by a hitting set computation to the topology of the interconnection network; introducing redundancy replaces the hitting set by a multi-hitting set.

Basic idea behind the hitting-set based approaches
• when x5 changes, so do x1, x3, x4: at least one of {x1,x3,x4} must influence x5
• build dependency information over all successive time steps, e.g. {x1,x2}, {x1}, {x2,x3,x4}, {x1}, {x1,x3,x4}, {x1,x3}
• which variables influence x5 ? minimal dependency (hitting set problem)

Why construct “minimal” dependency ?
Occam's razor: entia non sunt multiplicanda praeter necessitatem (entities must not be multiplied beyond necessity)
However, biological networks may be redundant: e.g.
  – G. Tononi, O. Sporns, G. M. Edelman, PNAS, 1999
  – R. Albert et al., Physical Review E, 2011

How can we introduce redundancy if necessary ?
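The selection step behind the hitting-set idea can be sketched in a few lines. This is only an illustrative greedy heuristic under stated assumptions, not the implementation used in the cited work: the function name `greedy_multi_hitting_set` and the parameter `r` are hypothetical, and the example sets are the ones listed on the slide for x5. With `r = 1` it computes an ordinary hitting set; a larger `r` asks for at least `r` chosen variables per set, which is one way to build in redundancy.

```python
from typing import Iterable, List, Set


def greedy_multi_hitting_set(sets: Iterable[Set[str]], r: int = 1) -> Set[str]:
    """Greedily choose elements so that each set S contains at least
    min(r, |S|) chosen elements (illustrative heuristic, not optimal)."""
    family: List[frozenset] = [frozenset(s) for s in sets]
    # need[i] = how many more elements of family[i] must still be chosen
    need = [min(r, len(s)) for s in family]
    chosen: Set[str] = set()
    while any(n > 0 for n in need):
        # For every unchosen element, count the unsatisfied sets it would help.
        gain = {}
        for s, n in zip(family, need):
            if n > 0:
                for x in s - chosen:
                    gain[x] = gain.get(x, 0) + 1
        # Sort keys first so that ties are broken deterministically.
        best = max(sorted(gain), key=gain.get)
        chosen.add(best)
        need = [n - 1 if (n > 0 and best in s) else n
                for s, n in zip(family, need)]
    return chosen


# Candidate dependency sets for x5, one per observed change (from the slide).
example = [{"x1", "x2"}, {"x1"}, {"x2", "x3", "x4"},
           {"x1"}, {"x1", "x3", "x4"}, {"x1", "x3"}]
minimal = greedy_multi_hitting_set(example, r=1)
redundant = greedy_multi_hitting_set(example, r=2)
```

Greedy selection is chosen here only because it is short and easy to follow; it gives an approximate answer, and an exact minimum would need a different method.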
First idea: add random extra dependencies (edges)
  – not good, these edges may not be supported by given data
Better idea: modify hitting set to “multi-hitting set”
  – for a set such as {x1,x3,x4}: previously, select at least 1 element; now, select at least 2 (in general, some r)

Evaluation of performance of reverse engineering methods
Reverse-engineering problems are ill-posed, i.e., their solution is not unique
  – existence of measurement error
  – not all molecular species involved in a given analyzed phenomenon are included in the construction of a network
    • i.e., existence of hidden variables
Two possible ways for evaluation:
• Experimental testing of predictions: after a model has been inferred, newly found interactions or predictions can be tested experimentally
• Benchmark testing: measure how “accurate” the method of our interest is in recovering a known (“gold standard”) network

Evaluation of performance of reverse engineering methods: metrics for accuracy in benchmark testing
Measurements:
  – correct interactions inferred (true positives, TP)
  – incorrect interactions inferred (false positives, FP)
  – correct non-interactions inferred (true negatives, TN)
  – incorrect non-interactions inferred (false negatives, FN)
Metrics:
  – recall or true positive rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$
  – false positive rate: $\mathrm{FPR} = \frac{FP}{FP + TN}$
  – accuracy: $\mathrm{ACC} = \frac{TP + TN}{\text{total possible interactions}}$
  – precision or positive predictive value: $\mathrm{PPV} = \frac{TP}{FP + TP}$

Two published methods based on hitting set approach
(A) Ideker, Thorsson, Karp, PSB (2000)
  – First step (network inference): estimate a set of Boolean networks consistent with an observed set of steady-state gene expression profiles, each generated from a different perturbation to the genetic network
  – Second step (optimization): use an entropy-based approach to select an additional perturbation experiment to perform a model selection from the set of predicted Boolean networks
(B) Jarrah, Laubenbacher, Stigler, Stillman, Adv.
in Applied Mathematics (2007)
  – Attempts to infer the most likely causal relationships among network elements from gene expression data
For other published results, see, for example: Krupa, Journal of Theoretical Biology (2002)

Comparative analysis (via benchmark testing) of two approaches by
(A) Ideker, Thorsson, Karp
(B) Jarrah, Laubenbacher, Stigler, Stillman

Two gold standard networks:
a. Segment polarity network of Drosophila melanogaster (fruit fly):
  – last step in the hierarchical cascade of gene families initiating the segmented body of the fruit fly
  – genes of this network include engrailed (en), wingless (wg), hedgehog (hh), patched (ptc), cubitus interruptus (ci) and sloppy paired (slp), coding for the corresponding proteins
  – 1 para-segment of 4 cells
  – 60 nodes: variables are expression levels of segment polarity genes/proteins
  – Boolean model from (Albert and Othmer, Journal of Theoretical Biology, 2003)
(DasGupta, Vera-Licona, Sontag, 2011)

b. In Silico network: gene regulatory network with external perturbations
  – 13 species: 10 genes plus 3 different environmental perturbations
  – perturbations affect the transcription rate of the gene on which they act directly (through inhibition or activation) and their effect is propagated throughout the network by the interactions between the genes
  – generated using the software package in (Mendes, Trends Biochem.
Sci, 1997)
(DasGupta, Vera-Licona, Sontag, 2011)

Generated time courses for both networks (a) and (b)
For method (A), we considered both greedy and linear programming based approximations to the hitting set problem, as well as redundancy values R = 1, 2
For method (B), input data must be discrete; we used three discretization methods:
• graph-theoretic based approach “D” from (Dimitrova, Garcia-Puente, Jarrah, Laubenbacher, Stigler, Stillman, Vera-Licona, 2010)
• quantile “Q” discretization (method in which each variable state receives an equal number of data values)
• interval “I” discretization (select thresholds for the different discrete values)
(DasGupta, Vera-Licona, Sontag, 2011)

Summary of Comparison
network (b):
• method (B) was better than method (A) in ROC space
• method (A) achieved a performance no better than random guessing
network (a):
• method (B) could not obtain any results after running over 12 hours
• method (A) was able to compute results in less than 1 minute
• method (A) improved slightly when small redundancy was introduced
(DasGupta, Vera-Licona, Sontag, 2011)

implementation of method (B): http://polymath.vbi.vt.edu/polynome/
implementation of method (A) done by (DasGupta, Vera-Licona, Sontag, 2011) at http://sts.bioengr.uic.edu/causal/

Direct Synthesis of Signal Transduction Networks
Only from known interactions and information
No new experiments needed

Overall Goal
• input: direct interactions (A → B, A ┤ B), double-causal interactions (A → (B → C), A → (B ┤ C)) and additional information
• method (algorithms, software): FAST
• output: network of minimal complexity that is biologically relevant

Nature of experimental evidence
• biochemical
  – direct interaction, e.g.,
    • binding of two proteins
    • a transcription factor activating the transcription of a gene
    • a chemical reaction with a single reactant and single product
• pharmacological
  – indirect causal effects most probably resulting from a chain of
interactions and reactions, e.g.,
    • binding of a chemical to a receptor protein starts a cascade of protein-protein interactions and chemical reactions that ultimately results in the transcription of a gene
• genetic evidence of differential responses to a stimulus
  – can be direct, but most often indirect (double-causal)

We describe a method for synthesizing double-causal (path-level) information into a consistent network

Direct interactions
• A promotes B: A → B
• A inhibits B: A ┤ B
Illustration of double-causal interaction
• C promotes the process of A promoting B
(Figure: vertices A, B, C and a pseudo-vertex.)

“Critical” edge (known direct interaction, part of input)

Main computational steps for network synthesis
• Pseudo-vertex collapse (PVC)
  – easy
• Binary transitive reduction (BTR)
  – hard
  – need heuristics

Pseudo-vertex collapse (PVC)
Intuitively, PVC is useful for reducing the pseudo-vertex set to the minimal set that keeps the graph consistent with all indirect experimental observations.
Two pseudo-vertices u and v with in(u) = in(v) and out(u) = out(v) are collapsed into a new pseudo-vertex uv.

Illustration of Binary Transitive Reduction (BTR)
(Figure: two candidate edges marked “remove?”. One is kept because it is a critical edge; the other is removed because an alternate path exists.)
Intuitively, the BTR problem is useful for determining the sparsest graph consistent with a set of experimental observations

Some biologists did look at very simplified or somewhat different versions of BTR, e.g.:
• A. Wagner, Estimating Coarse Gene Network Structure from Large-Scale Gene Perturbation Data, Genome Research, 12, pp. 309-315, 2002
  – too special (reachability only), no efficient algorithms reported
• T. Chen, V. Filkov and S. Skiena, Identifying Gene Regulatory Networks from Experimental Data, Third Annual International Conference on Computational Molecular Biology, pp.
94-103, 1999
  – “excess edge deletion” problem, biologically too restrictive version

See the following excellent surveys for more comprehensive information about biological network inference and modeling:
• V. Filkov, Identifying Gene Regulatory Networks from Gene Expression Data, in Handbook of Computational Molecular Biology (edited by S. Aluru), Chapman & Hall/CRC Press, 2005
• H. de Jong, Modelling and Simulation of Genetic Regulatory Systems: A Literature Review, Journal of Computational Biology, Volume 9, Number 1, pp. 67-103, 2002

High level description of the network synthesis process
Synthesize direct interactions → Optimize (BTR) → Synthesize double-causal interactions → Optimize (PVC, BTR), with interaction with biologists
(Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007)

All the steps in the network synthesis procedure except the steps that involve BTR can be done easily
Thus, it behooves us to look at BTR more closely

But, before that, biological validation of the network synthesis approach is desirable
Need a network that uses double-causal experimental evidence

Plant signal transduction network
consistent guard cell signal transduction network for ABA-induced stomatal closure
  – manually curated
  – described in S. Li, S. M. Assmann and R. Albert, Predicting Essential Components of Signal Transduction Networks: A Dynamic Model of Guard Cell Abscisic Acid Signaling, PLoS Biology, 4(10), October 2006
  – list of experimentally observed causal relationships collected by Li et al. and published as Table S1. This table contains
    • around 140 interactions and causal inferences, both of type “A promotes B” and “C promotes process (A promotes B)”
  – We augment this list with critical edges drawn from biophysical/biochemical knowledge on enzymatic reactions and ion flows and with simplifying hypotheses made by Li et al.
both described in Text S1

We also formalized an additional rule specific to the context of this network (and implicitly assumed by Li et al.) regarding enzyme-catalyzed reactions

Regulatory interactions between ABA signal transduction pathway components

Regulatory interactions between ABA signal transduction pathway components (continued)
• ERA1 ┤ (ABA → CalM)
• NO → GC: not critical and not enzymatic

Some nodes in the network
• GCR1: putative G protein coupled receptor
• OST1: protein
• NO: Nitric Oxide
• ABH1: RNA cap-binding protein
• RAC1: small GTPase protein
• …

(left) Guard cell signal transduction network for ABA-induced stomatal closure, manually curated by Li, Assmann and Albert [source: PLoS Biology, 4 (10), 2006]. (right) Our automated network synthesis procedure produced a reduced network (fewer edges) while preserving all observed pathways
(Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007)

Summary of comparison of the two networks
• Li et al.'s network has 54 vertices and 92 edges; our network has 57 vertices but 84 edges
• Both networks have an identical strongly connected component of vertices
• All the paths present in Li et al.’s reconstruction are present in our network as well
• The two networks have 71 common edges
• It took a few seconds to synthesize our network
(Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007)

Summary of comparison of the two networks (continued)
Thus the two networks are highly similar but diverge on a few edges. None of these discrepancies is due to algorithmic deficiencies; all are due to human decisions.
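The edge-removal step that yields the sparser network can be illustrated with a small sketch of binary transitive reduction. It rests on several assumptions that are not taken from the talk: edges are triples `(u, v, sign)` with sign 0 for "promotes" and 1 for "inhibits", an alternate route may revisit vertices (a walk rather than a simple path), there are no self-loops, and non-critical edges are tried in input order. The names `has_parity_path` and `greedy_btr` are hypothetical; this simple greedy pass is not the heuristic implemented in NET-SYNTHESIS.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, int]  # (source, target, sign): 0 = promotes, 1 = inhibits


def has_parity_path(edges: Iterable[Edge], src: str, dst: str, parity: int) -> bool:
    """True if a directed walk with at least one edge leads from src to dst
    and the sum of its edge signs modulo 2 equals `parity`."""
    adj = {}
    for u, v, s in edges:
        adj.setdefault(u, []).append((v, s))
    start = (src, 0)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over (vertex, parity) states
        node, par = queue.popleft()
        for nxt, s in adj.get(node, []):
            state = (nxt, (par + s) % 2)
            if state == (dst, parity):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def greedy_btr(edges: List[Edge], critical: Set[Edge]) -> List[Edge]:
    """Drop each non-critical edge whose effect is still explained by an
    alternate route of the same parity among the remaining edges."""
    kept = list(edges)
    for e in edges:
        if e in critical:
            continue  # critical edges (known direct interactions) always stay
        u, v, s = e
        rest = [f for f in kept if f != e]
        if has_parity_path(rest, u, v, s):
            kept = rest
    return kept


# A promotes B, B promotes C, and a redundant "A promotes C" shortcut.
net = [("A", "B", 0), ("B", "C", 0), ("A", "C", 0)]
reduced = greedy_btr(net, critical={("A", "B", 0)})
```

In this toy network the shortcut from A to C is removed because the route through B has the same parity; if the shortcut were an inhibition (sign 1), it would be kept.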
(Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007)

Software is available at: http://www.cs.uic.edu/~dasgupta/network-synthesis/
• runs on any machine with MS Windows (Win32)
  – click, save the executable and run

Data sources for this type of network synthesis
Signal transduction pathway repositories such as
• TRANSPATH (http://www.generegulation.com/pub/databases.html#transpath)
• protein interaction databases such as the Search Tool for the Retrieval of Interacting Proteins (http://string.embl.de)
contain up to thousands of interactions, a large number of which are not supported by direct physical evidence.
NET-SYNTHESIS can be used to filter redundant information while keeping all direct interactions

Transitive reduction step used a heuristic
How good is the heuristic in general?

Performance of our BTR algorithm on “random” signal transduction networks
But, what is a random biological network?

Biological networks are scale-free: e.g., N. Guelzim, S. Bottani, P. Bourgine, and F. Kepes, Topological and causal structure of the yeast transcriptional regulatory network, Nature Genetics 31, 60-63, 2002
Biological networks are NOT scale-free: e.g., R. Khanin and E. Wit, How Scale-Free Are Biological Networks ?, Journal of Computational Biology, 13 (3), 810-818, 2006
So, we decided to look at the literature ourselves and decide on a reasonable model for random signal transduction networks

In our model of random signal transduction networks:
• distribution of in-degree of the network is exponential: $\Pr[\text{in-degree} = x] = L e^{-Lx}$, $\tfrac{1}{3} \le L \le \tfrac{1}{2}$; maximum in-degree is 12
• distribution of out-degree is governed by a power-law: for $x \ge 1$, $\Pr[\text{out-degree} = x] = c\,x^{-c}$; $\Pr[\text{out-degree} = 0] \ge c$; $2 < c < 3$; maximum out-degree is 200
• ratio of excitatory to inhibitory edges is between 2 and 4
Random graphs with prescribed degree distributions are generated using the procedure described in: M. E. J. Newman, S. H.
Strogatz and D. J. Watts. Random graphs with arbitrary degree distributions and their applications, Physical Review E, 64 (2), 026118-026134, 2001

What percentage of edges should be critical (known direct interaction)?
No known accurate estimates:
• curated network of Ma'ayan et al. (Science, 2005)
  – expected to have close to 100% critical edges as they specifically focused on collecting direct interactions only
• Protein interaction networks are expected to be mostly critical
  – Giot et al., Science, 2003
  – Han et al., Nature, 2004
  – Li et al., Science, 2004
• Genetic interactions (e.g., synthetic lethal interactions)
  – represent compensatory relationships
  – only a minority are direct interactions
• Reverse engineering approaches:
  – lead to networks whose interactions are close to 0% critical

We tried a few small and large values, such as 1%, 2% and 50%, for the percentage of edges that are critical, to catch qualitatively all regions of dynamics of the network that are of interest

Tested on about 550 random networks
  – # of vertices in the range of about 100 to 1000
  – running time for individual networks
    • seconds to at most a minute

Verify the robustness of performance of our BTR algorithm
  – perturb networks in ways that do not change the optimal solution of the original graph
Almost always the solution quality does not change because of this

Performance of our implemented algorithm for BTR on random networks
(Histogram: frequency of occurrence versus % additional edges = ((|E'| / OPT) - 1) * 100, on a horizontal axis from 0 to 22.)
On average, we use about 5.5% more edges than the optimum
(Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007)

Other applications of NET-SYNTHESIS
Synthesizing a Network for T Cell Survival and Death in LGL Leukemia

Background
• Large Granular Lymphocytes (LGL)
  – medium to large size cells with eccentric nuclei and abundant cytoplasm
  – comprise 10%~15%
of the total peripheral blood mononuclear cells
  – two major lineages
    • CD3- natural-killer (NK) cell lineage: ~85% of LGL cells
    • CD3+ lineage: ~15% of LGL
(Kachalo, Zhang, Sontag, Albert, DasGupta, 2008)

LGL leukemia
disordered clonal expansion of LGL and their invasion of the marrow, spleen and liver

Background (continued)
Ras:
  – small GTPase essential for controlling multiple essential signaling pathways
  – its deregulation is frequently seen in human cancers
Activation of H-Ras requires its farnesylation, which can be blocked by farnesyltransferase inhibitors (FTIs)
This makes FTIs candidates for future anti-cancer therapies, and several FTIs have entered early phase clinical trials
This observation, together with the finding that Ras is constitutively activated in leukemic LGL cells, leads to the hypothesis that Ras plays an important role in LGL leukemia, and may function through influencing the Fas/FasL pathway.

We constructed the cell-survival/cell-death regulation-related signaling network, with special interest in the effect of Ras on the apoptosis response through the Fas/FasL pathway
Goal: initiate understanding of the interactions between the Ras pathway and the Fas/FasL pathway, two of the major pathways that regulate the cell survival/death decision.
Currently, there is no standard therapy for LGL leukemia.
Understanding the mechanism of this disease is crucial for drug/therapy development
Proteins that modulate the Ras-apoptosis response can potentially serve as future reference for drug design and therapeutic-target-molecule search, and this may not be restricted to LGL leukemia
(Kachalo, Zhang, Sontag, Albert, DasGupta, 2008)

Synthesizing a Network for T Cell Survival and Death in Large Granular Lymphocyte Leukemia
• Synthesized a cell-survival/cell-death regulation-related signaling network from the TRANSPATH 6.0 database, with additional information manually curated from literature search
• 359 vertices of this network represent proteins/protein families and mRNAs participating in pro-survival and Fas-induced apoptosis pathways
• 1295 edges represent regulatory relationships between nodes, including protein interactions, catalytic reactions, transcriptional regulation (no double-causal interactions were known)
• Performing BTR with NET-SYNTHESIS reduced the total number of edges to 873
(Kachalo, Zhang, Sontag, Albert, DasGupta, 2008)

To focus on pathways that involve the 33 known T-LGL deregulated proteins, we designated vertices that correspond to proteins with no evidence of being changed during T-LGL as pseudo-vertices, and deleted the label “Y” for those edges both of whose endpoints were pseudo-vertices
Recursively performing “Reduction (faster)” BTR and “Collapse degree-2 pseudonodes” of NET-SYNTHESIS until no edge/node could be further removed simplified the network to 267 nodes and 751 edges.
(Kachalo, Zhang, Sontag, Albert, DasGupta, 2008)

For further results, see R. Zhang, M. V. Shah, J. Yang, S. B. Nyland, X. Liu, J. K. Yun, R. Albert, and T. P. Loughran, Network Model of Survival Signaling in LGL Leukemia, PNAS, 2008

Binary transitive reduction raises two further interesting questions:
  – how redundant are biological networks ?
    • what is redundancy and how to measure it ?
      – percentage of edges removed by binary transitive reduction (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
  – are redundancy and dynamical properties correlated ?

Feedback loops and dynamics of biological networks
analyzing behaviors of feedback loops is a long-standing topic in the context of regulation, metabolism, and development
  – e.g., see classical reference works such as J. Monod and F. Jacob, General conclusions: teleonomic mechanisms in cellular metabolism, growth, and differentiation, Cold Spring Harbor Symp. Quant. Biol., 26, 389-401, 1961

Monotone dynamical system

Monotone systems are “simpler behaved” systems:
• pathological behavior (“chaos”) is ruled out
• even though they may have arbitrarily large dimensionality, monotone systems behave in many ways like one-dimensional systems
  – e.g., in monotone systems
    • bounded trajectories generically converge to steady states
    • there are no stable oscillatory behaviors

Associated Signal Transduction Network
(Figure: vertices v1, …, vi, …, vj, …, vk, …, vn, with edges into vk labeled according to the signs of $\partial f_k / \partial x_i$ and $\partial f_k / \partial x_j$.)

(Figure: two signed example graphs, one sign-inconsistent and one sign-consistent.)
• parity of a path: product of the signs of its edges
• sign-consistent: all undirected paths between any two nodes have the same parity (check undirected paths 1-4 and 1-2-3-4)

sign-consistent networks are monotone systems
This allows us to define the “degree of monotonicity” M of a differential equation system in the following way:
minimum percentage of edges we need to delete to make the associated signal transduction network sign-consistent
(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)

Undirected Labeling Problem (ULP)
needed to compute degree of monotonicity M
Given: undirected graph G = (V, E) and an edge labeling function $h: E \to \{0, 1\}$
Valid solution: a vertex labeling function $f: V \to \{0, 1\}$
Definition: an edge $\{u, v\} \in E$ is consistent if $h(u, v) = f(u) + f(v) \pmod 2$
Goal:
maximize number of consistent edges
Bad news: NP-hard and even MAX-SNP-hard.
(DasGupta, Enciso, Sontag, Zhang, 2007)

Algorithm for ULP
• Solve the following vector program via semidefinite programming methods:
  maximize $\sum_{\{u,v\} \in E,\ h(u,v)=1} \frac{1 - x_u \cdot x_v}{2} + \sum_{\{u,v\} \in E,\ h(u,v)=0} \frac{1 + x_u \cdot x_v}{2}$
  subject to: for each $v \in V$, $x_v \cdot x_v = 1$; for each $v \in V$, $x_v \in \mathbb{R}^{|V|}$
• Select a uniformly random vector r in the |V|-dimensional unit sphere
• Label each vertex v as 0 if $r \cdot x_v \ge 0$, and as 1 otherwise
It can be easily implemented in MATLAB
(DasGupta, Enciso, Sontag, Zhang, 2007)

We have two measurable properties:
  – (topological) redundancy R
    • percentage of edges removed by binary transitive reduction
  – (dynamical) monotonicity M
    • minimum percentage of edges we need to delete to make the associated signal transduction network sign-consistent
M is negatively correlated with R
(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)

Some other conclusions from (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
• the redundancy measure R is statistically significant
• transcriptional networks are less redundant than signaling networks
• redundancy of the C. elegans metabolic network is largely due to currency metabolites
• calculation of redundancy values and minimal networks provides a way to gain insight into the predicted orientation of protein-protein-interaction (PPI) networks

Future Research Questions in the context of parallel and distributed computing
• Synchronization:
  – no “global clocks” are known to exist for cellular processes (ignoring circadian rhythms and some other global timing mechanisms in higher organisms)
• Spatial effects:
  – localization (nuclear, cytoplasmic, membrane-bound) in cells
    • akin to geographical location affecting communication speeds and coordination in distributed computing

List of some relevant references
R. Albert, B. DasGupta, et al.
A New Computationally Efficient Measure of Topological Redundancy of Biological and Social Networks, Physical Review E, 84 (3), 036117, 2011.
B. DasGupta, P. Vera-Licona, E. Sontag. Reverse Engineering of Molecular Networks from a Common Combinatorial Approach, in Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, John Wiley & Sons, Inc., 2011.
R. Albert, B. DasGupta, E. Sontag. Inference of signal transduction networks from double causal evidence, in Methods in Molecular Biology: Topics in Computational Biology, D. Fenyo (editor), Springer, 2010.
P. Berman, B. DasGupta, M. Karpinski. Approximating Transitive Reduction Problems for Directed Networks, 11th Algorithms and Data Structures Symposium, 2009.
R. Albert, B. DasGupta, R. Dondi, E. Sontag. Inferring (Biological) Signal Transduction Networks via Transitive Reductions of Directed Graphs, Algorithmica, 51 (2), 129-159, 2008.
S. Kachalo, R. Zhang, E. Sontag, R. Albert, B. DasGupta. NET-SYNTHESIS: A software for synthesis, inference and simplification of signal transduction networks, Bioinformatics, 24 (2), 293-295, 2008.
P. Berman, B. DasGupta, E. Sontag. Algorithmic Issues in Reverse Engineering of Protein and Gene Networks via the Modular Response Analysis Method, Annals of the New York Academy of Sciences, 2007.
R. Albert, B. DasGupta, et al. A Novel Method for Signal Transduction Network Inference from Indirect Experimental Evidence, Journal of Computational Biology, 14 (7), 927-949, 2007.
B. DasGupta, G. A. Enciso, E. Sontag, Y. Zhang. Algorithmic and Complexity Results for Decompositions of Biological Networks into Monotone Subsystems, Biosystems, 90 (1), 161-178, 2007.
P. Berman, B. DasGupta, E. Sontag. Computational Complexities of Combinatorial Problems With Applications to Reverse Engineering of Biological Networks, in Advances in Computational Intelligence: Theory and Applications, F.-Y. Wang and D.
Liu (editors), Series in Intelligent Control and Intelligent Automation, World Scientific publishers, 303-316, 2007.
P. Berman, B. DasGupta, E. Sontag. Randomized Approximation Algorithms for Set Multicover Problems with Applications to Reverse Engineering of Protein and Gene Networks, Discrete Applied Mathematics, 155 (6-7), 733-749, 2007.

Acknowledgments
Thanks to research collaborators for these projects:
R. Albert (Penn State), P. Berman (Penn State), R. Dondi (U. of Bergamo), G. Enciso (UC Irvine), A. Gitter (CMU), G. Gürsoy (UIC), R. Hegde (UIC), S. Kachalo (UIC), M. Karpinski (Bonn), P. Pal, G. S. Sivanathan (UIC), E. Sontag (Rutgers), P. Vera-Licona (INRIA), K. Westbrooks (GSU), A. Zelikovsky (GSU), R. Zhang (Penn State), Y. Zhang (UIC)
Thanks to National Science Foundation (NSF) for funding:
DBI-1062328, IIS-0610244, IIS-1064681, CCR-9800086, IIS-0346973, CNS-0206795, DBI-0543365, CCF-0208749
Thanks to generous support from DIMACS (Rutgers) during my Sabbatical leave through their special focus on computational and mathematical epidemiology

Thank you for your attention! Questions?