Integrating my interests in stochastic chemical kinetics and metabolic flux balance analysis.

Andrzej M. Kierzek
Reader in Systems Biology, Faculty of Health and Medical Sciences, University of Surrey, Guildford, GU2 7XH, UK
Thursday, 7 April 2011

Stochastic kinetic models and FBA - extremes of biochemical network modelling spectrum.

• Stochastic chemical kinetics: The most detailed dynamic models, built for the best-studied systems where kinetic parameters are known. The time evolution of the distribution of the number of molecules in individual cells is studied. (Mol. BioSyst 2010, Biophys J 2009, Biophys J 2004, Bioinformatics 2002, J. Biol. Chem 2001)
• Flux Balance Analysis: The least detailed models, studied at steady state. The reaction fluxes are calculated and no information is obtained about molecular amounts/activities. Genome scale networks are reconstructed and analysed (PLoS Comp. Biol. 2011, Bioinformatics 2011, Metabolic Engineering 2008, Genome Biology 2007).

I. Stochastic computer simulations of the biochemical reaction network dynamics.

Acknowledgements

The work on stochastic dynamics of two component system signalling has been done in collaboration with the group of Barry Wanner, Purdue University, USA … which was established during an informal “Warsaw Pact” consortium meeting in Warsaw and Zelazowa Wola in 2004.

Flow cytometry of a two component signalling system.

[Figure: signalling diagram; labels: Signal, Kinase, P, Regulator, GFP Reporter.]
Phosphotransfer reactions involving formation of regulator-kinase complexes.
Zhou et al (Barry L. Wanner’s group, Purdue), Yearbook Bioinformatics 2004, R. Hofestädt (ed.), pp. 25-38. Magdeburg: IMBio, Informationsmanagement in der Biotechnologie e.V., 2005.

Stochastic kinetic model of TCS signalling.

Two component system signalling in Escherichia coli. Kierzek, Zhou, Wanner, Molecular Biosystems 2010.
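
Models like this are simulated one reaction event at a time. The following is a minimal sketch of Gillespie's direct method, assuming the small enzyme system used as the worked example in this talk (S + E <-> ES -> P + E, with c = 1, 10 and 1 1/s, starting from 100 S and 20 E) rather than the TCS model itself. The function and variable names are illustrative only and are not taken from the author's software.

```python
import random

# State vector x = (S, E, ES, P). Each reaction is
# (rate constant c in 1/s, number of reactant combinations h(x), state change nu).
REACTIONS = [
    (1.0,  lambda x: x[0] * x[1], (-1, -1, +1, 0)),   # S + E -> ES
    (10.0, lambda x: x[2],        (+1, +1, -1, 0)),   # ES -> E + S
    (1.0,  lambda x: x[2],        (0, +1, -1, +1)),   # ES -> P + E
]


def gillespie(x0, t_end, rng):
    """Simulate one trajectory up to t_end; return (time of last event, final state)."""
    x = list(x0)
    t = 0.0
    while True:
        # Propensity of every reaction in the current state, and their sum a0.
        props = [c * h(x) for c, h, _ in REACTIONS]
        a0 = sum(props)
        if a0 == 0.0:
            break                          # nothing can fire any more
        tau = rng.expovariate(a0)          # waiting time, density a0 * exp(-a0 * tau)
        if t + tau > t_end:
            break
        # Choose reaction mu with probability a_mu / a0.
        mu = rng.choices(range(len(REACTIONS)), weights=props)[0]
        nu = REACTIONS[mu][2]
        x = [xi + d for xi, d in zip(x, nu)]
        t += tau
    return t, x


if __name__ == "__main__":
    t_last, (s, e, es, p) = gillespie((100, 20, 0, 0), 5.0, random.Random(0))
    print(f"t={t_last:.3f}  S={s} E={e} ES={es} P={p}")
```

Each call produces one sample trajectory. Distributions of molecule numbers are obtained by repeating the call with different random seeds.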
Stochastic kinetic model of TCS signalling.

Propensity: $c_9 \cdot \#\mathrm{mRNAKin}$, with $c_9 = 10^{-4}$ 1/s
Two component system signalling in Escherichia coli. Kierzek, Zhou, Wanner, Molecular Biosystems 2010.

Exact stochastic simulation with Gillespie algorithm.

1. Set the initial state of the system $x_0 = (X_1(t_0), \dots, X_N(t_0))$.
2. For every reaction $R_\mu$ compute the propensity $a_\mu(x)$.
3. Compute the sum of the propensity functions $a_0 = \sum_{j=1}^{M} a_j(x)$.
4. Randomly select reaction $R_\mu$ and waiting time $\tau$ such that:
   $P(\tau,\mu)\,d\tau = a_\mu \exp(-a_0\tau)\,d\tau$
   $P(\mu) = a_\mu / a_0$
   $P(\tau)\,d\tau = a_0 \exp(-a_0\tau)\,d\tau$
5. Update the system: $x(t+\tau) = x(t) + \nu_\mu$.
6. Set the simulation time to $t + \tau$.
7. Go to 2.

Example:
S + E -> ES    c = 1 1/s
ES -> E + S    c = 10 1/s
ES -> P + E    c = 1 1/s
Initial conditions: #S = 100; #E = 20

Validation of the model.

The effect of transcription and translation initiation frequencies on the stochastic fluctuations in prokaryotic gene expression.

If two genes produce the same mean number of protein molecules, the one with higher transcription and lower translation exhibits a smaller variance of the number of protein molecules.

Rigney, D. R. & Schieve, W. C. J. Theor. Biol. 69, 761–766 (1977).
Berg, O. G. J. Theor. Biol. 173, 307–320 (1978).
Kierzek, A. M. et al. J. Biol. Chem. 276, 8165–8172 (2001).
Thattai, M. & van Oudenaarden, A. Proc. Natl. Acad. Sci. U.S.A. 98, 8614–8619 (2001).
Ozbudak, E. M. et al. Nat. Genet. 31, 69–73 (2002).

Stochastic fluctuations in expression of TCS genes.

[Figure panels: slow transcription, fast translation; fast transcription, slow translation.]

Dependence of stochastic switching behaviour of TCS on the HK gene expression noise and RR autoregulation.

[Figure panels: autoregulated RR; constitutive RR; low HK noise; high HK noise.]

Mixed mode response.

II. Flux Balance Analysis of genome scale metabolic networks.

Should I wait until there are enough quantitative parameters to run dynamic simulations, and resist the temptation of working on projects like TB-HOST-NET, or should I use approximate methods to study gene function now?

TB-HOST-NET: Integration of computational modelling with transcription and gene essentiality profiling of both MTB bacillus and infected human dendritic cells and macrophages to understand molecular interaction networks involved in host-pathogen cross-talk.
Stewart GR, Robertson BD, Young, Nat Rev Microbiol. 2003

EraSysBio+ TB-HOST-NET Partners
Andrzej M. Kierzek (coordinator, Surrey)
Graham Stewart (University of Surrey)
Johnjoe McFadden (University of Surrey)
Steffen Klamt (Max Planck Institute)
Olivier Neyrolles (CNRS)
Ludovic Tailleux (Institute Pasteur)
Maria Foti (University of Milan-Bicocca)

Flux Balance Analysis – a constraint based approach

[Figure: example network; labels: Axt, Bxt, Cxt, Dxt, growth. Adapted from FluxAnalyzer software (Steffen Klamt, MPI Magdeburg).]

Find the maximal dX/dt if the following constraints are satisfied:
• Value to be maximised (objective function).
• Transport of extracellular (external, unbalanced) metabolites.

The linear programming algorithm finds the largest possible value of dX/dt. However, there are many possible values of the fluxes (F1,..,F8) that result in the same maximal value of the objective function.
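
The optimisation described above is a linear programme. The following sketch writes it in standard FBA notation. The symbols are assumptions rather than notation from the slide: $N$ is the stoichiometric matrix of the internal metabolites, $F = (F_1, \dots, F_8)$ is the flux vector, $c$ selects the growth flux, and $F_i^{\min}$, $F_i^{\max}$ are the reaction capacities.

```latex
\begin{aligned}
Z^{*} = \max_{F}\quad & \frac{dX}{dt} = c^{\mathsf{T}} F \\
\text{subject to}\quad & N F = 0
    && \text{(steady state of internal metabolites)} \\
& F_i^{\min} \le F_i \le F_i^{\max}, \quad i = 1,\dots,8
    && \text{(reaction capacities)}
\end{aligned}
```

The optimal value $Z^{*}$ is unique, but the set of optimal flux vectors $\{F : N F = 0,\ F^{\min} \le F \le F^{\max},\ c^{\mathsf{T}} F = Z^{*}\}$ is in general a whole polytope rather than a single point, which is why many flux distributions give the same maximal growth.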
• Minimal and maximal reaction capacities (bounds). R4 is the only reversible reaction in the system.
• Steady state (flux balance) assumption for intracellular (internal) metabolites.

GSMN-TB: a web-based genome-scale network model of Mycobacterium tuberculosis metabolism.

Dany JV Beste*, Tracy Hooper*, Graham Stewart, Bhushan Bonde, Claudio Avignone-Rossa, Michael E Bushell, Paul Wheeler, Steffen Klamt, Andrzej M Kierzek#, Johnjoe McFadden#
* Joint first authors
# Joint senior authors
Genome Biology 2007, 8(5):R89
Web software available at: http://sysbio3.fms.surrey.ac.uk/

GSMN-TB model and SurreyFBA

Reaction class                 Number
Enzymatic conversions          723
Transport reactions            126
Total number of reactions      849
Orphan reactions               210
Genes                          726
Internal metabolites           638
External metabolites           101
Total number of metabolites    739

Screening for essential genes by Transposon Site Hybridisation (TraSH)

Comparison of gene essentiality prediction with TraSH data.

For each gene in the model {
    Constrain all fluxes that require this gene to 0.
    Run FBA.
    If the flux towards BIOMASS is less than cut-off {
        the gene is essential.
    } else {
        the gene is non-essential.
    }
}

Compare the list of predicted essential genes with the list of genes with a TraSH microarray signal ratio (insertion probe/genomic probe) lower than the cut-off. Classify genes into the following categories:

TP    essential in the model and essential in experiment
FP    essential in the model, non-essential in experiment
TN    non-essential in the model, non-essential in experiment
FN    non-essential in the model, essential in experiment

Predictions of the GSMN-TB model were compared with the data set of Sassetti et al, Molecular Microbiology, 2003.

Receiver Operating Characteristics (ROC) of gene essentiality prediction.
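
The comparison with TraSH data reduces to counting the four categories and turning them into one ROC point per cut-off. The following is a minimal sketch, assuming hypothetical gene identifiers and made-up TraSH ratios that serve only as an illustration (they are not data from the study). Sensitivity is TP/(TP + FN) and specificity is TN/(TN + FP).

```python
def classify(predicted_essential, trash_ratio, ratio_cutoff):
    """Count TP/FP/TN/FN for one TraSH ratio cut-off.

    predicted_essential: set of genes the model predicts to be essential.
    trash_ratio: dict gene -> insertion probe / genomic probe signal ratio.
    A gene is called essential in experiment if its ratio is below the cut-off.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for gene, ratio in trash_ratio.items():
        in_model = gene in predicted_essential
        in_experiment = ratio < ratio_cutoff
        if in_model and in_experiment:
            counts["TP"] += 1
        elif in_model:
            counts["FP"] += 1
        elif in_experiment:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def sensitivity_specificity(counts):
    """Return (sensitivity, specificity); None where the denominator is zero."""
    pos = counts["TP"] + counts["FN"]
    neg = counts["TN"] + counts["FP"]
    sens = counts["TP"] / pos if pos else None
    spec = counts["TN"] / neg if neg else None
    return sens, spec


# Hypothetical example data (illustration only).
EXAMPLE_PREDICTED = {"g1", "g2", "g3"}
EXAMPLE_RATIOS = {"g1": 0.05, "g2": 0.10, "g3": 0.90,
                  "g4": 0.80, "g5": 0.15, "g6": 1.20}

if __name__ == "__main__":
    c = classify(EXAMPLE_PREDICTED, EXAMPLE_RATIOS, ratio_cutoff=0.2)
    print(c, sensitivity_specificity(c))
```

Repeating the call over a range of cut-offs gives the points of one ROC curve.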
Each ROC curve shows 100 points corresponding to the sensitivity and specificity of the model predictions obtained for growth rate thresholds varying in the range from 0.0 to 0.1 (increment 0.001). The growth rate threshold has no effect on prediction accuracy: the LP optimisation is effectively used as a qualitative test of BIOMASS producibility, and it is irrelevant whether the TB bacillus grows at the maximal rate or not. Different curves correspond to TraSH ratio thresholds of 0.05, 0.1, 0.2, 0.6 and 1. The TraSH ratio cut-off has considerable influence on prediction accuracy.

Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)

The best ROC curve corresponds to the following prediction scores: Sensitivity 71%, Specificity 80%, Correct predictions 78%.

Microarray signal distributions for essential and non-essential genes.

Distributions of the raw TraSH signal ratios for genes present in the model. The blue line shows the distribution for genes that were predicted by the model to be essential for growth. The red line shows the distribution of the TraSH ratio among genes predicted to be non-essential for BIOMASS production. The medians of the two distributions are significantly different according to a non-parametric U-test (p-value < 2e-16).

Extension to expression data:

Perform this analysis for every metabolite in the network, not only for BIOMASS. For each metabolite in the network, identify the genes that influence and do not influence its producibility. Compare the microarray signal for both groups of genes and decide whether the metabolite is differentially affected by the gene expression program.

Changes in metabolite producibility in Streptomyces coelicolor during the onset of antibiotic production.

All three antibiotics included in the GSMN model are on the list of differentially affected metabolites!
Collaboration with the group of Colin Smith; BBSRC grant BBD0115821

Differential Producibility Analysis (DPA)

1. Producibility plot: for each metabolite, find all genes whose inactivation affects the maximal flux towards the metabolite.
2. For each metabolite, calculate the metabolite signal as the median log microarray signal ratio of the genes influencing producibility of the given metabolite.
3. For each metabolite, compile a profile $p_j = (r_1, \dots, r_n)$, where $r_i$ is the rank of the signal of metabolite $j$ in the $i$-th dataset.
4. Use Rank Product Analysis to calculate metabolite p-values. Metabolites that rank consistently high have “significant” p-values and are declared differentially producible.

Bhushan Bonde, Dany Beste, Emma Laing, Andrzej M. Kierzek, Johnjoe McFadden, PLoS Computational Biology, in press

Maximisation of congruence between transcriptome data and FBA model of global MTB metabolism.

Modified approach of Shlomi et al, Nature Biotechnology, 26, 1003, 2008

Prediction and community annotation of gene function.

http://sysbio3.fhms.surrey.ac.uk

III. Integration: Using stochastic simulation to sample sequences of feasible events linking inputs and outcomes.

What about models more general than metabolic ones, where analysis of the quasi steady-state flux distribution is not useful?

Raza et al. BMC Systems Biology 2010, 4:63 http://www.biomedcentral.com/1752-0509/4/63

Requirements of the approach

1. Makes useful qualitative predictions about effects of gene inactivation even if nothing but network connectivity is known.
2. Allows accommodation of rates and rate equations, if these data are available for some of the reactions.
3. Is equivalent to stochastic dynamics, if all rates are known.
4. Analyses genome scale networks involving all classes of interactions.

Continuously changing approximation level

Formal Methods in Molecular Biology, Dagstuhl Seminar 09091, Breitling, Gilbert, Heiner, Priami, February 2009

Petri net token game based approach

1. Represent the system as a Petri net.
2. Create two copies of the model: a WT model with all genes active and a mutant model with the gene of interest inactive.
3. Run a large number of token games for both the WT and the mutant model and count the number of times these trajectories exhibit the property of interest.
4. The mutant/WT ratio of the numbers of behaviour occurrences measures the involvement of the gene in a particular behaviour (gene function).

This approach is similar to the Signalling Petri Net method of Ruths et al. (PLoS Computational Biology, 2008). I used different stochastic dynamics to sample the network behaviours and I used Continuous Stochastic Logic to express the properties I was testing for.

Benchmark System

Yeast cell cycle model of Chen et al. Molecular Biology of the Cell 2004 in original and SBGN notations.

Stochastic Petri Net has been implemented in Prism.

Properties of Interest

    // G2 arrest
    P=? [ G[200,300] nSPN<4 & nORI>4 & nBUD>4 & ndiv=0 ]
    // G1 arrest
    P=? [ G[200,300] nORI<4 & nBUD<4 & ndiv=0 ]
    // M-phase arrest
    P=? [ G[200,300] nSPN>4 & nORI>4 & nBUD>4 & ndiv=0 ]
    // Division happened twice
    P=? [ F<=200 ndiv=2 ]

Statistical model checking with CTMC dynamics has been used to calculate probabilities in WT and mutant models. In the initial state 10 tokens have been placed at the nodes. Reaction rates were set to 1.
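
The four steps of the token game approach can be sketched on a toy net. Everything in this example is hypothetical: a two-transition chain instead of the yeast cell cycle model, invented place and gene names, a simple "at least 5 tokens reach the product place" property instead of the CSL formulas, and mass-action firing with all rate constants equal to 1 as an assumed reading of "reaction rates were set to 1". It is plain Python, not PRISM, and not the author's implementation.

```python
import random

# Hypothetical toy Petri net: (gene required, input places, output places).
TRANSITIONS = [
    ("geneA", ("source",), ("intermediate",)),
    ("geneB", ("intermediate",), ("product",)),
]
INITIAL_MARKING = {"source": 10, "intermediate": 0, "product": 0}


def token_game(inactive_genes, t_end, rng):
    """Play one stochastic token game and return the final marking."""
    marking = dict(INITIAL_MARKING)
    # Mutant model: transitions that need an inactive gene are removed.
    active = [tr for tr in TRANSITIONS if tr[0] not in inactive_genes]
    t = 0.0
    while True:
        # Assumed mass-action propensity with rate constant 1.
        weights = []
        for _, inputs, _ in active:
            w = 1.0
            for place in inputs:
                w *= marking[place]
            weights.append(w)
        total = sum(weights)
        if total == 0:
            break                       # no transition is enabled
        t += rng.expovariate(total)     # CTMC waiting time
        if t > t_end:
            break
        _, inputs, outputs = rng.choices(active, weights=weights)[0]
        for place in inputs:
            marking[place] -= 1
        for place in outputs:
            marking[place] += 1
    return marking


def count_hits(inactive_genes, n_runs, t_end, rng):
    """Number of runs in which the (hypothetical) property of interest holds."""
    hits = 0
    for _ in range(n_runs):
        if token_game(inactive_genes, t_end, rng)["product"] >= 5:
            hits += 1
    return hits


if __name__ == "__main__":
    rng = random.Random(1)
    wt = count_hits(set(), 200, 50.0, rng)
    mutant = count_hits({"geneB"}, 200, 50.0, rng)
    ratio = mutant / wt if wt else float("nan")
    print(f"WT hits={wt}, mutant hits={mutant}, mutant/WT ratio={ratio}")
```

In this toy net, inactivating geneB removes the only route to the product place, so the mutant/WT ratio is 0 and the gene would be scored as required for the behaviour.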
Correct results for three mutants

Gene inactivation              Divided twice   G1 arrest   G2 arrest   M-phase arrest
Wild type                      0.5685          0.0         0.006       0.0181
YGR108W and YPR119W (Clb2)     0.0             0.0         1.0         0.0
YAL040c (Cln3)                 0.0             1.0         0.0         0.0
YGL116W (Cdc20)                0.026           0.0         0.0         0.7762

Acknowledgements

Computational systems biology group @ Surrey.
Andrea Rocco, Albert Gevorgyan, Ahmad Mannan, Huihai Wu

Collaborating academics @ Surrey.
Claudio Avignone-Rossa, Michael Bushell, Rebecca Hoyle, Andrzej M. Kierzek, Emma Laing, Roberto La Ragione, Johnjoe McFadden, Colin P. Smith, Graham Stewart

International Collaborations.
Barry Wanner (Purdue)
Hirotada Mori (Nara Institute of Science and Technology)
Kazuyuki Shimizu (Kyushu Institute of Technology)

EraSysBio+ TB-HOST-NET Partners
Steffen Klamt (Max Planck Institute)
Olivier Neyrolles (CNRS)
Ludovic Tailleux (Institute Pasteur)
Maria Foti (University of Milan-Bicocca)

The work presented here and its continuation are generously funded by three BBSRC grants, a BBSRC/EraSysbioPlus grant, a Wellcome Trust grant and an MRC PhD fellowship.