Pathway Modeling in Metabolic Networks
Stefan Schuster
Friedrich Schiller University Jena, Dept. of Bioinformatics

JENA
Battle of Jena and Auerstedt, October 14, 1806
Famous people at Jena University:
• Friedrich Schiller (1759-1805)
• Matthias Schleiden (1804-1881), discoverer of the living plant cell
• Ernst Haeckel (1834-1919), biogenetic rule

Introduction
• Metabolism is the bridge between genotype and phenotype
• Technological relevance of metabolism: synthesis of specific products (antibiotics, amino acids, ethanol, dyes, odorants)
• Production of edibles: cheese, bread, wine, etc.
• Degradation of xenobiotics
Metabolic networks are complex because of their size and the presence of bimolecular reactions; they are hypergraphs rather than ordinary graphs.

Introduction (2)
• The structure and (nonlinear) dynamics of metabolic networks cannot be understood intuitively
• Theoretical methods are needed
• These methods should be systemic (Systems Biology) rather than too reductionist

Metabolic Pathway Analysis (or Metabolic Network Analysis)
• Decomposition of the network into the smallest functional entities (metabolic pathways)
• Does not require knowledge of kinetic parameters
• Uses the stoichiometric coefficients and the reversibility/irreversibility of reactions
[Figure: a non-elementary flux mode and elementary flux modes]
S. Schuster, C. Hilgetag: J. Biol. Syst. 2 (1994) 165-182; S. Schuster, T. Dandekar, D.A. Fell: Trends Biotechnol. 17 (1999) 53-60; S. Schuster, D.A. Fell, T. Dandekar: Nature Biotechnol. 18 (2000) 326-332

An elementary mode is a minimal set of enzymes that can operate at steady state with all irreversible reactions used in the appropriate direction.
All flux distributions in the living cell are non-negative linear combinations of elementary modes.
Related concept: extreme pathways (Schilling, Letscher and Palsson, J. Theor. Biol. 203 (2000) 229). They distinguish between internal and exchange reactions, and all internal reversible reactions are split up into forward and reverse steps.

Mathematical background
Steady-state condition: $N\,V(S) = 0$
Sign restriction for irreversible fluxes: $V_{\mathrm{irr}} \geq 0$
If the kinetic parameters are unknown, one can try to solve this system for $V$. The equation/inequality system is linear and homogeneous in $V$. Usually, however, there is a manifold of solutions, which forms a convex region. All edges of this region correspond to elementary modes; in addition, there may be elementary modes in its interior.

Geometrical interpretation
Elementary modes correspond to the generating vectors (edges) of a convex polyhedral cone (a "pyramid") in flux space, if all reactions are irreversible.
[Figure: flux cone and its generating vectors; axes: rate of enzyme 1, rate 2, rate 3]

[Figure: part of monosaccharide metabolism; red: external metabolites]
[Figure: 1st elementary mode: glycolysis; 2nd elementary mode: fructose-bisphosphate cycle]
[Figure: 4 out of the 7 elementary modes of this network]
S. Schuster, D.A. Fell, T. Dandekar: Nature Biotechnol. 18 (2000) 326-332

Algorithm for computing elementary modes
• Related to the Gauss-Jordan method
• Starts with the tableau $(N^{T} \mid I)$
• Rows with opposite signs in the current column are combined pairwise (non-negatively), so that one column of $N^{T}$ after the other becomes the null vector
• Before each combination, test whether the resulting row is elementary
S. Schuster et al.: Nature Biotechnol. 18 (2000) 326-332; J. Math. Biol. 45 (2002) 153-181.
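The tableau procedure can be sketched in Python for the special case in which every reaction is irreversible; reversible reactions need extra bookkeeping that this sketch omits. As the elementarity test it assumes the combinatorial adjacency test of the double-description method (a pair of rows is combined only if no third row has a support contained in the union of their supports). The function names, the `order` parameter and the pair counter are hypothetical illustrations, not METATOOL code, and the demo matrix is our reading of the four-reaction example (rows: the two internal metabolites; columns: reactions 1-4).

```python
from functools import reduce
from math import gcd


def _normalize(row):
    """Divide an integer row by the gcd of its entries so equal rays compare equal."""
    g = reduce(gcd, (abs(x) for x in row), 0)
    return tuple(x // g for x in row) if g > 1 else tuple(row)


def elementary_modes(N, order=None):
    """Enumerate elementary modes of N v = 0, v >= 0, assuming all reactions irreversible.

    N: integer stoichiometric matrix (rows = internal metabolites, columns = reactions).
    order: hypothetical option, the N^T columns to process (default: left to right).
    Returns (sorted list of modes as integer tuples, number of row pairs examined).
    """
    m, r = len(N), len(N[0])
    # Initial tableau (N^T | I): one row per reaction
    tableau = [_normalize([N[k][j] for k in range(m)] + [int(i == j) for i in range(r)])
               for j in range(r)]
    pairs_examined = 0
    for col in (range(m) if order is None else order):
        # Support of a row = reactions with a nonzero entry in the identity part
        supp = [frozenset(i for i in range(r) if t[m + i] != 0) for t in tableau]
        new_tableau = [t for t in tableau if t[col] == 0]  # already zero: keep unchanged
        for i, ti in enumerate(tableau):
            if ti[col] <= 0:
                continue
            for j, tj in enumerate(tableau):
                if tj[col] >= 0:
                    continue
                pairs_examined += 1
                union = supp[i] | supp[j]
                # Elementarity test: reject if a third row's support lies inside the union
                if any(supp[k] <= union for k in range(len(tableau)) if k not in (i, j)):
                    continue
                # Positive combination of a positive and a negative row that zeroes `col`
                combo = [-tj[col] * a + ti[col] * b for a, b in zip(ti, tj)]
                new_tableau.append(_normalize(combo))
        tableau = new_tableau
    return sorted(t[m:] for t in tableau), pairs_examined


if __name__ == "__main__":
    N = [[1, -1, -1, 1],
         [0, 0, 1, -1]]
    print(elementary_modes(N))                # ([(0, 0, 1, 1), (1, 1, 0, 0)], 5)
    print(elementary_modes(N, order=[1, 0]))  # ([(0, 0, 1, 1), (1, 1, 0, 0)], 2)
```

With the default order, five row pairs are examined and one of them fails the elementarity test; processing the second column of $N^{T}$ first (`order=[1, 0]`, zero-based) needs only two, which illustrates why the column order affects the running time.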
Example:
[Figure: example network with S1, S2, P1, P2 and reactions 1-4]
$$T^{(0)} = \left(\begin{array}{rr|rrrr} 1 & 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 & 1 & 0 \\ 1 & -1 & 0 & 0 & 0 & 1 \end{array}\right)$$
After processing the first column:
$$T^{(1)} = \left(\begin{array}{rr|rrrr} 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{array}\right)$$
The second and third rows should not be combined: their sum, (1 1 1 1), is the sum of the modes (1 1 0 0) and (0 0 1 1) and therefore not elementary.
Final tableau:
$$T^{(2)} = \left(\begin{array}{rr|rrrr} 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{array}\right)$$
The algorithm is faster if the second column of $T^{(0)}$ is processed first: then only one combination per column is needed and no non-elementary candidate row arises.

Alternative algorithm
R. Urbanczik, C. Wagner: An improved algorithm for stoichiometric network analysis: theory and applications. Bioinformatics 21 (2005) 1203-1210.
• First, compute a null-space matrix $K$ with $NK = 0$, chosen such that it contains an identity matrix.
• Then perform pairwise combinations of columns to obtain further elementary modes.
• Empirically, this algorithm shows a higher performance; however, this may depend on the type of network.

Software involving routines for computing elementary modes
• EMPATH - J. Woods
• METATOOL - Th. Pfeiffer, F. Moldenhauer, A. von Kamp (versions 5.x use the Wagner algorithm)
• GEPASI - P. Mendes
• JARNAC - H. Sauro
• In-Silico-Discovery™ - K. Mauch
• FluxAnalyzer (in MATLAB) - S. Klamt
• ScrumPy - M. Poolman
• Alternative algorithm in MATLAB - C. Wagner, R. Urbanczik
• PySCeS - B. Olivier et al.
• Online computation: pHpMetatool - H. Höpfner, M. Lange

History of pathway analysis
• "Direct mechanisms" in chemistry (Milner 1964, Happel & Sellers 1982)
• Clarke 1980: "extreme currents"
• Seressiotis & Bailey 1986: "biochemical pathways"
• Leiser & Blum 1987: "fundamental modes"
• Mavrovouniotis et al. 1990: "biochemical pathways"
• Fell 1990: "linearly independent basis vectors"
• Schuster & Hilgetag 1994: "elementary flux modes"
• Liao et al. 1996: "basic reaction modes"
• Schilling, Letscher and Palsson 2000: "extreme pathways"

Robustness of metabolism
• The number of elementary modes leading from a given substrate to a given product can be considered a measure of redundancy
• It characterizes flexibility: the number of alternatives between which the network can switch if necessary

System under study
(Joint work with J. Stelling, S. Klamt, K. Bettenbrock and E.D. Gilles, Max Planck Institute, Magdeburg)
• Central metabolism of Escherichia coli
• 89 substances, 110 reactions
• Four representative substrates: glucose, acetate, glycerol and succinate
• Biomass synthesis: 0.64 protein + 0.185 RNA + 0.03 DNA + 0.1 lipids + 0.015 lipopolysaccharides + 0.015 glycogen → biomass
• If at least one elementary mode leads to biomass production, the mutant is predicted to be viable.

Results for the E. coli model
• With all four substrates present simultaneously: 507,632 elementary modes
• To cope with the combinatorial explosion, only one substrate at a time is allowed
• For example, glucose: 27,099 elementary modes; acetate: 598 elementary modes

Computing the elementary modes for mutants
• One enzyme gene at a time was "knocked out" in silico.
• This was done for 90 different combinations of single mutants and substrates.
• The theoretical predictions on viability were compared with experimental data from the literature.
• In 81 out of the 90 cases (90%), the predictions were correct.
[Figure: prediction of mutant viability from the number of elementary modes; categories: true negatives, false positives, false negatives, true positives]
J. Stelling, S. Klamt, K. Bettenbrock, S. Schuster, E.D. Gilles: Metabolic network structure determines key aspects of functionality and regulation. Nature 420 (2002) 190-193.

To analyse robustness more quantitatively:
• Plot of the number of elementary modes vs. maximal biomass yield
• For comparison, plot of the number of elementary modes vs. network diameter (= average number of reactions between any two substances)
[Figure: maximal growth yield (●) and network diameter (○) vs. number of elementary modes]
J. Stelling, S. Klamt, K. Bettenbrock, S. Schuster, E.D. Gilles: Metabolic network structure determines key aspects of functionality and regulation. Nature 420 (2002) 190-193.

Difference between redundancy and robustness
A) [Network figure: S1, Q1, P1, P2; reactions 1-3] Knockout of enzyme 1 implies deletion of 2 elementary modes.
B) [Network figure: S1, S2, Q1, P1, P2; reactions 1-4] Knockout of enzyme 1 implies deletion of only 1 elementary mode.

Proposed measure of network robustness
$$R_1 = \frac{\sum_{i=1}^{r} z(i)}{r \cdot z}$$
r: number of reactions; z: number of elementary modes; z(i): number of elementary modes remaining after knockout of enzyme i.
T. Wilhelm, J. Behre, S. Schuster: Analysis of structural robustness of metabolic networks. IEE Proc. Syst. Biol. 1 (2004) 114-120.

Metabolic network | Essential products | Number of elementary modes | R1 (robustness) | R2 | R3
Human erythrocyte | ATP, hypoxanthine, NADPH, 2,3DPG | 667 | 0.3834 | 0.3401 | 0.3607
E. coli | Ala, Arg, Asn, His§ | 667 | 0.5084 | 0.3207 | 0.4295
E. coli | Arg, Asn, His, Ile | 656 | 0.5211 | 0.3451 | 0.4427
E. coli | Arg, Asn, Ile, Leu | 567 | 0.5479 | |
E. coli | Arg, Asn, Leu, Pro | 540 | 0.5360 | |
E. coli | His, Ile, Leu, Lys | 802 | 0.5112 | |
E. coli | Ile, Leu, Pro, Val | 597 | 0.5488 | |
R2 and R3 values of the last four rows (row assignment not recoverable): 0.4964, 0.4763, 0.4586, 0.4836, 0.3482, 0.4437, 0.4675, 0.5058
Recently generalized to multiple knockouts: J. Behre, T. Wilhelm, … S. Schuster: J. Theor. Biol. 252 (2008) 433-441.

Another biochemical application: Can sugars be produced from lipids?
(Work with David Fell, Oxford)
• It has long been known in biochemistry that many bacteria and plants can produce sugars from lipids (via C2 units), whereas animals cannot.
[Figure: AcCoA → ? → glucose]
AcCoA is linked to glucose by a chain of reactions. However, no elementary mode realizes this conversion along that chain.
[Figure: TCA cycle (AcCoA, Cit, IsoCit, OG, SucCoA, Succ, Fum, Mal, Oxac) with PEP, Pyr, glucose and CO2 release]
Elementary mode representing the conversion of AcCoA into glucose. It requires the glyoxylate shunt.
[Figure: the same network including the glyoxylate shunt (Icl, Mas, Gly)]
The glyoxylate shunt is present in green plants, yeast, many bacteria (e.g. E. coli) and others, and, as the only clade of animals, in nematodes.
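The AcCoA-to-glucose argument can be reproduced on a deliberately tiny, hypothetical toy network. The TCA cycle is lumped into four steps, PEP carboxykinase (PCK) and a lumped gluconeogenesis step (GNG, written 1:1 for simplicity) lead from Oxac to glucose, and the glyoxylate shunt is represented by ICL (Cit → Succ + Gly) and MS (Gly + AcCoA → Oxac). Cofactors and exact carbon stoichiometry are ignored, and all reaction names are illustrative assumptions. The sketch computes an exact null-space basis of N with Python's `fractions` module: if the glucose-producing reaction is zero in every basis vector, no steady state at all, let alone an elementary mode, converts AcCoA into glucose, even though a chain of reactions connects them.

```python
from fractions import Fraction

# Hypothetical lumped toy network; columns = reactions, rows = internal metabolites.
# External (unbalanced) species: AcCoA, CO2, glucose.
REACTIONS = ["CS", "ICDH", "OGDH", "SDH_MDH", "PCK", "GNG", "ICL", "MS"]
#  CS: AcCoA + Oxac -> Cit      ICDH: Cit -> OG + CO2     OGDH: OG -> Succ + CO2
#  SDH_MDH: Succ -> Oxac        PCK: Oxac -> PEP + CO2    GNG: PEP -> glucose (lumped)
#  ICL: Cit -> Succ + Gly       MS: Gly + AcCoA -> Oxac   (ICL, MS = glyoxylate shunt)
N_SHUNT = [
    [-1, 0, 0, 1, -1, 0, 0, 1],   # Oxac
    [1, -1, 0, 0, 0, 0, -1, 0],   # Cit
    [0, 1, -1, 0, 0, 0, 0, 0],    # OG
    [0, 0, 1, -1, 0, 0, 1, 0],    # Succ
    [0, 0, 0, 0, 1, -1, 0, 0],    # PEP
    [0, 0, 0, 0, 0, 0, 1, -1],    # Gly
]
N_NO_SHUNT = [row[:6] for row in N_SHUNT[:5]]  # drop ICL, MS and the Gly row
GLC = REACTIONS.index("GNG")
SHUNT_MODE = [1, 0, 0, 1, 1, 1, 1, 1]  # CS, SDH_MDH, PCK, GNG, ICL, MS


def nullspace(N):
    """Basis of {v : N v = 0} by exact Gauss-Jordan elimination over Fractions."""
    rows, cols = len(N), len(N[0])
    A = [[Fraction(x) for x in row] for row in N]
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it becomes a free variable
        A[r], A[p] = A[p], A[r]
        A[r] = [x / A[r][c] for x in A[r]]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    basis = []
    for f in (c for c in range(cols) if c not in pivots):
        v = [Fraction(0)] * cols
        v[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -A[i][f]  # pivot variable expressed through the free one
        basis.append(v)
    return basis


def glucose_flux_possible(N, idx):
    """True if some steady-state vector (signs not checked) has v[idx] != 0."""
    return any(v[idx] != 0 for v in nullspace(N))


if __name__ == "__main__":
    print(glucose_flux_possible(N_NO_SHUNT, GLC))  # False
    print(glucose_flux_possible(N_SHUNT, GLC))     # True
    # The nonzero entry alone says nothing about signs; SHUNT_MODE is a non-negative steady state
    print(all(sum(a * b for a, b in zip(row, SHUNT_MODE)) == 0 for row in N_SHUNT))  # True
```

Without the shunt, the only steady state is the cycle itself, which oxidizes both carbons of AcCoA to CO2; with the shunt, `SHUNT_MODE` converts 2 AcCoA into glucose plus CO2 in this lumped bookkeeping.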
This example shows that a description by ordinary graphs in the sense of graph theory is insufficient…
S. Schuster, D.A. Fell: Modelling and simulating metabolic networks. In: Bioinformatics: From Genomes to Therapies (T. Lengauer, ed.), Wiley-VCH, Weinheim 2007, pp. 755-805.
L. Figueiredo, S. Schuster, D.A. Fell: Can sugars be produced from fatty acids? Bioinformatics, under revision.

A successful theoretical prediction
[Figure: central metabolism of E. coli (glucose, PEP, Pyr, AcCoA, TCA cycle, glyoxylate shunt) with two elementary modes highlighted]
Red elementary mode: usual TCA cycle.
Blue elementary mode: catabolic pathway predicted for E. coli in Liao et al. (1996) and Schuster et al. (1999). Experimental hints in Wick et al. (2001). Experimental proof in: E. Fischer, U. Sauer: A novel metabolic cycle catalyzes glucose oxidation and anaplerosis in hungry Escherichia coli. J. Biol. Chem. 278 (2003) 46446-46451.

Optimization: Maximizing molar yields
[Figure: two elementary modes of monosaccharide metabolism, with ATP:G6P yields of 3 and 2]

Maximization of tryptophan:glucose yield
• Model of 65 reactions in the central metabolism of E. coli
• 26 elementary modes
• The 2 modes with the highest tryptophan:glucose yield reach 0.451
[Figure: mode from Glc via G6P, GAP, 3PG, PEP, Pyr, PrpP and anthranilate (Anthr) to Trp; relative fluxes 233 (Glc) and 105 (Trp), i.e. 105/233 ≈ 0.451]
Schuster, Dandekar, Fell: Trends Biotechnol. 17 (1999) 53.

Conclusions
• Elementary modes are an appropriate concept for describing biochemical pathways
• Information about network structure can be used to derive far-reaching conclusions about the performance and robustness of metabolism
• Elementary modes reflect specific characteristics of metabolic networks such as steady-state mass flow, thermodynamic constraints and systemic interactions (Systems Biology)

Conclusions (2)
• It can be tested whether connected routes can carry fluxes at steady state
• A complete list of potential pathways can be generated; realized pathways can then be searched for experimentally
• Elementary modes allow one to compute, for various substrate-product pairs, the maximal yields that can potentially be achieved

Collaborations
• David Fell (Oxford Brookes University)
• Thomas Dandekar (University of Würzburg)
• Steffen Klamt, Ernst Dieter Gilles (MPI Magdeburg)
• Jörg Stelling (ETH Zürich)
• Sebastian Bonhoeffer (ETH Zürich)
• Thomas Pfeiffer (Harvard)
• and others
Acknowledgement to DFG and BMBF (Germany) and FCT (Portugal) for financial support.

Introduction (3)
• Application in functional genomics: considering gene products in their functional context
• Medical application: inherited diseases caused by enzyme deficiencies
• Biotechnological applications: increase in yield, robustness to knockouts

Theoretical Methods
• Dynamic Simulation
• Stability and bifurcation analyses
• Metabolic Control Analysis (MCA)
• Metabolic Pathway Analysis
• Metabolic Flux Analysis (MFA)
• Optimization, Evolutionary Game Theory
• and others

Problem:
• Many kinetic parameters are unknown; maximal velocities depend on enzyme concentrations.
• What conclusions can be drawn from the information we have?
Known for most enzymes: stoichiometry, reversibility
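Since the robustness measure R1 of Wilhelm, Behre and Schuster (2004) needs only the list of elementary modes, which in turn needs only stoichiometry and reversibility, it is one conclusion that can be drawn without kinetic parameters. A minimal Python sketch, assuming each elementary mode is given as the set of reactions it uses and that knocking out enzyme i removes exactly the modes that use reaction i; the function name and both toy mode lists are hypothetical:

```python
from fractions import Fraction


def robustness_r1(num_reactions, modes):
    """Structural robustness R1 = sum_i z(i) / (r * z).

    num_reactions: r, the number of reactions (indices 0 .. r-1).
    modes: the z elementary modes, each given as the set of reactions it uses.
    """
    z = len(modes)
    if num_reactions <= 0 or z == 0:
        raise ValueError("need at least one reaction and one elementary mode")
    # z(i) = number of modes surviving knockout of reaction i, i.e. modes not using i
    total_surviving = sum(
        sum(1 for mode in modes if i not in mode) for i in range(num_reactions)
    )
    return Fraction(total_surviving, num_reactions * z)


if __name__ == "__main__":
    # Hypothetical network: both modes share reaction 0, so knocking it out kills both
    print(robustness_r1(3, [{0, 1}, {0, 2}]))  # 1/3
    # Hypothetical network: two disjoint modes, each knockout removes only one
    print(robustness_r1(4, [{0, 1}, {2, 3}]))  # 1/2
```

Both toy networks have two elementary modes, i.e. the same redundancy; the first, in which one enzyme is shared by all modes, gets the lower R1, mirroring the distinction between redundancy and robustness.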