Metabolic networks
John Pinney
Theoretical Systems Biology group
j.pinney@imperial.ac.uk
341 Introduction to Bioinformatics: Biological Networks
25th February 2010

Part 1: Constructing metabolic networks

What is metabolism?
“Metabolism is the set of chemical reactions that occur in living organisms in order to maintain life.”
Image: section through an Escherichia coli cell by David Goodsell

Key classes of biochemicals:
amino acids
  • proteins
carbohydrates
  • bacterial envelope
nucleotides
  • genetic material
lipids
  • membranes
coenzymes
  • transfer chemical groups
minerals
  • assist in biochemical transformations

Enzymes
Metabolic reactions are catalysed by proteins called enzymes.
[Figure: glucose → glucose 6-phosphate]

Metabolic pathways
Traditionally, biochemists consider a series of consecutive metabolic reactions to form a pathway.
Image: CK12.org

Metabolic networks
However, pathways often overlap so much that it is more accurate to consider the set of all metabolic reactions as forming a network.
Image: Wikipedia

How should we represent metabolic networks?
Traditional textbook representation:
• Compounds are shown as boxes.
• Arrows connect compounds to show interconversions.
• Arrows are labelled with the name of the associated enzyme.
• Cofactors (commonly-used compounds) are included with curved arrows.
Image: Michal, G. (1993). Biochemical Pathways Poster. Boehringer Mannheim GmbH

Why should we study metabolic networks?

Fundamental to life
Since enzymes are encoded in the genome, metabolism is one mechanism by which an organism’s genotype (specific set of genes) is connected to its phenotype (how it behaves).
Many metabolic processes are common to all forms of life.

Biotechnology
Deep understanding of the metabolic networks of bacteria is needed if they are to be genetically modified to produce a desired product with maximum yields.

Medicine
Aberrations in human metabolism are fundamental to diseases such as diabetes and some types of cancer.
Knowledge of the metabolic networks of pathogens and parasites can help to select drug targets (or target combinations) that will be most effective.

Representing metabolic networks for systems biology
[Figure: alternative graph representations: simple graph, digraph, bipartite digraph, "or more complex still..?". Node types in the legend: metabolite, reaction, enzyme.]

Metabolic reconstruction
Task: Given the genome sequence for an organism, find its metabolic network.
Resources:
• Sequence databases
• Genome annotations
• Databases of metabolic reactions
Tools:
• Sequence similarity searches
• Text extraction
• Machine learning
Experimental data (high- and low-throughput)
Francke C et al. (2005)

Metabolic reconstruction from a genome annotation
For well-studied organisms, a great deal of information about metabolism is already known. Genome annotations label each gene with our current knowledge.
Enzymatic functions are often described in such annotations using the E.C. (Enzyme Commission) hierarchical numbering system, for example:
EC 5.3.1.9 glucose-6-phosphate isomerase
5 => isomerase
5.3 => intramolecular oxidoreductases
5.3.1 => interconverting aldoses and ketoses

Once a set of enzymes has been collected, they can simply be projected onto a database of all known metabolic reactions to give a “first-pass” network reconstruction, e.g.
glycolysis / gluconeogenesis for chicken (Gallus gallus), taken from KEGG (Kyoto Encyclopedia of Genes and Genomes), www.genome.jp/kegg

Metabolic reconstruction from a proteome
Often a well-curated genome annotation is unavailable, but we have a good idea of where the protein-coding genes are on the genome, so we can extract a predicted proteome (the set of all protein sequences encoded by the genome). The task is now to assign enzymatic functions to these protein sequences.
[Figure labels: genome sequence with known protein-coding regions; predicted proteins]

If a closely-related organism has a good annotation, it may be possible to identify orthologous proteins (i.e. derived from the same ancestral gene by speciation, and typically functionally equivalent) using basic sequence alignment methods such as BLAST. More sophisticated methods for orthology assignment are also available.
[Figure labels: annotated proteome; new proteome; functional assignment by sequence similarity (e.g. BLAST)]

However, using profile models for enzyme domains is a more sensitive way to detect sequence similarities, especially across large evolutionary distances.
[Figure labels: highly-conserved amino acids; multiple alignment of enzyme domains from many species; profile model (position-specific scoring matrix / profile HMM); library of models for all enzyme functions with known sequences]

[Figure labels: known ligand-binding residues from bacterial structure; EPSP synthase; ATP/GTP binding motif; shikimate kinase. McConkey GA et al. (2004)]

Limitations of sequence-based methods

Large evolutionary distances
• Transfer of function from a distant sequence may not be reliable.
• An enzyme may be too divergent to be recognised from sequence.

Multiple functions
• Some enzymes have multiple protein domains that have different functions.
• An enzyme may “moonlight”, i.e. catalyse several different reactions using the same active site.
Reactions with unknown sequences
• There are several known metabolic reactions for which no example enzyme sequences are known.

Unknown reactions
• Across all kingdoms of life, there are many hundreds of metabolic reactions that are as yet completely uncharacterised!

Manual curation
Computational assignment of gene function is not 100% accurate!
It will always be important to examine and refine initial automated metabolic reconstructions carefully before attempting to analyse the resulting network.

Comparative genomics can be a powerful tool in network curation. By comparing genomes between different species, we attempt to use their shared evolutionary histories to help us identify gene functions more accurately.
• What genes are close to this gene?
• Has this gene ever fused with another one?
• Which genes tend to be present in the same organisms as this one?
• Which genes control whether this one is switched on?
• What experimental evidence is there?

Gaps in a reconstructed network
Even after curation, a network may still contain obvious gaps, also known as pathway holes.
[Figure labels: consumed but not produced (source); intermediate reaction missing; produced but not consumed (sink)]

Methods for gap-filling
• Phylogenetic profiling (evidence for functionally associated genes)
• Anticorrelation analysis (evidence for functionally analogous genes)
[Figure: presence/absence matrix of genes g1 to g10 against species s1 to s8, with "+" and "?" entries, highlighting a shared pattern and an anticorrelated pattern. Osterman A and Overbeek R (2003); Pellegrini M et al. (1999)]

Evidence from various sources can be integrated using machine learning to give an overall likelihood that a particular gene might fill a particular pathway hole.
For parasitic or symbiotic organisms, we also need to consider the possibility of metabolite exchange with the host or subversion of host enzymes.
Green ML et al.
(2004)

Part 2: Metabolic network analysis

Analysis of metabolic networks
Metabolic networks can be analysed on several different levels.
• Topologically: basic network structure.
• Stoichiometrically: considering the numbers of molecules of each type consumed and produced by each reaction.
• Dynamically: considering the rates of each reaction and variations in metabolite concentrations over time.

Topological analysis
Metabolic networks can be studied purely from the point of view of their graph properties.
• Degree distribution
• Clustering coefficient
• Shortest path length
• Modularity
• etc.
These types of investigations may (or may not!) provide useful insights into how metabolic networks have evolved.
Wagner A and Fell DA (2001)

Chokepoint analysis can help to reveal potential drug targets.
[Figure: highlighted squares are chokepoint reactions: each is the only reaction that consumes a particular substrate and/or the only one that produces a particular product. Yeh I et al. (2004)]

Petri net representations
The bipartite digraph representation of a metabolic network is very close to a modelling paradigm from computer science called a Petri net.
Various forms of Petri net representation have been successfully used in the analysis of many biological networks, especially for gene regulation, signal transduction and metabolic systems.
[Figure labels: bipartite digraph (metabolite, reaction); Petri net]

Petri nets for metabolic systems
Image: I. Barjis and V. Gehlot, SCSC 2007

Petri Nets
A tool for modelling a system:
• simple.
• easy to represent graphically.
• represents concurrent processes.
• mathematically rigorous.
• large theoretical framework has been developed.
Peterson JL (1981) Petri Net theory and the modeling of systems. Prentice-Hall, NJ

Introduction to Petri Nets
Generic features of a system
Composite:
• A system is considered to be made up of separate, interacting components.
State:
• Each component has its own state of being, which determines its future actions.
Concurrency:
• Components in two or more parts of the system may be simultaneously active.

Introduction to Petri Nets
Petri nets are usually described mathematically using matrix notation. However, they can also be represented as directed graphs with two types of node: places and transitions.
[Figure labels: place, arc, transition]

Transitions
Each transition has a set of input places and a set of output places.
[Figure labels: input place, output place]

Places
Places may be marked by tokens. Each place may hold an integer number of tokens. A particular distribution of tokens over a net is called a marking. This represents the state of the system.
[Figure label: marked places]

Firing transitions
Transitions whose input places are all marked by at least one token are said to be enabled. A transition fires by removing one token from each of its input places and creating new tokens at its output places.
[Figure label: enabled transitions]

Firing may continue until no transition is enabled, at which point execution halts. Although the initial marking determines the possible future behaviour of the net, the order in which transitions are fired is not fixed: the same initial marking may lead to different final states.
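
The firing rule described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the slides: the toy net (places A, X, B, C and transitions t1, t2) and the helper functions `enabled`, `fire` and `final_markings`. All arcs are assumed to carry one token, and the exhaustive search only terminates for nets that reach finitely many markings.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Marking = Dict[str, int]                      # place name -> token count
Net = Dict[str, Tuple[List[str], List[str]]]  # transition -> (input places, output places)

# Hypothetical toy net (not from the slides):
# the token on A can be used by t1 (together with X) or by t2.
TRANSITIONS: Net = {
    "t1": (["A", "X"], ["B"]),
    "t2": (["A"], ["C"]),
}


def enabled(marking: Marking, net: Net) -> List[str]:
    """Return the transitions whose input places all hold at least one token."""
    return [
        name
        for name, (inputs, _outputs) in net.items()
        if all(marking.get(place, 0) >= 1 for place in inputs)
    ]


def fire(marking: Marking, net: Net, name: str) -> Marking:
    """Fire one enabled transition and return the new marking (input is not modified)."""
    if name not in enabled(marking, net):
        raise ValueError(f"transition {name!r} is not enabled")
    inputs, outputs = net[name]
    new_marking = dict(marking)
    for place in inputs:       # remove one token from each input place
        new_marking[place] -= 1
    for place in outputs:      # create one token at each output place
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


def final_markings(start: Marking, net: Net) -> Set[FrozenSet[Tuple[str, int]]]:
    """Try every firing order and collect the markings in which nothing is enabled.

    Assumes the net reaches only finitely many markings, as the toy net does.
    """
    finals: Set[FrozenSet[Tuple[str, int]]] = set()
    seen: Set[FrozenSet[Tuple[str, int]]] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        # Hashable summary of a marking: only the places that hold tokens.
        key = frozenset((p, n) for p, n in current.items() if n > 0)
        if key in seen:
            continue
        seen.add(key)
        ready = enabled(current, net)
        if not ready:
            finals.add(key)    # execution halts here
        for name in ready:
            stack.append(fire(current, net, name))
    return finals


if __name__ == "__main__":
    start = {"A": 1, "X": 1}
    print(enabled(start, TRANSITIONS))
    for final in final_markings(start, TRANSITIONS):
        print(sorted(final))
```

With one token each on A and X, both transitions are enabled. Firing t1 halts with a single token on B, whereas firing t2 halts with tokens on C and X, so the same initial marking leads to two different final states.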
Matrix notation for Petri nets

Stoichiometric analysis
Elementary flux modes are a formal definition of minimal pathways that can operate independently at steady state.
They are equivalent to the set of minimal T-invariants of the Petri net incidence matrix describing the system.
Schuster S et al. (1999)
[Figure: part of E. coli metabolism. Schuster S et al. (1999)]

Flux balance analysis (FBA) is a widely used stoichiometric analysis technique.
For a given growth condition (e.g. known input nutrients):
• Assume that the metabolic system operates in a steady state.
• Assume certain constraints on the system (mass-balance, flux limitations).
• Assume an “objective” that is expected to be maximised by evolution (e.g. biomass production).
FBA can be used to predict reaction fluxes and essential enzymes under a given growth condition.

FBA example
[Figure: pathways of starch storage at different phases of development in barley seeds. Panels: anoxic (no oxygen), hypoxic (limited oxygen), aerobic (unlimited oxygen). Grafahrend-Belau E et al. (2008)]

Metabolic control analysis
Given kinetic parameters, we can calculate the sensitivity of the flux through a given pathway to the inhibition of any enzyme involved.
This replaces the concept of a “rate-limiting step” in a pathway with the idea of control being shared to some degree between all enzymes, represented by each enzyme’s flux control coefficient, C.
Requires a detailed kinetic model: currently limited to a few very well characterised pathways in specific organisms.
[Figure labels: C=1; 0<C<1; C=0. Bakker BM et al. (2000)]

The trypanosome Trypanosoma brucei, a human parasite, has a unique organelle called the glycosome, which carries out the glycolysis that is essential for its survival.
MCA has been applied to the glycolytic pathway in T. brucei to determine which of these enzymes would be the best drug targets.
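
For reference, the flux control coefficient C mentioned above is conventionally defined as the fractional change in steady-state pathway flux per fractional change in the activity of one enzyme. The symbols J (pathway flux), v_i (activity of enzyme i) and n (number of enzymes in the system) are notation assumed here, not taken from the slides.

```latex
C_i^{J} \;=\; \frac{\partial J}{\partial v_i}\,\frac{v_i}{J}
        \;=\; \frac{\partial \ln J}{\partial \ln v_i},
\qquad
\sum_{i=1}^{n} C_i^{J} \;=\; 1
```

The second relation is the summation theorem of metabolic control analysis. It expresses the idea of shared control: C = 1 means a single enzyme holds all of the control over the flux, C = 0 means an enzyme has none, and intermediate values mean control is distributed.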
MCA is potentially very helpful in drug target investigations because it allows us to consider the likely effects of incomplete inhibition of enzyme function.
Bakker BM et al. (2000)

Dynamic modelling approaches
There are many general software packages available for systems biology that can be used to model and simulate the dynamic behaviour of metabolic networks and to integrate them with processes such as gene regulation and protein interactions.
Metabolic models can often be shared between different software using Systems Biology Markup Language (SBML). (See sbml.org for examples.)
Modelling could be:
• Deterministic, e.g. ordinary differential equations (ODEs)
• Stochastic, e.g. Gillespie algorithm, Petri net simulation

Systems Biology Markup Language

Summary
Metabolic networks are central to much of systems biology and have important applications in biotechnology and medicine.
They can be reconstructed to some extent from genome sequences, but a complete and accurate metabolic model is difficult to achieve and requires a great deal of manual curation.
Metabolic networks may be analysed at various degrees of detail, using topological, stoichiometric and/or dynamic approaches.

References
• Oberhardt MA et al. Applications of genome-scale metabolic reconstructions. Mol Syst Biol (2009) 5:320
• Francke C et al. Reconstructing the metabolic network of a bacterium from its genome. Trends Microbiol (2005) 13:550-8
• Bakker BM et al. Metabolic control analysis of glycolysis in trypanosomes as an approach to improve selectivity and effectiveness of drugs. Molecular and Biochemical Parasitology (2000) 106:1-10
• Grafahrend-Belau E et al. Flux balance analysis of barley seeds: a computational approach to study systemic properties of central metabolism. Plant Physiol (2008)
• Green ML et al. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics (2004) 5:76
• McConkey GA et al.
Annotating the Plasmodium genome and the enigma of the shikimate pathway. Trends Parasitol (2004) 20:60-5
• Osterman A and Overbeek R. Missing genes in metabolic pathways: a comparative genomics approach. Current Opinion in Chemical Biology (2003) 7:238-51
• Pellegrini M et al. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA (1999) 96:4285-8
• Schuster S et al. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol (1999) 17:53-60
• Wagner A and Fell DA. The small world inside large metabolic networks. Proc Biol Sci (2001) 268:1803-10
• Yeh I et al. Computational analysis of Plasmodium falciparum metabolism: organizing genomic information to facilitate drug discovery. Genome Res (2004) 14:917-24