Pathway Tools / BioCyc Fundamentals
Peter D. Karp, Ph.D.
Bioinformatics Research Group, SRI International
pkarp@ai.sri.com
BioCyc.org, EcoCyc.org, MetaCyc.org, HumanCyc.org

SRI International Bioinformatics

Pathway Tools Capabilities
  Create and maintain an organism database integrating genome, pathway, and regulatory information
    Computational inference tools
    Interactive editing tools
  Query and visualize that database
  Use the database to interpret omics data
  Metabolic network analysis tools
  Comparative analysis tools
  Export the metabolic network to SBML
  Speed creation of flux-balance models by an order of magnitude

BioCyc
  Hundreds of microbial genomes
  Inferred operons and metabolic networks
  Couples curated data with computational predictions
  Supports analysis of omics data
  Comparative analysis tools
  Microbial emphasis. Exceptions: HumanCyc, MouseCyc, CattleCyc

Model Organism Databases / Organism Specific Databases
  DBs that describe the genome and other information about an organism
  Every sequenced organism with an active experimental community requires a MOD
    Integrate genome data with information about the biochemical and genetic network of the organism
    Integrate literature-based information with computational predictions
  Curated by experts for that organism
    No one group can curate all the world’s genomes
    Distribute workload across a community of experts to create a community resource

Rationale for MODs
  Each “complete” genome is incomplete in several respects:
    40%-60% of genes have no assigned function
    Roughly 7% of those assigned functions are incorrect
    Many assigned functions are non-specific
  Need continuous updating of annotations with respect to new experimental data and computational predictions
  MODs are platforms for global analyses of an organism
    Interpret omics data in a pathway context
    In silico prediction of essential genes
    Characterize systems properties of
metabolic and genetic networks

What is Curation?
  Ongoing updating and refinement of a PGDB
  Correcting false-positive and false-negative predictions
  Incorporating information from experimental literature
  Authoring of comments and citations
  Updating database fields
    Gene positions, names, synonyms
    Protein functions, activators, inhibitors
    Addition of new pathways, modification of existing pathways
    Defining TF binding sites, promoters, regulation of transcription initiation and other processes

Pathway/Genome Database
  [Diagram: a cell annotated with the contents of a PGDB: Pathways, Reactions, Compounds, Proteins, RNAs, Genes, Sequence Features, Regulation (Operons, Promoters, DNA Binding Sites, Regulatory Interactions), Chromosomes, Plasmids]

BioCyc Collection of 507 Pathway/Genome Databases
  Pathway/Genome Database (PGDB) – combines information about
    Pathways, reactions, substrates
    Enzymes, transporters
    Genes, replicons
    Transcription factors/sites, promoters, operons
  Tier 1: Literature-Derived PGDBs
    MetaCyc
    EcoCyc -- Escherichia coli K-12
  Tier 2: Computationally-derived DBs, Some Curation -- 24 PGDBs
    HumanCyc
    Mycobacterium tuberculosis
  Tier 3: Computationally-derived DBs, No Curation -- 481 DBs

Pathway Tools Overview
  [Diagram: components shown are the Annotated Genome, the MetaCyc Reference Pathway DB, PathoLogic, the Pathway/Genome Database, the Pathway/Genome Editors, and the Pathway/Genome Navigator]

Pathway Tools Software: PathoLogic
  Computational creation of new Pathway/Genome Databases
  Transforms genome into Pathway Tools schema and layers inferred information above the genome
  Predicts operons
  Predicts metabolic network
  Predicts which genes code for missing enzymes in metabolic pathways
  Infers transport reactions from transporter names
  Bioinformatics 18:S225 2002

Pathway Tools Software: Pathway/Genome Editors
  Interactively update PGDBs with graphical editors
  Support geographically
distributed teams of curators with object database system
  Gene editor
  Protein editor
  Reaction editor
  Compound editor
  Pathway editor
  Operon editor
  Publication editor

Pathway Tools Software: Pathway/Genome Navigator
  Querying and visualization of:
    Pathways
    Reactions
    Metabolites
    Proteins
    Genes
    Chromosomes
  Two modes of operation:
    Web mode
    Desktop mode
    Most functionality shared, but each has unique functionality

Pathway Tools Software: PGDBs Created Outside SRI
  1,700+ licensees: 75+ groups applying software to 300+ organisms
  Saccharomyces cerevisiae, SGD project, Stanford University
    135 pathways / 565 publications
  Candida albicans, CGD project, Stanford University
  dictyBase, Northwestern University
  Mouse, MGD, Jackson Laboratory
  Under development:
    Drosophila, FlyBase
    C. elegans, WormBase
  Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
    288 pathways / 2282 publications
  PlantCyc, Carnegie Institution of Washington
  Six Solanaceae species, Cornell University
  GrameneDB, Cold Spring Harbor Laboratory
  Medicago truncatula, Samuel Roberts Noble Foundation

Pathway Tools Software: PGDBs Created Outside SRI
  NIAID BRCs for Biodefense pathogens:
    BioHealthBase -- Mycobacterium tuberculosis, Francisella tularensis
    Pathema -- 80+ PGDBs
    PATRIC – Brucella suis, Coxiella burnetii, Rickettsia typhi
    EuPathDB – Cryptosporidium, Plasmodium
  G. Xie, Los Alamos Lab, Dental pathogens
  F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
  V. Schachter, Genoscope, Acinetobacter
  M. Bibb, John Innes Centre, Streptomyces coelicolor
  G. Church, Harvard, Prochlorococcus marinus, multiple strains
  E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
  R.J.S.
Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
  Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
  Sergio Encarnacion, UNAM, Sinorhizobium meliloti
  Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
  Michael Gottfert, Technische Universität Dresden, Bradyrhizobium japonicum
  Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472

Pathway Tools Software: PGDBs Created Outside SRI
  Large scale users:
    C. Medigue, Genoscope, 200+ PGDBs
    G. Sutton, J. Craig Venter Institute, 80+ PGDBs
    G. Burger, U Montreal, 60+ PGDBs
  Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
  Partial listing of outside PGDBs at BioCyc.org

Obtaining a PGDB for Organism of Interest
  Find existing curated PGDB
  Find existing PGDB in BioCyc
  Create your own

EcoCyc Project – EcoCyc.org
  E. coli Encyclopedia
    Review-level Model-Organism Database for E. coli
    Tracks evolving annotation of the E. coli genome and cellular networks
    The two paradigms of EcoCyc
  “Multi-dimensional annotation of the E. coli K-12 genome”
    Positions of genes; functions of gene products – 76% / 66% exp
    Gene Ontology terms; MultiFun terms
    Gene product summaries and literature citations
    Evidence codes
    Multimeric complexes
    Metabolic pathways
    Cellular regulation
  Karp, Gunsalus, Collado-Vides, Paulsen Nuc. Acids Res.
35:7577 2007
  ASM News 70:25 2004
  Science 293:2040

EcoCyc = E. coli Dataset + Pathway/Genome Navigator
  URL: EcoCyc.org
  EcoCyc v13.6
  Pathways: 246
  Reactions:
    Metabolic: 1,394
    Transport: 246
  Compounds: 1,830
  Citations: 19,000
  Proteins: 4,479
  Complexes: 895
  RNAs: 285
  Genes: 4,492
  Gene Regulation:
    Operons: 3,369
    Trans Factors: 196
    Promoters: 1,796
    TF Binding Sites: 2,205

Paradigm 1: EcoCyc as Textual Review Article
  All gene products for which experimental literature exists are curated with a minireview summary
    Found on protein and RNA pages, not gene pages!
    3257 gene products contain summaries
  Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
  Additional summaries found in pages for operons, pathways
  EcoCyc cites 17,300 publications

Paradigm 2: EcoCyc as Computational Symbolic Theory
  Highly structured, high-fidelity knowledge representation provides computable information
  Each molecular species defined as a DB object
    Genes, proteins, small molecules
  Each molecular interaction defined as a DB object
    Metabolic reactions
    Transport reactions
    Transcriptional regulation of gene expression
  220 database fields capture extensive properties and relationships

EcoCyc Procedures
  DB updates performed by 5 staff curators
  Information gathered from biomedical literature
    Enter data into structured database fields
    Author extensive summaries
    Update evidence codes
  Corrections submitted by E. coli researchers
  Four releases per year
  Quality assurance of data and software
    Evaluate database consistency constraints
    Perform element balancing of reactions
    Run other checking programs

EcoCyc Accelerates Science
  Experimentalists
    E.
coli experimentalists
    Experimentalists working with other microbes
    Analysis of expression data
  Computational biologists
    Biological research using computational methods
    Genome annotation
    Study connectivity of E. coli metabolic network
    Study phylogenetic extent of metabolic pathways and enzymes in all domains of life
  Bioinformaticists
    Training and validation of new bioinformatics algorithms – predict operons, promoters, protein functional linkages, protein-protein interactions
  Metabolic engineers
    “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”
  Educators

MetaCyc: Metabolic Encyclopedia
  Describe a representative sample of every experimentally determined metabolic pathway
  Describe properties of metabolic enzymes
  Literature-based DB with extensive references and commentary
  Pathways, reactions, enzymes, substrates
  Jointly developed by
    P. Karp, R. Caspi, C. Fulcher, SRI International
    L. Mueller, A. Pujar, Boyce Thompson Institute
    S. Rhee, P.
Zhang, Carnegie Institution
  Nucleic Acids Research 2008

Applications of MetaCyc
  Reference source on metabolic pathways
  Metabolic engineering
    Find enzymes with desired activities, regulatory properties
    Determine cofactor requirements
  Predict pathways from genomes
  Systematic studies of metabolism
  Computer-aided education

MetaCyc Data -- Version 13.6
  Pathways: 1,436
  Reactions: 8,200
  Enzymes: 6,060
  Small Molecules: 8,400
  Organisms: 1,800
  Citations: 21,700

Taxonomic Distribution of MetaCyc Pathways – version 13.1
  Bacteria: 883
  Green Plants: 607
  Fungi: 199
  Mammals: 159
  Archaea: 112

Enzyme Data Available in MetaCyc
  Reaction(s) catalyzed
  Alternative substrates
  Activators, inhibitors, cofactors, prosthetic groups
  Subunit structure
  Genes
  Features on protein sequence
  Cellular location
  pI, molecular weight, Km, Vmax
  Gene Ontology terms
  Links to other bioinformatics databases

What is a Pathway?
  A connected sequence of biochemical reactions
  Occurs in one organism
  Conserved through evolution
  Regulated as a unit
  Often starts or stops at one of 13 common intermediate metabolites

MetaCyc Pathway Variants
  Pathways that accomplish similar biochemical functions using different biochemical routes
    Alanine biosynthesis I – E. coli
    Alanine biosynthesis II – H.
sapiens
  Pathways that accomplish similar biochemical functions using similar sets of reactions
    Several variants of TCA Cycle

MetaCyc Super-Pathways
  Groups of pathways linked by common substrates
  Example: Super-pathway containing
    Chorismate biosynthesis
    Tryptophan biosynthesis
    Phenylalanine biosynthesis
    Tyrosine biosynthesis
  Super-pathways defined by listing their component pathways
  Multiple levels of super-pathways can be defined
  Pathway layout algorithms accommodate super-pathways

Comparison with KEGG
  KEGG vs MetaCyc: Reference pathway collections
    KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
      KEGG maps contain multiple biological pathways
      Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than two genes chosen from a KEGG pathway
    KEGG maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms
    KEGG has no literature citations, no comments, less enzyme detail
    KEGG assigns half as many reactions to pathways as MetaCyc
  KEGG vs organism-specific PGDBs
    KEGG does not curate or customize pathway networks for each organism
    Highly curated PGDBs now exist for important organisms such as E.
coli, yeast, mouse, Arabidopsis

Comparison of Pathway Tools to KEGG
  Inference tools
    KEGG does not predict presence or absence of pathways
    KEGG lacks pathway hole filler, operon predictor
  Curation tools
    KEGG does not distribute curation tools
    No ability to customize pathways to the organism
    Pathway Tools schema much more comprehensive
  Visualization and analysis
    KEGG does not perform automatic pathway layout
    KEGG metabolic-map diagram extremely limited
    No comparative pathway analysis

Pathway Tools Implementation Details
  Platforms: Macintosh, PC/Linux, and PC/Windows
  Same binary can run as desktop app or Web server
  Production-quality software
    Version control
    Two regular releases per year
    Extensive quality assurance
    Extensive documentation
    Auto-patch
    Automatic DB-upgrade
  480,000 lines of Lisp code

ptools-support@ai.sri.com

Pathway Tools Architecture
  [Diagram: Web Mode and Desktop Mode; Pathway Genome Navigator; Protein Editor, Pathway Editor, Reaction Editor; GFP API with Lisp, Perl, and Java; Ocelot DBMS; storage in a disk file or in Oracle or MySQL]

Ocelot Knowledge Server Architecture
  Frame data model
    Minimizes size of schema relative to semantic complexity
  Schema is stored within the DB
  Schema is self documenting
  Slot units define metadata about slots
    Domain, range, inverse
    Collection type, number of values, value constraints
    Comment
  Schema evolution facilitated by
    Easy addition/removal of slots, or alteration of slot datatypes
    Flexible data formats that do not require dumping/reloading of data

Ocelot Storage System Architecture
  Persistent storage via disk files or Oracle or MySQL
    Concurrent development: Oracle or MySQL
    Single-user development: disk files
  Oracle/MySQL DBMS storage
    DBMS is submerged within Ocelot, invisible to users
    Frames transferred from DBMS to Ocelot
      On demand
      By background
prefetcher
    Memory cache
    Persistent disk cache to speed performance via Internet
  Transaction logging facility

Why Do We Code in Common Lisp?
  Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)
    The average Lisp program ran 33 times faster than the average Java program
    The average Lisp program was written 5 times faster than the average Java program
  Roberts compared Java and Lisp implementations of a Domain Name Server (DNS) resolver
    http://www.findinglisp.com/papers/case_study_java_lisp_dns.html
    The Lisp version had half as many lines of code

Common Lisp Programming Environment
  Interpreted and/or compiled execution
  Fabulous debugging environment
  High-level language
  Interactive data exploration
  Extensive built-in libraries
  Dynamic redefinition
  Find out more! See ALU.org or http://www.international-lisp-conference.org/

PathoLogic Processing
  1. Translate source genome to PGDB form
  2. Predict operons
  3. Predict metabolic pathways
  4. Predict pathway hole fillers
  5. Transport inference parser
  6. Build metabolic overview diagram

PathoLogic Step 1: Translate Genome to PGDB
  [Diagram: PathoLogic Software integrates genome and pathway data to identify putative metabolic networks. Inputs: Annotated Genomic Sequence (Genes/ORFs, Gene Products, DNA Sequences) and the Multi-organism Pathway Database, MetaCyc (Pathways, Reactions, Compounds). Output: Pathway/Genome Database (Pathways, Reactions, Compounds, Gene Products, Genes, Genomic Map)]

PathoLogic Step 2: Predict Operons
  Predict adjacent genes A and B in same operon based on:
    Intergenic distance
    Functional relatedness of A and B
  Tests for functional relatedness:
    A and B in same gene functional class (MultiFun)
    A and B in same metabolic pathway
    A codes for enzyme in a pathway and B codes for transporter involving a substrate in that pathway
    A and B are monomers in same protein complex
  Correctly predicts 80% of E.
coli transcription units
  Marks predicted operons with computational evidence codes
  Bioinformatics 20:709-17 2004

PathoLogic Step 3: Prediction of Metabolic Pathways
  Infer reaction complement of organism
    Match enzymes in source genome to MetaCyc reactions they catalyze
      Match enzyme names and EC numbers to MetaCyc
      Support user in manually matching additional enzymes
  Computationally predict which MetaCyc metabolic pathways are present
    For each MetaCyc pathway, evaluate which of its reactions are catalyzed by the organism

Match Enzymes to Reactions
  [Flowchart: a gene product (example: UDP-glucose-4-epimerase) is matched against MetaCyc (example: reaction 5.1.3.2, between UDP-D-glucose and UDP-galactose). Decision nodes: Match? (yes: Assign); Probable enzyme (-ase)? (no: Not a metabolic enzyme; yes: Manually search, leading to Assign or Can’t Assign). Counts shown: 2057 proteins matched by EC#, 314 matched by name, 1320 probable enzymes]

  [Flowchart: Import Pathways. Labels: MetaCyc pathways containing reactions (625), Import All, Prune? (yes/no: keep), Manual Review, Delete? (yes/no: delete)]

Pathway Prediction
  Prediction is hard because
    Enzyme naming is irregular
    Some reactions present in multiple pathways
    Pathway variants share many reactions in common
    MetaCyc now has many pathways

Pathway Scoring Criteria
  Imported pathways must satisfy:
    Pathways outside their taxonomic range must have enzymes for all reactions
    If any reactions in a pathway are designated as “key,” an enzyme must be present for at least one
  Pathway P is imported if any of these conditions is satisfied:
    One unique enzyme present for P
    P missing at most one reaction
    More reactions present than absent for P
  P is not a superset of another pathway with the same number of enzymes present

Pathway Evidence Report

PathoLogic Step 4: Pathway Hole Filler
  Definition: Pathway Holes are reactions in metabolic pathways for which no enzyme is identified
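
The definition of a pathway hole can be made concrete with a few lines of Python. This is a minimal sketch, not PathoLogic's implementation: the `Pathway` class, the `pathway_holes` function, and the reaction and gene identifiers (`R1`, `geneA`, and so on) are hypothetical names chosen for illustration. It assumes the only inputs are a pathway's reaction list and a mapping from reactions to the enzymes annotated for them.

```python
from dataclasses import dataclass


@dataclass
class Pathway:
    name: str
    reactions: list  # reaction identifiers, in pathway order


def pathway_holes(pathway, enzymes_by_reaction):
    """Return the reactions of `pathway` that have no identified enzyme."""
    return [r for r in pathway.reactions if not enzymes_by_reaction.get(r)]


# Hypothetical toy data: five reactions, three of them with annotated enzymes.
toy = Pathway("toy biosynthesis", ["R1", "R2", "R3", "R4", "R5"])
annotated = {"R2": ["geneA"], "R3": ["geneB"], "R5": ["geneC"]}

holes = pathway_holes(toy, annotated)
print(holes)  # ['R1', 'R4']
```

Each hole found this way would be a candidate for the hole filler's sequence search.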
  [Figure: example pathway from L-aspartate to NAD. Compounds shown: L-aspartate, iminoaspartate, quinolinate, nicotinate nucleotide, deamido-NAD, NAD. Steps shown: 1.4.3.-, quinolinate synthetase (nadA), n.n. pyrophosphorylase (nadC), 2.7.7.18, 6.3.5.1, and NAD+ synthetase, NH3 dependent (CC3619). Some steps are marked as holes]

  Step 1: Query UniProt for all sequences having EC# of pathway hole
  Step 2: BLAST against target genome
  Step 3 & 4: Consolidate hits and evaluate evidence
  [Figure: enzyme A sequences from organisms 1 through 8 are searched against target genome genes X, Y, and Z; 7 queries have high-scoring hits to sequence Y]

Pathway Hole Filler
  Why should hole filler find things beyond the original genome annotation?
    Reverse BLAST searches more sensitive
    Reverse BLAST searches find second domains
    Integration of multiple evidence types

Caulobacter crescentus Pathway Holes
  130 pathways containing 582 reactions
  92 pathways contain 236 pathway holes
  Caulobacter holes filled: 77 holes filled at P >0.9
  Previous functions of candidate hole fillers:
    No predicted function
    Correctly assigned single function
    Incorrectly assigned function
    Imprecise functional assignment
  BMC Bioinformatics 5:76 2004

Example Pathway
  [Figure: the L-aspartate to NAD pathway with nadA (CC2912), nadC (CC2915), and candidate hole fillers CC2913, P=0.99; CC3431*, P=0.90; CC3619, P=0.99]
  CC2913 L-aspartate oxidase (wrong EC# on rxn)
  CC3431 ORF
  CC3619 put.
NAD(+)-synthetase (multidomain)

PathoLogic Step 5: Transport Inference Parser
  Problem: Write a program to query a genome annotation to compute the substrates an organism can transport
  Typical genome annotations for transporters:
    ATP transporter for ribose
    ribose ABC transporter
    D-ribose ATP transporter
    ABC transporter, membrane spanning protein [ribose]
    ABC transporter, membrane spanning protein [D-ribose]

Transport Inference Parser
  Input: “ATP transporter of phosphonate”
  Output: Structured description of transport activity
  Locates most transporters in genome annotation using keyword analysis
  Parse product name using a series of rules to identify:
    Transported substrate, co-substrate
    Influx/efflux
    Energy coupling mechanism
  Creates transport reaction object:
    phosphonate[periplasm] + H2O + ATP = phosphonate + Pi + ADP

Transport Inference Parser
  Permits symbolic computation with transport activities:
    Compute transportable substrates of the cell
    Compute connectivity among compartments for substrates
    Facilitate reasoning about transport/metabolism connections
    Draw transport cartoon in protein pages, cellular overview

Transport Inference Parser
  User reviews all assignments using interactive tool that allows assignments to be revised
  User also reviews transporters for which no assignment was made

Regulation

Encoding Cellular Regulation in Pathway Tools -- Goals
  Facilitate curation of wide range of regulatory information within a formal ontology
  Compute with regulatory mechanisms and pathways
    Summary statistics, complex queries
    Pattern discovery
    Visualization of network components
  Provide training sets for inference of regulatory networks
  Interpret gene-expression datasets in the context of known regulatory mechanisms

Regulatory Interactions Supported by Pathway Tools
  Substrate-level regulation of enzyme activity
  Binding to proteins or small molecules (phosphorylation)
  Regulation of transcription initiation
  Attenuation of transcription
  Regulation of translation by proteins and by small RNAs

Regulation in Pathway Tools
  Editing tools
  Transcription factor display window
  Transcription unit display window
  Regulatory Overview / Omics Viewer

Regulatory Interaction Editor

Regulatory Overview and Omics Viewer
  Show regulatory relationships among gene groups

Comparative Analysis
  Via Cellular Overview
  Comparative genome browser
  Comparative pathway table
  Comparative analysis reports
    Compare reaction complements
    Compare pathway complements
    Compare transporter complements

Information Sources
  Pathway Tools User’s Guide
    aic-export/pathway-tools/ptools/13.0/doc/manuals/userguide.pdf
    NOTE: Location of the aic-export directory can vary across different computers
  Pathway Tools Web Site
    http://bioinformatics.ai.sri.com/ptools/
    Publications, FAQ, programming examples, etc.
  Slides from this tutorial
    http://www.ai.sri.com/pkarp/talks/
  BioCyc Webinars
    http://biocyc.org/webinar.shtml

BioCyc and Pathway Tools Availability
  BioCyc.org Web site and database files freely available to all
  Pathway Tools freely available to non-profits
    Macintosh, PC/Windows, PC/Linux

Symbolic Systems Biology
  Definition: Global analyses of biological systems using symbolic computing

Symbolic Systems Biology
  “Symbolic computing is concerned with the representation and manipulation of information in symbolic form. It is often contrasted with numeric representation.” -- R.
Cameron
  Examples of symbolic computation:
    Symbolic algebra programs, e.g., Mathematica, Graphing Calculator
    Compilers and interpreters for programming languages
    Database query languages
    Text analysis programs, e.g., Google
    String matching for DNA and protein sequences
    Artificial Intelligence methods, e.g., expert systems, symbolic logic, machine learning, natural language understanding

Symbolic Systems Biology
  Concerned with different questions than quantitative systems biology
  Symbolic analyses can in many cases produce answers when quantitative approaches fail because of lack of parameters or intractable mathematics
  Symbolic computation is intimately dependent on the use of structured ontologies

Pathway Tools Ontology
  1064 classes
    Main classes such as:
      Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
    Taxonomies for Pathways, Reactions, Compounds
  205 slots
    Meta-data: Creator, Creation-Date
    Comment, Citations, Common-Name, Synonyms
    Attributes: Molecular-Weight, DNA-Footprint-Size
    Relationships: Catalyzes, Component-Of, Product
  Classes, instances, slots all stored side by side in DBMS

Critiquing the Parts List
  Slide thanks to Hirotada Mori (minus the banana!)
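
The ontology slide above describes classes, instances, and slots stored side by side. The sketch below illustrates that idea in Python under stated assumptions; it is not the Ocelot API. The `Frame` class and the frame identifiers `hypothetical-gene-1` and `hypothetical-protein-1` are invented for illustration, while the class names (`DNA-Segments`, `Genes`, `Proteins`) and slot names (`Common-Name`, `Molecular-Weight`, `Product`) are taken from the slide.

```python
class Frame:
    """A class or instance frame: a name, parent frames, and named slots."""

    def __init__(self, name, parents=(), slots=None):
        self.name = name
        self.parents = list(parents)
        self.slots = dict(slots or {})

    def is_a(self, ancestor):
        """True if `ancestor` is this frame or reachable through its parents."""
        if self is ancestor:
            return True
        return any(parent.is_a(ancestor) for parent in self.parents)

    def get(self, slot, default=None):
        """Return the value stored in `slot`, or `default` if it is unset."""
        return self.slots.get(slot, default)


# Class frames named on the slide; Genes is listed under DNA-Segments.
dna_segments = Frame("DNA-Segments")
genes = Frame("Genes", parents=[dna_segments])
proteins = Frame("Proteins")

# Hypothetical instance frames linked through the Product relationship slot.
enzyme = Frame("hypothetical-protein-1", parents=[proteins],
               slots={"Common-Name": "example enzyme", "Molecular-Weight": 38.2})
gene = Frame("hypothetical-gene-1", parents=[genes],
             slots={"Common-Name": "exA", "Product": enzyme})

print(gene.is_a(dna_segments))                  # True
print(gene.get("Product").get("Common-Name"))   # example enzyme
```

Because relationships such as Product are ordinary slot values that point at other frames, a query can walk from a gene to its product with the same lookup used for plain attributes.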

Dead End Metabolites
  A small molecule C is a dead-end if:
    C is produced only by SMM (small-molecule metabolism) reactions in Compartment, and no transporter acts on C in Compartment
    OR
    C is consumed only by SMM reactions in Compartment, and no transporter acts on C in Compartment

Dead End Metabolites
  Not yet an official part of Pathway Tools
  Contact us if you’d like to use it

Reachability Analysis of Metabolic Networks
  Given:
    A PGDB for an organism
    A set of initial metabolites
  Infer:
    What set of products can be synthesized by the small-molecule metabolism of the organism
  Motivations:
    Quality control for PGDBs
      Verify that a known growth medium yields known essential compounds
    Experiment with other growth media
    Experiment with reaction knock-outs
  Limitations
    Cannot properly handle compounds required for their own synthesis
    Nutrients needed for reachability may be a superset of those required for growth
  Romero and Karp, Pacific Symposium on Biocomputing, 2001

Algorithm: Forward Propagation Through Production System
  Each reaction becomes a production rule
  Each of the 21 metabolites in the nutrient set becomes an axiom
  [Diagram: the Nutrient set seeds the Metabolite pool; reactions from the PGDB reaction set “fire” (A + B → C) when their Reactants are in the pool, and their Products are added to the pool]

  Nutrients: A, B, C, E, F
    A + B → W
    C + D → X
    E + F → Y
    W + Y → Z
  Produced Compounds: W, Y, Z

Initial Metabolite Nutrient Set (Total: 21 compounds)
  Nutrients (8) (M61 Minimal growth medium): H+, Fe2+, Mg2+, K+, NH3, SO42-, PO42-, Glucose
  Nutrients (10) (Environment): Water, Oxygen, Trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)
  Bootstrap Compounds (3): ATP, NADP, CoA

Essential Compounds E.
coli
  Total: 41 compounds
  Proteins (20): Amino acids
  Nucleic acids (DNA & RNA) (8): Nucleosides
  Cell membrane (3): Phospholipids
  Cell wall (10): Peptidoglycan precursors; Outer cell wall precursors (Lipid-A, oligosaccharides)

http://brg.ai.sri.com/ptools09/slides/Tuesday/growth-experiment-Markus-Krummenacker.txt

Flux Balance Modeling
  Generate, store, and update metabolic model within Pathway Tools
    Fast, accurate generation of metabolic model
    Close coupling to genome and regulatory information
    Extensive schema
    Extensive query and visualization tools
  Debug/validate model using Pathway Tools
  Export to SBML and import to constraint solver for model execution
  Visualize reaction flux and omics data using overviews
  Copy/update multiple PGDBs to reflect alternative strains
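
The forward-propagation algorithm from the reachability analysis slides can be sketched in a few lines of Python. This is an illustrative sketch rather than the Pathway Tools code: the function name `reachable_products` and the representation of a reaction as a (reactants, products) pair are assumptions. It reproduces the small worked example from the slides, with nutrients A, B, C, E, F and produced compounds W, Y, Z.

```python
def reachable_products(nutrients, reactions):
    """Fire reactions until no new compound can be added to the pool.

    `reactions` is a list of (reactants, products) pairs. A reaction fires
    when every reactant is already in the metabolite pool.
    """
    pool = set(nutrients)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= pool and not set(products) <= pool:
                pool.update(products)
                changed = True
    # Report only what was synthesized, not the starting nutrients.
    return pool - set(nutrients)


# Worked example from the slides.
reactions = [
    (("A", "B"), ("W",)),
    (("C", "D"), ("X",)),  # never fires: D is not available
    (("E", "F"), ("Y",)),
    (("W", "Y"), ("Z",)),  # fires only after W and Y have been produced
]
produced = reachable_products({"A", "B", "C", "E", "F"}, reactions)
print(sorted(produced))  # ['W', 'Y', 'Z']
```

Dropping a reaction from the list and rerunning is one simple way to mimic the knock-out experiments mentioned among the motivations.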