Metabolomics Applications of the BioCyc
Databases and Pathway Tools Software
Peter D. Karp
SRI International
ecocyc.org
biocyc.org
metacyc.org
© 2014 SRI International
Overview
• Overview of MetaCyc family of Pathway/Genome Databases (PGDBs)
– BioCyc collection: EcoCyc, MetaCyc, HumanCyc, etc
– Curated PGDBs for Arabidopsis, Yeast, Mouse, Fly, etc
• Overview of Pathway Tools software
• Automatic generation of metabolic-flux models
MetaCyc Family of
Pathway/Genome Databases
• 6,000+ databases from many institutions
• All domains of life with microbial emphasis
• Genomes plus predicted metabolic pathways
• DBs derived from MetaCyc via computational pathway prediction
[Figure: the MetaCyc family (6,000+ PGDBs), of which BioCyc.org holds 3,500]
• Common schema
• Common controlled vocabularies
• Managed using Pathway Tools software
Archives of Toxicology 85:1015 2011
Curated Databases Within the MetaCyc Family
Database   Organism        Organization                 Publications Curated From
MetaCyc    Multiorganism   SRI                          40,000
EcoCyc     E. coli         SRI                          25,000
HumanCyc   H. sapiens      SRI
AraCyc     A. thaliana     TAIR/Carnegie Institution    2,282
YeastCyc   S. cerevisiae   SGD/Stanford/SRI             565
MouseCyc   M. musculus     MGD/Jackson Laboratory
http://biocyc.org/otherpgdbs.shtml
Pathway Tools Software
• Comprehensive systems biology software environment
• Create and maintain an organism database integrating genome, pathway,
regulatory information
– Computational inference tools
– Interactive editing tools
• Query and visualize that database
• Generate steady-state metabolic flux models
– Flux-balance analysis
• Interpret omics datasets
• Comparative analysis tools
• Licensed by 5,000+ groups
Motivations: Management of
Metabolic Pathway Data
• Organize growing corpus of data on metabolic pathways
– Experimentally elucidated pathways in the biomedical literature
– Computationally predicted pathways derived from genome data
• Provide software tools for querying and comprehending this complex
information space
• Multiorganism view: MetaCyc
– Unique, experimentally elucidated pathways across all organisms
– Reference database for computational pathway prediction
• Organism-specific view:
– Organism-specific Pathway/Genome Databases
– Detailed qualitative models of metabolic networks
– Combine computational predictions with experimentally determined
pathways
Model Organism Databases /
Organism Specific Databases
• DBs that describe the genome and other information about an organism
• Every sequenced organism with an active experimental community requires
a MOD
– Integrate genome data with information about the biochemical and
genetic network of the organism
– Integrate literature-based information with computational predictions
• Accurate metabolic modeling requires a curation effort
Rationale for MODs
• Each “complete” genome is incomplete in several respects:
– 40%-60% of genes have no assigned function
– Roughly 7% of those assigned functions are incorrect
– Many assigned functions are non-specific
• Need continuous updating of annotations with respect to new
experimental data and computational predictions
– Gene positions, sequence, gene functions, regulatory sites, pathways
• MODs are platforms for global analyses of an organism
– Interpret omics data in a pathway context
– In silico prediction of essential genes
– Characterize systems properties of metabolic and genetic networks
Pathway/Genome Database
[Figure: contents of a Pathway/Genome Database for a cell: pathways, reactions, compounds; proteins, RNAs; genes, chromosomes, plasmids, sequence features; regulation (operons, promoters, DNA binding sites, regulatory interactions)]
BioCyc Collection of 3,000
Pathway/Genome Databases
• Pathway/Genome Database (PGDB) combines information about
– Pathways, reactions, substrates
– Enzymes, transporters
– Genes, replicons
– Transcription factors/sites, promoters, operons
• Tier 1: Literature-derived PGDBs
– MetaCyc, HumanCyc, YeastCyc
– EcoCyc: Escherichia coli K-12
– AraCyc: Arabidopsis thaliana
• Tier 2: Computationally derived DBs, some curation: 34 PGDBs
– Bacillus subtilis, Mycobacterium tuberculosis
• Tier 3: Computationally derived DBs, no curation: ~3,000 PGDBs
Obtaining a PGDB for Organism of Interest
• Find existing PGDB in BioCyc
• Find existing PGDB from larger MetaCyc family of PGDBs
– http://biocyc.org/otherpgdbs.shtml
• Download from PGDB registry
– http://biocyc.org/registry.html
• Create your own PGDB
Pathway Tools Software:
PGDBs Created Outside SRI
•4,000+ licensees: 250 groups applying software to 1,700 organisms
•Saccharomyces cerevisiae, SGD project, Stanford University
– 135 pathways / 565 publications – BioCyc.org
•FungiCyc, Broad Institute
•Candida albicans, CGD project, Stanford University
•dictyBase, Northwestern University
•Mouse, MGD, Jackson Laboratory -- BioCyc.org
•Drosophila, FlyBase, Harvard University -- BioCyc.org
•Under development:
– C. elegans, WormBase
•Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
– 288 pathways / 2282 publications – BioCyc.org
•PlantCyc: Poplar, Cassava, Corn, Grape, Soy, Carnegie Institution
•Six Solanaceae species, Cornell University
•GrameneDB: Rice, Sorghum, Maize, Cold Spring Harbor Laboratory
•Medicago truncatula, Samuel Roberts Noble Foundation
•ChlamyCyc, GoFORSYS
Pathway Tools Software:
PGDBs Created Outside SRI
•M. Bibb, John Innes Centre, Streptomyces coelicolor
•F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
•Genoscope, Acinetobacter
•R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403,
Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus
subtilis 168, Bacillus cereus ATCC14579
•Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania
major
•Sergio Encarnacion, UNAM, Sinorhizobium meliloti
•Mark van der Giezen, University of London, Entamoeba histolytica,
Giardia intestinalis
Pathway Tools Software:
PGDBs Created Outside SRI
• Large scale users:
– C. Medigue, Genoscope, 500+ PGDBs
– J. Zucker, Broad Inst, 94 PGDBs
– G. Sutton, J. Craig Venter Institute, 80+ PGDBs
– G. Burger, U Montreal, 60+ PGDBs
– E. Uberbacher, ORNL, 33 bioenergy-related organisms
– Bart Weimer, UC Davis, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
• Partial listing of outside PGDBs at http://biocyc.org/otherpgdbs.shtml
EcoCyc Project – EcoCyc.org
• E. coli Encyclopedia
– Review-level Model-Organism Database for E. coli
– Derived from 25,000 publications
• “Multi-dimensional annotation of the E. coli K-12 genome”
– Gene product summaries and literature citations
– Evidence codes
– Gene Ontology terms
– Protein features (active sites, metal ion binding sites)
– Multimeric complexes
– Metabolic pathways
– Regulation of gene expression and of protein activity
– Gene essentiality data
– Growth under alternative nutrient conditions
Karp, Gunsalus, Collado-Vides, Paulsen
Nuc. Acids Res. 41:D605 2013
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
EcoCyc v17.0
• Pathways: 312
• Reactions:
– Metabolic: 1,600
– Transport: 370
• Compounds: 2,400
• Citations: 24,000
• Monomers: 4,389
• Complexes: 976
• RNAs: 301
• Genes: 4,499
• Regulation:
– Operons: 4,500
– Trans Factors: 222
– Promoters: 3,770
– TF Binding Sites: 2,700
– Reg Interactions: 5,900
URL: EcoCyc.org
Perspective 1:
EcoCyc as Online Encyclopedia
• All gene products for which experimental literature exists are curated with
a minireview summary
– 3,730 gene products contain summaries
– Summaries cover function, interactions, mutant phenotypes, crystal structures,
regulation, and more
• Additional summaries and other data found in pages for genes, operons,
pathways
• Quick Search
Perspective 2: EcoCyc as
Queryable Database
• High-fidelity knowledge representation amenable to structured queries
• 333 database fields capture object properties and relationships
• Each molecular species defined as a DB object
– Genes, proteins, small molecules
• Each molecular interaction defined as a DB object
– Metabolic and transport reactions, regulation
• Extensive search tools
– Object-specific search
– Advanced search
Search Menu
Search -> Advanced
Perspective 3:
EcoCyc as Predictive Metabolic Model
• A steady-state quantitative model of E. coli metabolism can be generated
from EcoCyc
• Predicts phenotypes of E. coli knock-outs, and growth/no-growth of E. coli
on different nutrients
• Model is updated on each EcoCyc release
• Serves as a quality check on the EcoCyc data
EcoCyc Accelerates Science
• Experimentalists
– E. coli experimentalists
– Experimentalists working with other microbes
– Analysis of expression data
• Computational biologists
– Biological research using computational methods
– Genome annotation
– Study properties of E. coli metabolic and regulatory networks
• Bioinformaticists
– Training and validation of new bioinformatics algorithms: predict operons, promoters, protein functional linkages, protein-protein interactions
• Metabolic engineers
– “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”
• Educators
– Microbiology and metabolism education
Recent Developments in EcoCyc
• EcoCyc contains six knock-out datasets for E. coli containing 13,000 growth observations
Reference   Medium                        No Growth   Growth   Indeterminate
Gerdes03    LB enriched, aerobic          614         3082     2
Baba06      LB Lennox, aerobic            300         3906     1
Baba06      MOPS+0.4% glucose, aerobic    460         4823     92
Feist07     MOPS+0.4% glucose, aerobic    460         4823     92
Patrick07   M9+0.4% glucose, aerobic      107
Joyce06     M9+1% glucose, aerobic        118         3763     1
Recent Developments in EcoCyc –
Growth-Observation Data
• EcoCyc contains 1831 growth observations under 522 conditions for E. coli
• Substantial number of discrepancies
– 45 cases remain where growth status is unclear
Source                                 Aerobic   Anaerobic
Low throughput data from literature    23        0
Low throughput data from our group     20        0
PM data from literature                1244      191
PM data from our group                 353       0
MetaCyc: Metabolic Encyclopedia
• Describes experimentally determined metabolic pathways, reactions,
enzymes, and compounds
• Literature-based DB with extensive references and commentary
• MetaCyc vs BioCyc: MetaCyc contains only experimentally elucidated pathways; BioCyc PGDBs also contain computationally predicted pathways
• Jointly developed by
– P. Karp, R. Caspi, C. Fulcher, SRI International
– L. Mueller, A. Pujar, Boyce Thompson Institute
– S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2012 Database Issue
MetaCyc Data -- Version 18.0
Pathways          2,200
Reactions         11,700
Enzymes           9,700
Small Molecules   11,000
Organisms         2,500
Citations         40,600
“A Systematic Comparison of the MetaCyc and KEGG Pathway Databases”
BMC Bioinformatics 2013 14(1):112
Taxonomic Distribution of MetaCyc Pathways
Version 17.5
Bacteria       1,130
Green Plants   830
Fungi          300
Metazoa        275
Archaea        148
Comparison with KEGG
• KEGG vs MetaCyc: Reference pathway collections
– KEGG maps are not pathways
Nuc Acids Res 34:3687 2006
• KEGG maps contain multiple biological pathways
• KEGG maps are composites of pathways in many organisms; they do not identify which specific pathways were elucidated in which organisms
• Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than two genes chosen from a KEGG pathway
– KEGG has few literature citations, few comments, less enzyme detail
• KEGG vs organism-specific PGDBs
– KEGG does not curate or customize pathway networks for each
organism
– Highly curated PGDBs now exist for important organisms such as E.
coli, yeast, mouse, Arabidopsis
Pathway Tools
SRI International Bioinformatics
Pathway Tools Software
[Figure: Pathway Tools components. PathoLogic combines an annotated genome with MetaCyc to produce a Pathway/Genome Database, which is used by the Pathway/Genome Navigator, the Pathway/Genome Editors, and MetaFlux]
Briefings in Bioinformatics 11:40-79 2010
Pathway Tools Enables Multi-Use Metabolic Databases
• Metabolic Model
• Encyclopedia
• Queryable Database
• Zoomable Metabolic Map
• Omics Data Analysis
Pathway Tools Software: PathoLogic
• Computational creation of new Pathway/Genome Databases
• Transforms genome into Pathway Tools schema and layers inferred
information above the genome
• Predicts operons
• Predicts metabolic network
• Predicts which genes code for missing enzymes in metabolic pathways
• Infers transport reactions from transporter names
Pathway Tools Software:
Pathway/Genome Editors
• Interactively update PGDBs with graphical editors
• Support geographically distributed teams of curators with object database system
• Gene and protein editor
• Reaction editor
• Compound editor
• Pathway editor
• Operon editor
• Publication editor
Pathway Tools Software:
Pathway/Genome Navigator
• Querying and visualization of:
– Pathways
– Reactions
– Metabolites
– Genes/Proteins/RNA
– Regulatory interactions
– Chromosomes
• Modes of operation:
– Web mode
– Desktop mode
– Most functionality shared
Pathway Tools Software: MetaFlux
• Speeds development of genome-scale metabolic flux models
• Steady-state quantitative flux models generated directly from PGDBs
• Computed reaction fluxes can be painted onto metabolic overview diagram
• Multiple gap filler accelerates model development by suggesting model completions:
– Reactions to add from MetaCyc
– Additional nutrients and secreted compounds
Pathway Tools Schema / Ontology
• 1064 classes
– Datatype classes such as:
• Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons,
DNA-Segments (Genes, Operons, Promoters)
– Taxonomies for Pathways, Reactions, Compounds
– Cell Component Ontology
– Evidence Ontology
• 308 attributes and relationships
• Span genome, metabolism, regulatory information
– Meta-data: Creator, Creation-Date
– Comment, Citations, Common-Name, Synonyms
– Attributes: Molecular-Weight, DNA-Footprint-Size
– Relationships: Catalyzes, Component-Of, Product
Pathway Prediction
• Pathway prediction is useful because
– Pathways organize the metabolic network into mentally tractable units
– Pathways guide us to search for missing enzymes
– Pathway inference fills in holes in the metabolic network
– Pathways can be used for analysis of high-throughput data
• Visualization, enrichment analysis
• Pathway prediction is hard because
– Reactome inference is imperfect
– Some reactions present in multiple pathways
– Pathway variants share many reactions in common
– Increasing size of MetaCyc
Reactome Inference
• For each protein in the organism, infer reaction(s) it catalyzes
• Protein functions can be specified in three ways:
– Enzyme names (protein functions) (uncontrolled vocabulary)
– EC numbers
– Gene Ontology terms
• Detect conflicts among this information
– Example:
• Yersinia pseudotuberculosis PB1
• 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase / EC 4.1.1.71
Enzyme Name Matching
• Extraneous information found in gene product names
• Putative carbamate kinase, alpha subunit
• Carbamate kinase (abcD)
• Carbamate kinase (3.2.1.4)
• Monoamine oxidase B
• bifunctional proline dehydrogenase/pyrroline-5-carboxylate dehydrogenase
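The kinds of cleanup these examples call for can be sketched in Python. This is a minimal illustration under assumed rules, not the PathoLogic name matcher: the regular expressions and the function names normalize_enzyme_name and split_multifunctional are hypothetical.

```python
import re

# Hypothetical cleanup rules; the real matcher is not described on the slide.
QUALIFIERS = re.compile(r"^(putative|probable|predicted|hypothetical)\s+", re.IGNORECASE)
PARENTHETICAL = re.compile(r"\s*\([^)]*\)")  # gene names or EC numbers in parentheses
SUBUNIT = re.compile(r",?\s*(alpha|beta|gamma|delta)\s+(subunit|chain)\b", re.IGNORECASE)


def normalize_enzyme_name(name: str) -> str:
    """Strip extraneous qualifiers so the name can be compared with reference names."""
    cleaned = QUALIFIERS.sub("", name.strip())
    cleaned = PARENTHETICAL.sub("", cleaned)
    cleaned = SUBUNIT.sub("", cleaned)
    return " ".join(cleaned.split()).lower()


def split_multifunctional(name: str) -> list:
    """Split 'bifunctional A/B' style names into their component activities."""
    body = re.sub(r"^(bi|tri|multi)functional\s+", "", name.strip(), flags=re.IGNORECASE)
    return [normalize_enzyme_name(part) for part in body.split("/")]


examples = [
    "Putative carbamate kinase, alpha subunit",
    "Carbamate kinase (abcD)",
    "Carbamate kinase (3.2.1.4)",
]
normalized = [normalize_enzyme_name(e) for e in examples]
```

All three carbamate kinase variants reduce to the same string. A trailing isozyme letter such as the "B" in "Monoamine oxidase B" is left untouched by this sketch and would need its own rule.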
Inference of Metabolic Pathways
• For each pathway in MetaCyc consider
– What fraction of its reactions are present in the just-inferred reactome of the
organism?
– Are enzymes present for reactions unique to the pathway?
– Are enzymes present for designated “key reactions” within MetaCyc pathways?
• Calvin cycle / ribulose bisphosphate carboxylase
– Is a given pathway outside its designated taxonomic range?
• Calvin cycle: green plants, green algae, etc
Standards in Genomic Sciences 5:424-429 2011
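The four questions above can be gathered into one evidence record per candidate pathway. The sketch below only collects the evidence; how PathoLogic weighs it is not stated here, and the Pathway class, evaluate_pathway, the reaction labels, and the taxon names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    name: str
    reactions: set
    key_reactions: set = field(default_factory=set)
    taxonomic_range: set = field(default_factory=set)  # empty set means no restriction


def evaluate_pathway(pathway, reactome, reaction_usage, organism_taxa):
    """Collect evidence for one candidate pathway.

    reactome: reactions inferred for the organism.
    reaction_usage: reaction -> number of reference pathways containing it.
    organism_taxa: taxa in the organism's lineage.
    """
    present = pathway.reactions & reactome
    unique = {r for r in pathway.reactions if reaction_usage.get(r, 0) == 1}
    return {
        "fraction_present": len(present) / len(pathway.reactions),
        "unique_present": len(unique & reactome),
        "key_reactions_present": pathway.key_reactions <= reactome,
        "in_taxonomic_range": (not pathway.taxonomic_range
                               or bool(pathway.taxonomic_range & organism_taxa)),
    }


# Toy data: reaction labels and usage counts are made up for illustration.
calvin = Pathway(
    name="Calvin cycle",
    reactions={"rubisco", "prk", "gapdh", "tpi"},
    key_reactions={"rubisco"},
    taxonomic_range={"green plants", "green algae"},
)
evidence = evaluate_pathway(
    calvin,
    reactome={"gapdh", "tpi"},
    reaction_usage={"rubisco": 1, "prk": 1, "gapdh": 5, "tpi": 4},
    organism_taxa={"bacteria", "proteobacteria"},
)
```

In this toy case half of the reactions are present, but neither the unique reactions nor the key reaction is, and the organism lies outside the taxonomic range, so all evidence points against the pathway.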
Evaluation of Pathway Inference
• Define gold-standard pathway prediction set
– E. coli, Yeast, Arabidopsis, Synechococcus, Mouse
– Positive and negative pathways
• PathoLogic achieved 91% accuracy
BMC Bioinformatics 11:15 2010
Comparison with KEGG
• KEGG vs MetaCyc: Reference pathway collections
– KEGG maps are not pathways
Nuc Acids Res 34:3687 2006
• KEGG maps contain multiple biological pathways
• KEGG maps are composites of pathways in many organisms; they do not identify which specific pathways were elucidated in which organisms
– KEGG modules are incomplete
– KEGG has few literature citations, few comments, less enzyme detail
• KEGG vs organism-specific PGDBs
– KEGG does not curate or customize pathway networks for each
organism
– Highly curated PGDBs now exist for important organisms such as E.
coli, yeast, mouse, Arabidopsis
• KEGG algorithms
– Not published; accuracy unknown
Pathway Analysis of Metagenomes
• Bin the metagenome data and create separate PGDBs for each organism
– Hallam lab
• Compute list of all pathways present in the metagenome
Analysis of High Throughput Datasets
• Genome-scale visualizations of cellular networks
• Generated automatically from PGDB
• Magnify, interrogate
• Omics viewers paint omics data onto overview diagrams
– Different perspectives on same dataset
– Use animation for multiple time points or conditions
Cellular Overview Diagram
• Combines metabolic map and transporters
• Automatically generated, organism-specific
• Zoomable, queryable
E. coli Cellular Overview
Omics Data Graphing on Cellular Overview
Genome Overview
Genome Poster
Genome Overview
Regulatory Overview
• Show regulatory relationships among gene groups
Regulatory Omics Viewer
The Atom-Mapping Problem
• Definition: An atom mapping is a bijection from reactant atoms to product atoms that specifies the terminus of each reactant atom
• MetaCyc v17.5 contains 10,300 atom mappings
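The bijection requirement can be checked mechanically. The sketch below assumes atoms are labeled as (element, index) tuples and adds the natural condition that an atom keeps its element; the labeling scheme and the function name is_atom_mapping are illustrative, not the MetaCyc representation.

```python
def is_atom_mapping(mapping, reactant_atoms, product_atoms):
    """True if `mapping` is a bijection that preserves each atom's element.

    Atoms are (element, index) tuples; this labeling is an assumption.
    """
    if set(mapping) != set(reactant_atoms):
        return False  # every reactant atom must be mapped
    images = list(mapping.values())
    if len(set(images)) != len(images):
        return False  # no two reactant atoms may share a product atom
    if set(images) != set(product_atoms):
        return False  # every product atom must be reached
    return all(src[0] == dst[0] for src, dst in mapping.items())


# Toy reaction: CO2 + H2O -> H2CO3, with atoms numbered per element on each side.
reactants = [("C", 1), ("O", 1), ("O", 2), ("O", 3), ("H", 1), ("H", 2)]
products = [("C", 1), ("O", 1), ("O", 2), ("O", 3), ("H", 1), ("H", 2)]
valid = {atom: atom for atom in reactants}
broken = dict(valid)
broken[("O", 2)] = ("O", 1)  # two oxygens now land on the same product atom

valid_ok = is_atom_mapping(valid, reactants, products)
broken_ok = is_atom_mapping(broken, reactants, products)
```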
Applications of Atom Mapping
• Speed visual comprehension of reactions and pathways
• Improve evaluation of computer-generated metabolic pathways
– Do any feedstock atoms reach target compound?
– What fraction of feedstock atoms reach target compound?
• Facilitate design and interpretation of isotope-labeling experiments
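The feedstock-atom questions can be answered by following atoms through successive atom mappings. This is a sketch under assumptions: each step is a dict from (compound, atom number) in the reactants to (compound, atom number) in the products, only carbon atoms are tracked, and the function name is hypothetical.

```python
def fraction_of_atoms_conserved(feedstock, target, n_feedstock_atoms, step_mappings):
    """Follow each feedstock atom through successive atom mappings.

    step_mappings: one dict per reaction, mapping (compound, atom_no) in the
    reactants to (compound, atom_no) in the products.
    """
    tracked = {(feedstock, i) for i in range(1, n_feedstock_atoms + 1)}
    for mapping in step_mappings:
        # Atoms that are not substrates of this step are lost from the route.
        tracked = {mapping[atom] for atom in tracked if atom in mapping}
    reached = sum(1 for compound, _ in tracked if compound == target)
    return reached / n_feedstock_atoms


# Carbon atoms only: pyruvate -> acetaldehyde + CO2, then acetaldehyde -> ethanol.
decarboxylation = {
    ("pyruvate", 1): ("CO2", 1),           # carboxyl carbon leaves as CO2
    ("pyruvate", 2): ("acetaldehyde", 1),
    ("pyruvate", 3): ("acetaldehyde", 2),
}
reduction = {
    ("acetaldehyde", 1): ("ethanol", 1),
    ("acetaldehyde", 2): ("ethanol", 2),
}
fraction = fraction_of_atoms_conserved("pyruvate", "ethanol", 3, [decarboxylation, reduction])
```

Two of the three pyruvate carbons reach ethanol, so the fraction is 2/3. A fraction of zero would flag a computer-generated route in which no feedstock atom reaches the target.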
Atom Mapping: Our Approach
• Weighted Minimal Bond-Edit Distance
– Edit distance weighted by bond type and atom species
– Computed using MILP for 9,390 MetaCyc reactions
– Average time per reaction:
• 73% are solved in less than 1 second
• 96% are solved in less than 60 seconds
– 96% of reactions had 1 or 2 solutions (with symmetries removed) – different
bonds made/broken
• Solution times are a function of the solver
– SCIP vs CPLEX
J Chem Inf Model. 2012 52:2970-82.
Accuracy of Our Atom Mappings
• Use KEGG RPAIR as a gold standard
– Caveats: Not clear which RPAIRs are curated; accuracy of RPAIR unclear
• We implemented software to
– Import KEGG and RPAIR into a Pathway Tools PGDB
– Map atoms in KEGG reactions to corresponding atoms in MetaCyc reactions
• 2,446 atom mappings from KEGG RPAIR could be compared to MetaCyc
mappings
– 25 disagreements:
• 1 reaction: experimental evidence our mapping is correct
• 2 reactions: similar to preceding
• 22 reactions: KEGG is correct
RouteSearch Software ---
Metabolism->Metabolic Route Search
• User specifies feedstock compound and target compound
• Software computes minimal-cost paths from feedstock to target based on
reactions from
– Current PGDB, plus, optionally
– MetaCyc
• Optimality criteria: minimize
– Number of reactions
– Number of lost atoms based on atom mappings
– Number of reactions foreign to the organism
• User interface guides exploration of solution pathways
Latendresse et al., Bioinformatics 2014
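A minimal-cost route search of this kind can be sketched as a shortest-path problem over compounds. The code below uses Dijkstra's algorithm with one main substrate and product per reaction; it is not the published RouteSearch algorithm, and the weights, the tuple layout, and the function name cheapest_route are assumptions.

```python
import heapq


def cheapest_route(reactions, feedstock, target,
                   w_reaction=1.0, w_lost_atom=1.0, w_foreign=5.0):
    """Dijkstra search over compounds; returns (cost, [reaction names]) or None.

    reactions: iterable of (name, substrate, product, lost_atoms, is_foreign).
    """
    edges = {}
    for name, substrate, product, lost_atoms, is_foreign in reactions:
        cost = w_reaction + w_lost_atom * lost_atoms + (w_foreign if is_foreign else 0.0)
        edges.setdefault(substrate, []).append((cost, name, product))

    queue = [(0.0, feedstock, [])]
    settled = set()
    while queue:
        cost, compound, route = heapq.heappop(queue)
        if compound == target:
            return cost, route
        if compound in settled:
            continue
        settled.add(compound)
        for step_cost, name, product in edges.get(compound, []):
            if product not in settled:
                heapq.heappush(queue, (cost + step_cost, product, route + [name]))
    return None


# Toy network: a two-step native route competes with a one-step foreign reaction
# that loses three atoms.
toy_reactions = [
    ("r1", "A", "B", 0, False),
    ("r2", "B", "D", 0, False),
    ("r3", "A", "D", 3, True),
]
result = cheapest_route(toy_reactions, "A", "D")
```

With these weights the native route costs 2.0 and the foreign shortcut costs 9.0, so the search returns r1 followed by r2. Changing the weights shifts the balance among the three optimality criteria.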
Results: Sample Metabolic Engineering Problems
• Five engineered pathways obtained from literature
– 2-oxoisovalerate → 3-methylbutanol (3-methylbutanol biosynthesis)
– pyruvate → isopropanol (isopropanol biosynthesis)
– pyruvate → n-butanol (pyruvate fermentation to butanol II)
– 3-phospho-D-glycerate → n-butanol (1-butanol autotrophic biosynthesis)
– 3-dehydroquinate → vanillin (vanillin biosynthesis)
• Given feedstock and target compounds, our system found the literature pathway in all five cases
Pyruvate → Isopropanol
• Two highest-ranked pathways shown
• Best corresponds to pathway from literature
• Search engine can continue to generate alternatives
Metabolomics Applications
• MetaCyc contains extensive multiorganism metabolite database
• Organism-specific metabolite databases in each PGDB
– Genome+pathway context for interpreting metabolomics data
• Monoisotopic mass searches
• Paint metabolomics data onto pathway maps
• Group transformations
• Enrichment analysis
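A monoisotopic mass search can be sketched as follows. The code assumes the observed value has already been converted to a neutral mass (adduct correction is not shown) and uses a small hand-built reference table rather than a PGDB; the function names and the 5 ppm default are illustrative.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.00307401,
    "O": 15.99491462, "P": 30.97376163, "S": 31.97207100,
}


def monoisotopic_mass(formula):
    """Mass of a neutral molecule from a formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def search_by_mass(observed, compounds, tolerance_ppm=5.0):
    """Return names of compounds whose mass lies within the ppm tolerance."""
    hits = []
    for name, formula in compounds.items():
        mass = monoisotopic_mass(formula)
        if abs(mass - observed) / mass * 1e6 <= tolerance_ppm:
            hits.append(name)
    return sorted(hits)


reference = {"D-glucose": "C6H12O6", "D-fructose": "C6H12O6", "pyruvic acid": "C3H4O3"}
hits = search_by_mass(180.0634, reference)
```

Isomers such as glucose and fructose share a formula and are both returned, which is why genome and pathway context matters when interpreting a mass hit.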
Object Groups
• Collect and save lists of metabolites, genes, pathways, …
• Share groups with colleagues
• Create manually, from files, from query results
• Explore gene list interactively
• Combine (union, intersection, subtraction)
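The three combination operations correspond directly to set operations. A small sketch with two made-up gene groups:

```python
group_a = {"lacZ", "lacY", "lacA", "araB"}  # e.g. genes returned by a query
group_b = {"lacZ", "lacY", "malT"}          # e.g. genes loaded from a file

in_either = group_a | group_b   # union
in_both = group_a & group_b     # intersection
only_in_a = group_a - group_b   # subtraction
```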
Manipulate Genes and Sequences
Object Group Transformations
• Transform metabolite group into group of metabolic pathways, then into gene group
• Create group containing transcriptional regulator; transform to all genes it regulates
• Transform gene group into group of regulators of those genes
• Transform gene group into list of TF binding sites controlling those genes; into list of sequences
• Create group of nucleotide positions; transform to closest genes; paint to cellular overview
Object Groups: Enrichment Analysis
“My experiments yielded a set of genes/metabolites. What do they have in
common?”
• Given a group of genes:
– What GO terms are statistically over-represented in that set?
– What metabolic pathways are over-represented?
– What transcriptional regulators are over-represented?
• Given a set of metabolites:
– What metabolic pathways are statistically over-represented in that set?
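One common way to quantify over-representation is a hypergeometric upper-tail probability. The sketch below assumes that test and applies no multiple-testing correction; which statistics the software itself offers is not stated here, and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(group_size, hits_in_group, population_size, hits_in_population):
    """P(X >= hits_in_group) when group_size objects are drawn without replacement
    from a population in which hits_in_population objects carry the annotation."""
    total = comb(population_size, group_size)
    upper = min(group_size, hits_in_population)
    tail = sum(
        comb(hits_in_population, k)
        * comb(population_size - hits_in_population, group_size - k)
        for k in range(hits_in_group, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, 4 of them in the pathway; a group of 3 genes
# contains 2 pathway members.
p_value = enrichment_p_value(group_size=3, hits_in_group=2,
                             population_size=10, hits_in_population=4)
```

Here the tail is (6·6 + 4·1)/120 = 1/3, so two hits out of three would not be surprising in such a small population.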
Automated Generation of Metabolic Flux
Models from PGDBs
Joint work with Mario Latendresse
SRI International Bioinformatics
Marriage of Systems Biology and Model-Organism
Databases
• Systems biology
– Qualitative system-level analysis
– Quantitative system-level modeling
• Hypothesis: Strong synergies between MODs and SB
• Curation is critical to SB and to MODs
– Biological models undergo long periods of updating and refinement
– Common curation effort for MOD and systems-biology model
• MOD provides data needed for SB construction and validation
• SB identifies errors and omissions in MOD, directs curation
• Methodologies from MODs can benefit systems-biology models
– Evidence codes
– Mini-review summaries
– Literature citations
Flux-Balance Analysis
• Steady state, constraint-based quantitative models of metabolism
• Starting information for organism of interest:
– Nutrients
– Secretions
– Metabolic reaction list
– Biomass
[Figure: example network over metabolites A, B, C, D, and X]
Flux Balance Analysis
• Define system of linear equations encoding fluxes on each metabolite M
– R1 + R2 = R3 + R4 + R5
• Boundary reactions:
– Exchange fluxes for nutrients and secretions
– Biomass reaction: L-arginine … + GTP … + … → biomass
• Submit to linear optimization package
– Optimize biomass production
– Optimize ATP production
– Optimize production of desired end product
[Figure: metabolite M is produced by reactions R1 and R2 and consumed by reactions R3, R4, and R5]
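The steady-state constraint for metabolite M can be checked numerically. The sketch below covers only the balance equations, not the optimization step, which would be handed to a linear programming solver; the flux values and function names are made up for illustration.

```python
def net_production(stoichiometry, fluxes):
    """Net production rate of each metabolite: sum of coefficient * flux.

    stoichiometry: reaction -> {metabolite: coefficient}, negative when consumed.
    fluxes: reaction -> flux value.
    """
    net = {}
    for reaction, coefficients in stoichiometry.items():
        for metabolite, coefficient in coefficients.items():
            net[metabolite] = net.get(metabolite, 0.0) + coefficient * fluxes[reaction]
    return net


def is_steady_state(stoichiometry, fluxes, tolerance=1e-9):
    """True if every metabolite is produced exactly as fast as it is consumed."""
    rates = net_production(stoichiometry, fluxes)
    return all(abs(rate) <= tolerance for rate in rates.values())


# M is produced by R1 and R2 and consumed by R3, R4, and R5.
stoichiometry = {
    "R1": {"M": 1}, "R2": {"M": 1},
    "R3": {"M": -1}, "R4": {"M": -1}, "R5": {"M": -1},
}
balanced = {"R1": 3.0, "R2": 2.0, "R3": 1.0, "R4": 1.5, "R5": 2.5}
unbalanced = dict(balanced, R5=3.0)

balanced_ok = is_steady_state(stoichiometry, balanced)
unbalanced_ok = is_steady_state(stoichiometry, unbalanced)
```

The first flux set satisfies R1 + R2 = R3 + R4 + R5 (5 = 5); the second drains M faster than it is made and so violates the constraint.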
Example
Biomass: ATP:alanine 4:1
[Figure: example flux solution in which glucose is converted to 2 pyruvate, which yields alanine and, with O2, ATP; flux labels shown: glucose 100, ATP 160, O2 160, and 40]
FBA Results
• FBA predicts steady-state reaction fluxes for the metabolic network
• Remove reactions from model to predict knock-out phenotypes
• Supply alternative nutrient sets to predict growth phenotypes
• Predict growth rates, nutrient uptake rates
Approach: Generate FBA Models from
Pathway/Genome Databases
• Store and update metabolic model within PGDB
– All query and visualization tools applicable to FBA model
– FBA model is tightly coupled to genome and regulatory information
• MetaFlux generates linear programming problem from PGDB reactions
• Submit to constraint solver for model execution/solving
• Tools to accelerate model refinement:
– Reaction balance checking
– Dead-end metabolite analysis
– Visualize reaction flux using cellular overview
– Multiple gap filling
MetaFlux: Latendresse et al, Bioinformatics 2012 28:388-96
MetaFlux FBA Model Execution
• MetaFlux creates .lp file and executes SCIP solver
– Konrad-Zuse-Zentrum für Informationstechnik Berlin
• Interpret SCIP output
– Determine if SCIP found a solution
– Map fluxes to PGDB reactions
• Display resulting fluxes on the Cellular Overview
Model Debugging Via
Dead End Metabolite Finder
• A small molecule C is a dead-end in a given Compartment if:
– C is only produced (never consumed) by metabolic reactions in Compartment, and no transporter acts on C in Compartment, OR
– C is only consumed (never produced) by metabolic reactions in Compartment, and no transporter acts on C in Compartment
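The definition translates into a few set operations. This sketch treats every reaction as irreversible and handles a single compartment, which simplifies whatever the real tool does; the input layout and the function name find_dead_ends are assumptions.

```python
def find_dead_ends(metabolic_reactions, transported):
    """Return compounds that are only produced or only consumed and have no transporter.

    metabolic_reactions: iterable of (substrates, products) within one compartment.
    transported: compounds acted on by some transporter in that compartment.
    """
    produced, consumed = set(), set()
    for substrates, products in metabolic_reactions:
        consumed.update(substrates)
        produced.update(products)
    one_sided = produced ^ consumed  # symmetric difference: on one side only
    return sorted(one_sided - set(transported))


# Toy network: A -> B -> C, with a transporter for A only.
toy_network = [({"A"}, {"B"}), ({"B"}, {"C"})]
dead_ends = find_dead_ends(toy_network, transported={"A"})
```

Here A is only consumed but can be imported, B is both produced and consumed, and C is produced with nowhere to go, so C is the only dead-end reported.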
Dead-End Metabolite Analysis of EcoCyc
• 148 dead-end metabolites total
• 16 dead-end metabolites in EcoCyc pathways:
– (2R,4S)-2-methyl-2,3,3,4-tetrahydroxytetrahydrofuran
– 3-hydroxypropionate
– 4-methyl-5-(beta-hydroxyethyl)thiazole
– 5,6-dimethylbenzimidazole
– aminoacetaldehyde
– cis-vaccenate
– cobinamide
– ethanolamine
– methanol
– oxamate
– S-adenosyl-4-methylthio-2-oxobutanoate
– S-methyl-5-thio-D-ribose
– S2-
– tetrahydromonapterin
– urate
– urea
Model Debugging via Multiple Gap Filling
• Most FBA models are not initially solvable because of incomplete or
incorrect information
• MetaFlux uses meta-optimization to postulate alterations to a model to
render it solvable
• Each alteration has an associated cost; minimize cost of alterations
• Formulate as MILP and submit to SCIP
Multiple Gap Filling of FBA Models
• Reaction gap filling (Kumar et al, BMC Bioinf 2007 8:212):
– Reverse directionality of selected reactions
– Add a minimal number of reactions from MetaCyc to the model to enable a
solution
– Reaction cost is a function of reaction taxonomic range
• Metabolite gap filling: Postulate additional nutrients and secretions
• Partial solutions: Identify maximal subset of biomass components for which
model can yield positive production rates
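The idea of adding a minimal-cost set of reactions can be illustrated with a much simpler stand-in. The sketch below replaces the flux-based MILP with exhaustive search over candidate subsets and a reachability test, so it suits only a handful of candidates; all names and costs are hypothetical.

```python
from itertools import combinations


def producible(reactions, nutrients):
    """Compounds reachable from the nutrients by repeatedly firing reactions."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def fill_gaps(model, candidates, nutrients, biomass):
    """Cheapest set of candidate reactions that makes every biomass compound producible.

    candidates: name -> (substrates, products, cost). Returns (cost, names) or None.
    """
    best = None
    names = list(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            added = [candidates[n][:2] for n in subset]
            if set(biomass) <= producible(list(model) + added, nutrients):
                cost = sum(candidates[n][2] for n in subset)
                if best is None or cost < best[0]:
                    best = (cost, list(subset))
    return best


# Toy model: A -> B exists; biomass needs B and C; two candidates can supply C.
model = [(("A",), ("B",))]
candidates = {
    "c1": (("B",), ("C",), 1.0),  # cheap, e.g. within the expected taxonomic range
    "c2": (("A",), ("C",), 4.0),  # costly, e.g. outside it
}
solution = fill_gaps(model, candidates, nutrients={"A"}, biomass={"B", "C"})
```

The search returns the single cheap reaction c1. Attaching a cost to each candidate mirrors the slide's point that reaction cost depends on taxonomic range.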
MILP Objective Function for Gap Filling
\[
\sum_{i} w_b B_i + \sum_{a} w_r R_a + \sum_{b} w_t R_b + \sum_{c} w_m R_c + \sum_{k} w_s S_k + \sum_{p} w_n N_p
\]
Where
• $w_b > 0$ and $w_r, w_t, w_m, w_s, w_n < 0$ are weights for biomass, reactions (2), secretions, and nutrients
• $B_i, R_a, R_b, R_c, S_k, N_p$ are binary variables
Results – FBA Model of Human Metabolism
• 46 biomass compounds
• 13 nutrients
• 2 secretions
• 207 reactions carry non-zero flux
MetaFlux Gap Filler Suggestions
• Addition of 8 new reactions from MetaCyc; 4 supported by literature
research
• Reversal of 4 reactions confirmed by literature searches
• Enzyme curated into wrong compartment
• FBA analysis identified an amino-acid biosynthetic pathway that should
not have been present in HumanCyc
• Further issues identified by dead-end metabolite analysis and
reachability analysis
Other Capabilities
• Display and editing of protein features
• Blast sequences against PGDBs
• Retrieve nucleotide and amino acid sequences
• Define Web links from PGDB objects to other web sites
• Active community of contributors
– JavaCyc, PerlCyc
– SBML and BioPAX export tools
Pathway Tools Implementation Details
• Platforms:
– Macintosh, PC/Linux, and PC/Windows platforms
• Same binary can run as desktop app or Web server
• PGDBs can be stored in files, MySQL, Oracle
• Production-quality software
– Two regular releases per year
– Extensive quality assurance
– Extensive documentation
– Auto-patch
– Automatic DB-upgrade
Accessing PGDB Data
• Export to Genbank, SBML, BioPAX
• Export to tab-delimited files
• Export to attribute-value files
• Attribute-value files can be imported into SRI’s BioWarehouse
– Relational database system for bioinformatics database integration
• APIs
– Web services -- http://biocyc.org/web-services.shtml
– Lisp
– PerlCyc
– JavaCyc
Summary
• Pathway/Genome Databases
– MetaCyc non-redundant DB of literature-derived pathways
– MetaCyc family of ~4,000 PGDBs
• Pathway Tools software
– Extract pathways from genomes
– Distributed curation tools for PGDB development
– Query, visualization, WWW publishing
– Omics data analysis
– Quantitative metabolic models
BioCyc and Pathway Tools
Availability
• BioCyc.org Web site and database files freely available to all
• Pathway Tools freely available to non-profits
– Macintosh, PC/Windows, PC/Linux
Acknowledgements
• SRI
– Suzanne Paley, Ron Caspi, Mario Latendresse, Ingrid Keseler, Carol Fulcher, Tim Holland, Markus Krummenacker, Tomer Altman, Richard Billington, Pallavi Kaipa, Deepika Brito
• EcoCyc Collaborators
– Julio Collado-Vides, Robert Gunsalus, Ian Paulsen
• MetaCyc Collaborators
– Sue Rhee, Peifen Zhang, Kate Dreher
– Lukas Mueller, Hartmut Foerster
• Funding sources:
– NIH National Institute of General Medical Sciences
– Department of Energy
http://www.ai.sri.com/pkarp/talks/
BioCyc webinars: biocyc.org/webinar.shtml
Learn More
• Pathway Tools Tutorial
– April 25-27
• http://bioinformatics.ai.sri.com/ptools/tutorial/