2010-01-UGhent-Intro - SRI's Artificial Intelligence Center

Pathway Tools / BioCyc Fundamentals
Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International
pkarp@ai.sri.com
BioCyc.org
EcoCyc.org, MetaCyc.org, HumanCyc.org
SRI International Bioinformatics
Pathway Tools Capabilities
- Create and maintain an organism database integrating genome, pathway, and regulatory information
  - Computational inference tools
  - Interactive editing tools
- Query and visualize that database
- Use the database to interpret omics data
- Metabolic network analysis tools
- Comparative analysis tools
- Export the metabolic network to SBML
  - Speeds creation of flux-balance models by an order of magnitude
BioCyc
- Hundreds of microbial genomes
  - Inferred operons and metabolic networks
- Couples curated data with computational predictions
- Supports analysis of omics data
- Comparative analysis tools
- Microbial emphasis. Exceptions:
  - HumanCyc, MouseCyc, CattleCyc
Model Organism Databases / Organism-Specific Databases
- DBs that describe the genome and other information about an organism
- Every sequenced organism with an active experimental community requires a MOD
  - Integrate genome data with information about the biochemical and genetic network of the organism
  - Integrate literature-based information with computational predictions
- Curated by experts for that organism
  - No one group can curate all the world’s genomes
  - Distribute workload across a community of experts to create a community resource
Rationale for MODs
- Each “complete” genome is incomplete in several respects:
  - 40%-60% of genes have no assigned function
  - Roughly 7% of those assigned functions are incorrect
  - Many assigned functions are non-specific
- Need continuous updating of annotations with respect to new experimental data and computational predictions
- MODs are platforms for global analyses of an organism
  - Interpret omics data in a pathway context
  - In silico prediction of essential genes
  - Characterize systems properties of metabolic and genetic networks
What is Curation?
- Ongoing updating and refinement of a PGDB
  - Correcting false-positive and false-negative predictions
  - Incorporating information from experimental literature
  - Authoring of comments and citations
  - Updating database fields
    - Gene positions, names, synonyms
    - Protein functions, activators, inhibitors
  - Addition of new pathways, modification of existing pathways
  - Defining TF binding sites, promoters, regulation of transcription initiation and other processes
Pathway/Genome Database
[Figure: schematic of a cell showing the data types in a PGDB: pathways, reactions, compounds, proteins, RNAs, genes, sequence features, chromosomes, plasmids, and regulation (operons, promoters, DNA binding sites, regulatory interactions)]
BioCyc Collection of 507 Pathway/Genome Databases
- Pathway/Genome Database (PGDB) combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors/sites, promoters, operons
- Tier 1: Literature-Derived PGDBs
  - MetaCyc
  - EcoCyc -- Escherichia coli K-12
- Tier 2: Computationally-derived DBs, Some Curation -- 24 PGDBs
  - HumanCyc
  - Mycobacterium tuberculosis
- Tier 3: Computationally-derived DBs, No Curation -- 481 DBs
Pathway Tools Overview
[Figure: the annotated genome and the MetaCyc reference pathway DB are inputs to PathoLogic, which creates a Pathway/Genome Database; the database is refined with the Pathway/Genome Editors and queried with the Pathway/Genome Navigator]
Pathway Tools Software: PathoLogic
- Computational creation of new Pathway/Genome Databases
- Transforms genome into Pathway Tools schema and layers inferred information above the genome
  - Predicts operons
  - Predicts metabolic network
  - Predicts which genes code for missing enzymes in metabolic pathways
  - Infers transport reactions from transporter names
Bioinformatics 18:S225 2002
Pathway Tools Software: Pathway/Genome Editors
- Interactively update PGDBs with graphical editors
- Support geographically distributed teams of curators with an object database system
- Gene editor
- Protein editor
- Reaction editor
- Compound editor
- Pathway editor
- Operon editor
- Publication editor
Pathway Tools Software: Pathway/Genome Navigator
- Querying and visualization of:
  - Pathways
  - Reactions
  - Metabolites
  - Proteins
  - Genes
  - Chromosomes
- Two modes of operation:
  - Web mode
  - Desktop mode
  - Most functionality is shared, but each mode has unique functionality
Pathway Tools Software: PGDBs Created Outside SRI
- 1,700+ licensees: 75+ groups applying software to 300+ organisms
- Saccharomyces cerevisiae, SGD project, Stanford University
  - 135 pathways / 565 publications
- Candida albicans, CGD project, Stanford University
- dictyBase, Northwestern University
- Mouse, MGD, Jackson Laboratory
- Under development:
  - Drosophila, FlyBase
  - C. elegans, WormBase
- Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
  - 288 pathways / 2282 publications
- PlantCyc, Carnegie Institution of Washington
- Six Solanaceae species, Cornell University
- GrameneDB, Cold Spring Harbor Laboratory
- Medicago truncatula, Samuel Roberts Noble Foundation
Pathway Tools Software: PGDBs Created Outside SRI
- NIAID BRCs for Biodefense pathogens:
  - BioHealthBase -- Mycobacterium tuberculosis, Francisella tularensis
  - Pathema -- 80+ PGDBs
  - PATRIC -- Brucella suis, Coxiella burnetii, Rickettsia typhi
  - EuPathDB -- Cryptosporidium, Plasmodium
- G. Xie, Los Alamos Lab, Dental pathogens
- F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
- V. Schachter, Genoscope, Acinetobacter
- M. Bibb, John Innes Centre, Streptomyces coelicolor
- G. Church, Harvard, Prochlorococcus marinus, multiple strains
- E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
- R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
- Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
- Sergio Encarnacion, UNAM, Sinorhizobium meliloti
- Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
- Michael Gottfert, Technische Universität Dresden, Bradyrhizobium japonicum
- Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472
Pathway Tools Software: PGDBs Created Outside SRI
- Large scale users:
  - C. Medigue, Genoscope, 200+ PGDBs
  - G. Sutton, J. Craig Venter Institute, 80+ PGDBs
  - G. Burger, U Montreal, 60+ PGDBs
  - Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
- Partial listing of outside PGDBs at BioCyc.org
Obtaining a PGDB for an Organism of Interest
- Find existing curated PGDB
- Find existing PGDB in BioCyc
- Create your own
EcoCyc Project – EcoCyc.org
- E. coli Encyclopedia
  - Review-level Model-Organism Database for E. coli
  - Tracks evolving annotation of the E. coli genome and cellular networks
  - The two paradigms of EcoCyc
- “Multi-dimensional annotation of the E. coli K-12 genome”
  - Positions of genes; functions of gene products – 76% / 66% exp
  - Gene Ontology terms; MultiFun terms
  - Gene product summaries and literature citations
  - Evidence codes
  - Multimeric complexes
  - Metabolic pathways
  - Cellular regulation
Karp, Gunsalus, Collado-Vides, Paulsen, Nuc. Acids Res. 35:7577 2007
ASM News 70:25 2004
Science 293:2040
URL: EcoCyc.org
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
EcoCyc v13.6
Pathways: 246
Reactions:
  Metabolic: 1394
  Transport: 246
Compounds: 1,830
Citations: 19,000
Proteins: 4,479
Complexes: 895
RNAs: 285
Genes: 4,492
Gene Regulation:
  Operons: 3,369
  Transcription Factors: 196
  Promoters: 1,796
  TF Binding Sites: 2,205
Paradigm 1: EcoCyc as Textual Review Article
- All gene products for which experimental literature exists are curated with a minireview summary
  - Found on protein and RNA pages, not gene pages!
  - 3257 gene products contain summaries
  - Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
- Additional summaries found in pages for operons, pathways
- EcoCyc cites 17,300 publications
Paradigm 2: EcoCyc as Computational Symbolic Theory
- Highly structured, high-fidelity knowledge representation provides computable information
- Each molecular species defined as a DB object
  - Genes, proteins, small molecules
- Each molecular interaction defined as a DB object
  - Metabolic reactions
  - Transport reactions
  - Transcriptional regulation of gene expression
- 220 database fields capture extensive properties and relationships
EcoCyc Procedures
- DB updates performed by 5 staff curators
  - Information gathered from biomedical literature
  - Enter data into structured database fields
  - Author extensive summaries
  - Update evidence codes
  - Corrections submitted by E. coli researchers
- Four releases per year
- Quality assurance of data and software
  - Evaluate database consistency constraints
  - Perform element balancing of reactions
  - Run other checking programs
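Element balancing checks that every element appears in equal amounts on both sides of a reaction. The following is a minimal Python sketch of that idea, not EcoCyc's actual checking program. It assumes each reaction side is given as (stoichiometric coefficient, molecular formula) pairs, and that formulas are flat strings such as C6H12O6, with no parentheses or charges.

```python
import re
from collections import Counter

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional atom count.
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C6H12O6'."""
    atoms = Counter()
    for element, count in _ELEMENT.findall(formula):
        atoms[element] += int(count) if count else 1
    return atoms


def side_atoms(side) -> Counter:
    """Total atoms on one reaction side, given (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


def imbalance(left, right) -> dict:
    """Map each unbalanced element to (atoms on right - atoms on left)."""
    lhs, rhs = side_atoms(left), side_atoms(right)
    return {e: rhs[e] - lhs[e] for e in lhs.keys() | rhs.keys() if rhs[e] != lhs[e]}


# Glucose oxidation balances.
print(imbalance([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")]))  # {}
# Ethanol fermentation written with one CO2 too few is flagged (C and O differ).
print(imbalance([(1, "C6H12O6")], [(2, "C2H6O"), (1, "CO2")]))
```

An empty result means the reaction is balanced. A non-empty result names the elements a curator should inspect.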
EcoCyc Accelerates Science
- Experimentalists
  - E. coli experimentalists
  - Experimentalists working with other microbes
  - Analysis of expression data
- Computational biologists
  - Biological research using computational methods
  - Genome annotation
  - Study connectivity of E. coli metabolic network
  - Study phylogenetic extent of metabolic pathways and enzymes in all domains of life
- Bioinformaticists
  - Training and validation of new bioinformatics algorithms: predict operons, promoters, protein functional linkages, protein-protein interactions
- Metabolic engineers
  - “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”
- Educators
MetaCyc: Metabolic Encyclopedia
- Describe a representative sample of every experimentally determined metabolic pathway
- Describe properties of metabolic enzymes
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- Jointly developed by
  - P. Karp, R. Caspi, C. Fulcher, SRI International
  - L. Mueller, A. Pujar, Boyce Thompson Institute
  - S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008
Applications of MetaCyc
- Reference source on metabolic pathways
- Metabolic engineering
  - Find enzymes with desired activities, regulatory properties
  - Determine cofactor requirements
- Predict pathways from genomes
- Systematic studies of metabolism
- Computer-aided education
MetaCyc Data -- Version 13.6
Pathways: 1,436
Reactions: 8,200
Enzymes: 6,060
Small Molecules: 8,400
Organisms: 1,800
Citations: 21,700
Taxonomic Distribution of MetaCyc Pathways – version 13.1
Bacteria: 883
Green Plants: 607
Fungi: 199
Mammals: 159
Archaea: 112
Enzyme Data Available in MetaCyc
- Reaction(s) catalyzed
- Alternative substrates
- Activators, inhibitors, cofactors, prosthetic groups
- Subunit structure
- Genes
- Features on protein sequence
- Cellular location
- pI, molecular weight, Km, Vmax
- Gene Ontology terms
- Links to other bioinformatics databases
What is a Pathway?
- A connected sequence of biochemical reactions
  - Occurs in one organism
  - Conserved through evolution
  - Regulated as a unit
  - Often starts or stops at one of 13 common intermediate metabolites
MetaCyc Pathway Variants
- Pathways that accomplish similar biochemical functions using different biochemical routes
  - Alanine biosynthesis I – E. coli
  - Alanine biosynthesis II – H. sapiens
- Pathways that accomplish similar biochemical functions using similar sets of reactions
  - Several variants of TCA Cycle
MetaCyc Super-Pathways
- Groups of pathways linked by common substrates
- Example: Super-pathway containing
  - Chorismate biosynthesis
  - Tryptophan biosynthesis
  - Phenylalanine biosynthesis
  - Tyrosine biosynthesis
- Super-pathways defined by listing their component pathways
- Multiple levels of super-pathways can be defined
- Pathway layout algorithms accommodate super-pathways
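A super-pathway is defined by listing its component pathways, and those components may themselves be super-pathways. Expanding one into its base pathways is therefore a simple recursion. The minimal Python sketch below assumes the definitions are held in a plain dictionary. The names "aromatic amino acid superpathway" and the nested "aromatic amino acids from chorismate" are hypothetical, chosen only to show two levels built from the example pathways above.

```python
from typing import Dict, List, Optional, Set


def base_pathways(
    pathway: str,
    components: Dict[str, List[str]],
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Expand a (super-)pathway into its base pathways, in first-seen order.

    `components` maps each super-pathway to the pathways it lists; any
    pathway missing from the map is treated as a base pathway.
    """
    if seen is None:
        seen = set()
    if pathway in seen:  # already expanded (also guards against cycles)
        return []
    seen.add(pathway)
    if pathway not in components:
        return [pathway]
    result: List[str] = []
    for sub in components[pathway]:
        result.extend(base_pathways(sub, components, seen))
    return result


components = {
    "aromatic amino acid superpathway": [          # hypothetical name
        "chorismate biosynthesis",
        "aromatic amino acids from chorismate",    # hypothetical nested super-pathway
    ],
    "aromatic amino acids from chorismate": [
        "tryptophan biosynthesis",
        "phenylalanine biosynthesis",
        "tyrosine biosynthesis",
    ],
}
print(base_pathways("aromatic amino acid superpathway", components))
# ['chorismate biosynthesis', 'tryptophan biosynthesis', 'phenylalanine biosynthesis', 'tyrosine biosynthesis']
```

Sharing `seen` across branches means a pathway listed under two super-pathways is reported once.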
Comparison with KEGG
- KEGG vs MetaCyc: Reference pathway collections
  - KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
    - KEGG maps contain multiple biological pathways
    - Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than two genes from a KEGG pathway
  - KEGG maps are composites of pathways in many organisms; they do not identify which specific pathways were elucidated in which organisms
  - KEGG has no literature citations, no comments, less enzyme detail
  - KEGG assigns half as many reactions to pathways as MetaCyc
- KEGG vs organism-specific PGDBs
  - KEGG does not curate or customize pathway networks for each organism
  - Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis
Comparison of Pathway Tools to KEGG
- Inference tools
  - KEGG does not predict presence or absence of pathways
  - KEGG lacks pathway hole filler, operon predictor
- Curation tools
  - KEGG does not distribute curation tools
  - No ability to customize pathways to the organism
  - Pathway Tools schema much more comprehensive
- Visualization and analysis
  - KEGG does not perform automatic pathway layout
  - KEGG metabolic-map diagram extremely limited
  - No comparative pathway analysis
Pathway Tools Implementation Details
- Platforms:
  - Macintosh, PC/Linux, and PC/Windows platforms
- Same binary can run as desktop app or Web server
- Production-quality software
  - Version control
  - Two regular releases per year
  - Extensive quality assurance
  - Extensive documentation
  - Auto-patch
  - Automatic DB-upgrade
- 480,000 lines of Lisp code
ptools-support@ai.sri.com
Pathway Tools Architecture
[Figure: components shown: Pathway/Genome Navigator in Web mode and Desktop mode; Protein, Pathway, and Reaction Editors; GFP API accessed from Lisp, Perl, and Java; Ocelot DBMS with storage in a disk file or in Oracle or MySQL]
Ocelot Knowledge Server Architecture
- Frame data model
  - Minimizes size of schema relative to semantic complexity
- Schema is stored within the DB
  - Schema is self-documenting
  - Slot units define metadata about slots
    - Domain, range, inverse
    - Collection type, number of values, value constraints
    - Comment
- Schema evolution facilitated by
  - Easy addition/removal of slots, or alteration of slot datatypes
  - Flexible data formats that do not require dumping/reloading of data
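The following sketch makes the frame data model concrete. Each frame holds named slots, and each slot is described by a slot unit that records its domain, range, inverse, and number of values. This is an illustration under those assumptions, not Ocelot's implementation or API. The field names and the frame and slot names (Protein, Component-Of, Components, TRPE, ANTHRANILATE-SYNTHASE) are hypothetical. Domain and range checks compare class names exactly and ignore class hierarchies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SlotUnit:
    """Metadata describing one slot (simplified; field names are hypothetical)."""
    name: str
    domain: str                        # class whose frames may carry this slot
    value_class: Optional[str] = None  # the slot's range: class allowed as values
    inverse: Optional[str] = None      # slot kept in sync on the value's frame
    max_values: Optional[int] = None   # number-of-values limit; None = unlimited
    comment: str = ""


@dataclass
class Frame:
    name: str
    cls: str
    slots: Dict[str, List[str]] = field(default_factory=dict)


class KnowledgeBase:
    def __init__(self) -> None:
        self.frames: Dict[str, Frame] = {}
        self.slot_units: Dict[str, SlotUnit] = {}  # schema lives beside the data

    def add_frame(self, name: str, cls: str) -> None:
        self.frames[name] = Frame(name, cls)

    def add_value(self, frame_name: str, slot: str, value: str) -> None:
        """Add a frame-valued slot value, enforcing the slot unit's constraints."""
        unit = self.slot_units[slot]
        frame, target = self.frames[frame_name], self.frames[value]
        if frame.cls != unit.domain:
            raise ValueError(f"{slot} is not defined for class {frame.cls}")
        if unit.value_class is not None and target.cls != unit.value_class:
            raise ValueError(f"{value} is not a {unit.value_class}")
        values = frame.slots.setdefault(slot, [])
        if value in values:
            return  # already present
        if unit.max_values is not None and len(values) >= unit.max_values:
            raise ValueError(f"{slot} allows at most {unit.max_values} value(s)")
        values.append(value)
        if unit.inverse is not None:
            # Keep the inverse slot consistent (its own constraints are not rechecked here).
            back = target.slots.setdefault(unit.inverse, [])
            if frame_name not in back:
                back.append(frame_name)


kb = KnowledgeBase()
kb.slot_units["Component-Of"] = SlotUnit("Component-Of", "Protein", "Protein", inverse="Components")
kb.slot_units["Components"] = SlotUnit("Components", "Protein", "Protein", inverse="Component-Of")
kb.add_frame("TRPE", "Protein")
kb.add_frame("ANTHRANILATE-SYNTHASE", "Protein")
kb.add_value("TRPE", "Component-Of", "ANTHRANILATE-SYNTHASE")
print(kb.frames["ANTHRANILATE-SYNTHASE"].slots["Components"])  # ['TRPE']
```

The schema is ordinary data in `slot_units`, so adding or removing a slot does not require reloading any frames.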
Ocelot Storage System Architecture
- Persistent storage via disk files or Oracle or MySQL
  - Concurrent development: Oracle or MySQL
  - Single-user development: disk files
- Oracle/MySQL DBMS storage
  - DBMS is submerged within Ocelot, invisible to users
  - Frames transferred from DBMS to Ocelot
    - On demand
    - By background prefetcher
  - Memory cache
  - Persistent disk cache to speed performance via Internet
  - Transaction logging facility
Why Do We Code in Common Lisp?
- Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)
  - The average Lisp program ran 33 times faster than the average Java program
  - The average Lisp program was written 5 times faster than the average Java program
- Roberts compared Java and Lisp implementations of a Domain Name Server (DNS) resolver
  - http://www.findinglisp.com/papers/case_study_java_lisp_dns.html
  - The Lisp version had ½ as many lines of code
Common Lisp Programming Environment
- Interpreted and/or compiled execution
- Fabulous debugging environment
- High-level language
- Interactive data exploration
- Extensive built-in libraries
- Dynamic redefinition
- Find out more!
  - See ALU.org or
  - http://www.international-lisp-conference.org/
PathoLogic Processing
1. Translate source genome to PGDB form
2. Predict operons
3. Predict metabolic pathways
4. Predict pathway hole fillers
5. Transport inference parser
6. Build metabolic overview diagram
PathoLogic Step 1: Translate Genome to PGDB
[Figure: PathoLogic software integrates genome and pathway data to identify putative metabolic networks. Inputs: the annotated genomic sequence (genes/ORFs, gene products, DNA sequences) and the multi-organism pathway database MetaCyc (pathways, reactions, compounds). Output: a Pathway/Genome Database (genomic map, genes, gene products, reactions, pathways, compounds)]
PathoLogic Step 2: Predict Operons
- Predict adjacent genes A and B in same operon based on:
  - Intergenic distance
  - Functional relatedness of A and B
- Tests for functional relatedness:
  - A and B in same gene functional class (MultiFun)
  - A and B in same metabolic pathway
  - A codes for enzyme in a pathway and B codes for transporter involving a substrate in that pathway
  - A and B are monomers in same protein complex
- Correctly predicts 80% of E. coli transcription units
- Marks predicted operons with computational evidence codes
Bioinformatics 20:709-17 2004
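The operon rule above can be sketched in Python. The sketch applies the four functional-relatedness tests literally. It then makes an assumption the slide does not state: it requires both a short intergenic distance and at least one relatedness test to pass. The 200 bp cutoff (`max_gap`), the `Gene` fields, and the gene names are all hypothetical, and the published method may weigh the evidence differently.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class Gene:
    name: str
    start: int                                 # leftmost coordinate
    end: int                                   # rightmost coordinate
    strand: str                                # "+" or "-"
    multifun: str = ""                         # MultiFun class; "" if unknown
    pathways: FrozenSet[str] = frozenset()     # pathways its product acts in
    complexes: FrozenSet[str] = frozenset()    # complexes its product is a monomer of
    transports: FrozenSet[str] = frozenset()   # substrates its product transports


def _enzyme_and_transporter(x: Gene, y: Gene, pathway_substrates: Dict[str, FrozenSet[str]]) -> bool:
    """x acts in some pathway and y transports a substrate of that pathway."""
    return any(y.transports & pathway_substrates.get(p, frozenset()) for p in x.pathways)


def functionally_related(a: Gene, b: Gene, pathway_substrates: Dict[str, FrozenSet[str]]) -> bool:
    return (
        (a.multifun != "" and a.multifun == b.multifun)
        or bool(a.pathways & b.pathways)
        or bool(a.complexes & b.complexes)
        or _enzyme_and_transporter(a, b, pathway_substrates)
        or _enzyme_and_transporter(b, a, pathway_substrates)
    )


def same_operon(a: Gene, b: Gene, pathway_substrates: Dict[str, FrozenSet[str]], max_gap: int = 200) -> bool:
    """Predict whether adjacent genes a (left) and b (right) share an operon."""
    if a.strand != b.strand:
        return False
    gap = b.start - a.end - 1  # intergenic distance; negative if the genes overlap
    return gap <= max_gap and functionally_related(a, b, pathway_substrates)


ps = {"ribose degradation": frozenset({"ribose"})}
enzyme = Gene("geneA", 1000, 1930, "+", pathways=frozenset({"ribose degradation"}))
transporter = Gene("geneB", 1950, 2915, "+", transports=frozenset({"ribose"}))
print(same_operon(enzyme, transporter, ps))  # True: 19 bp apart, enzyme/transporter link
```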
PathoLogic Step 3: Prediction of Metabolic Pathways
- Infer reaction complement of organism
  - Match enzymes in source genome to MetaCyc reactions they catalyze
  - Match enzyme names and EC numbers to MetaCyc
  - Support user in manually matching additional enzymes
- Computationally predict which MetaCyc metabolic pathways are present
  - For each MetaCyc pathway, evaluate which of its reactions are catalyzed by the organism
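Match Enzymes to Reactions
[Figure: decision flowchart for assigning gene products to MetaCyc reactions, illustrated with the gene product UDP-glucose 4-epimerase (EC 5.1.3.2), which catalyzes UDP-D-glucose = UDP-galactose. A gene product matched to MetaCyc by EC number (2057 proteins matched by EC#) or by name (314 matched by name) is assigned. An unmatched product whose name ends in "-ase" is a probable enzyme (1320) and is searched manually, then either assigned or marked "Can’t Assign"; otherwise it is not a metabolic enzyme]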
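The decision cascade in the flowchart can be sketched as a function. This is a minimal illustration, not PathoLogic code. It assumes two lookup tables, one from EC numbers and one from normalized enzyme names to MetaCyc reactions (`ec_index` and `name_index`, both hypothetical). It also assumes the EC number is tried before the name. The "-ase" test is a plain suffix check, and the loose name normalization is an assumption.

```python
import re
from typing import Dict, Optional, Tuple


def normalize(name: str) -> str:
    """Lower-case a protein name and drop punctuation/spacing for loose matching."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def assign_reaction(
    product_name: str,
    ec_number: Optional[str],
    ec_index: Dict[str, str],
    name_index: Dict[str, str],
) -> Tuple[str, Optional[str]]:
    """EC# match, then name match, then the '-ase' probable-enzyme test."""
    if ec_number and ec_number in ec_index:
        return "assigned", ec_index[ec_number]
    reaction = name_index.get(normalize(product_name))
    if reaction is not None:
        return "assigned", reaction
    if re.search(r"ase\b", product_name, re.IGNORECASE):
        return "probable enzyme: search manually", None
    return "not a metabolic enzyme", None


rxn = "UDP-D-glucose = UDP-galactose"
ec_index = {"5.1.3.2": rxn}
name_index = {normalize("UDP-glucose 4-epimerase"): rxn}
print(assign_reaction("UDP-glucose-4epimerase", None, ec_index, name_index))
# ('assigned', 'UDP-D-glucose = UDP-galactose')
```

Normalization lets the irregularly hyphenated "UDP-glucose-4epimerase" match "UDP-glucose 4-epimerase". Real enzyme names vary far more than this, which is why the flowchart ends in manual search.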
Import Pathways
[Figure: flowchart for importing the MetaCyc pathways that contain the organism’s reactions (625 reactions in the example): import all such pathways, optionally prune them, and decide by manual review whether to keep or delete each pathway]
Pathway Prediction
- Prediction is hard because
  - Enzyme naming is irregular
  - Some reactions present in multiple pathways
  - Pathway variants share many reactions in common
  - MetaCyc now has many pathways
Pathway Scoring Criteria
- Imported pathways must satisfy:
  - Pathways outside their taxonomic range must have enzymes for all reactions
  - If any reactions in a pathway are designated as “key,” an enzyme must be present for at least one
- Pathway P is imported if any of these conditions are satisfied:
  - One unique enzyme present for P
  - P missing at most one reaction
  - More reactions present than absent for P
  - P is not a superset of another pathway with the same number of enzymes present
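The following is a partial Python sketch of the scoring criteria above, not the PathoLogic implementation. It encodes the two required conditions and the first three sufficient conditions. It omits the superset rule, because this slide does not say how that rule interacts with the others. It also adds one assumption of its own: a pathway with no enzymes present is never imported. "Unique" is taken to mean a reaction of P that occurs in no other MetaCyc pathway.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass
class PathwayEvidence:
    reactions: FrozenSet[str]             # all reactions of pathway P
    present: FrozenSet[str]               # reactions with an enzyme in the organism
    unique: FrozenSet[str] = frozenset()  # reactions of P found in no other pathway
    key: FrozenSet[str] = frozenset()     # reactions designated "key"
    in_taxonomic_range: bool = True


def should_import(p: PathwayEvidence) -> bool:
    present = p.present & p.reactions
    missing = p.reactions - present
    if not present:  # assumption of this sketch, not stated on the slide
        return False
    # Required conditions.
    if not p.in_taxonomic_range and missing:
        return False
    if p.key and not (p.key & present):
        return False
    # Sufficient conditions: any one is enough (superset rule omitted).
    return (
        bool(p.unique & present)
        or len(missing) <= 1
        or len(present) > len(missing)
    )


print(should_import(PathwayEvidence(frozenset("abcd"), frozenset("abc"))))  # True: one hole
```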
Pathway Evidence Report
PathoLogic Step 4: Pathway Hole Filler
Definition: Pathway Holes are reactions in metabolic pathways for which no enzyme is identified
[Figure: NAD biosynthesis pathway (L-aspartate → iminoaspartate → quinolinate → nicotinate nucleotide → deamido-NAD → NAD) with pathway holes marked; labels include EC 1.4.3.-, quinolinate synthetase (nadA), nadC, n.n. pyrophosphorylase (EC 2.7.7.18), NAD+ synthetase, NH3 dependent (CC3619), and EC 6.3.5.1]
[Figure: pathway hole filler steps. Step 1: query UniProt for all sequences having the EC# of the pathway hole (enzyme A from organisms 1-8). Step 2: BLAST them against the target genome (genes X, Y, Z). Steps 3 & 4: consolidate hits and evaluate evidence; in the example, 7 queries have high-scoring hits to sequence Y]
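The consolidation step can be sketched in Python. Each BLAST hit is reduced to a (query, target gene, E-value) triple, and target genes are ranked by how many distinct query sequences hit them with a high score. This covers only the counting idea. The published method (BMC Bioinformatics 5:76 2004) integrates several evidence types into a probability, which is the P value reported for the Caulobacter fillers, and this sketch does not reproduce that. The E-value cutoff `max_evalue` is a hypothetical choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_candidates(
    hits: Iterable[Tuple[str, str, float]],
    max_evalue: float = 1e-10,
) -> List[Tuple[str, int]]:
    """Rank target genes by the number of distinct queries with high-scoring hits.

    hits: (query_id, target_gene, e_value) triples from BLASTing the
    sequences that carry the hole's EC# against the target genome.
    """
    supporters: Dict[str, Set[str]] = defaultdict(set)
    for query_id, target_gene, evalue in hits:
        if evalue <= max_evalue:  # keep only high-scoring hits
            supporters[target_gene].add(query_id)
    # Most-supported gene first; ties broken alphabetically for stable output.
    return sorted(
        ((gene, len(queries)) for gene, queries in supporters.items()),
        key=lambda item: (-item[1], item[0]),
    )


hits = [(f"organism {i} enzyme A", "gene Y", 1e-40) for i in range(1, 8)]
hits += [("organism 8 enzyme A", "gene X", 1e-12), ("organism 3 enzyme A", "gene Z", 1e-3)]
print(rank_candidates(hits))  # [('gene Y', 7), ('gene X', 1)]
```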
Pathway Hole Filler
- Why should the hole filler find things beyond the original genome annotation?
  - Reverse BLAST searches are more sensitive
  - Reverse BLAST searches find second domains
  - Integration of multiple evidence types
Caulobacter crescentus Pathway Holes
- 130 pathways containing 582 reactions
- 92 pathways contain 236 pathway holes
- Caulobacter holes filled:
  - 77 holes filled at P > 0.9
- Previous functions of candidate hole fillers:
  - No predicted function
  - Correctly assigned single function
  - Incorrectly assigned function
  - Imprecise functional assignment
BMC Bioinformatics 5:76 2004
Example Pathway
[Figure: NAD biosynthesis pathway in Caulobacter crescentus (L-aspartate → iminoaspartate → quinolinate → nicotinate nucleotide → deamido-NAD → NAD) with known genes nadA (CC2912) and nadC (CC2915) and hole fillers CC2913 (P=0.99), CC3431* (P=0.90), and CC3619 (P=0.99), at the reactions labeled EC 1.4.3.-, 2.7.7.18, and 6.3.5.1]
Previous annotations of the hole fillers:
- CC2913: L-aspartate oxidase (wrong EC# on rxn)
- CC3431: ORF
- CC3619: put. NAD(+)-synthetase (multidomain)
PathoLogic Step 5: Transport Inference Parser
- Problem: Write a program to query a genome annotation to compute the substrates an organism can transport
- Typical genome annotations for transporters:
  - ATP transporter for ribose
  - ribose ABC transporter
  - D-ribose ATP transporter
  - ABC transporter, membrane spanning protein [ribose]
  - ABC transporter, membrane spanning protein [D-ribose]
Transport Inference Parser
- Input: “ATP transporter of phosphonate”
- Output: Structured description of transport activity
- Locates most transporters in genome annotation using keyword analysis
- Parses product name using a series of rules to identify:
  - Transported substrate, co-substrate
  - Influx/efflux
  - Energy coupling mechanism
- Creates transport reaction object:
  - phosphonate[periplasm] + H2O + ATP = phosphonate + Pi + ADP
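A rule-based parser of this kind can be sketched with regular expressions. The Python sketch below handles only the naming patterns listed for ribose plus the phosphonate example, so its three rules are a tiny hypothetical subset of any real rule set. It also assumes "ABC" or "ATP" in a name means ATP-coupled transport, that names containing "efflux" or "export" mean efflux, and that the outside compartment is the periplasm, as in the example reaction.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Pattern


@dataclass
class TransportActivity:
    substrate: str
    direction: str          # "influx" or "efflux"
    energy: Optional[str]   # "ATP" for ATP-coupled (ABC) transport, else None


# Each rule captures the transported substrate from one naming pattern.
RULES: List[Pattern[str]] = [
    re.compile(r"^(?:ATP|ABC)\s+transporter\s+(?:of|for)\s+(?P<substrate>[\w-]+)", re.I),
    re.compile(r"^(?P<substrate>[\w-]+)\s+(?:ATP|ABC)\s+transporter", re.I),
    re.compile(r"^ABC\s+transporter\b.*\[(?P<substrate>[^\]]+)\]", re.I),
]


def parse_transporter(product_name: str) -> Optional[TransportActivity]:
    for rule in RULES:
        match = rule.search(product_name)
        if match:
            energy = "ATP" if re.search(r"\b(?:ATP|ABC)\b", product_name, re.I) else None
            efflux = re.search(r"\b(?:efflux|export)", product_name, re.I)
            return TransportActivity(match.group("substrate"), "efflux" if efflux else "influx", energy)
    return None  # no rule matched: left for the curator to review


def to_reaction(activity: TransportActivity) -> str:
    outside, inside = f"{activity.substrate}[periplasm]", activity.substrate
    left, right = (outside, inside) if activity.direction == "influx" else (inside, outside)
    if activity.energy == "ATP":
        return f"{left} + H2O + ATP = {right} + Pi + ADP"
    return f"{left} = {right}"


print(to_reaction(parse_transporter("ATP transporter of phosphonate")))
# phosphonate[periplasm] + H2O + ATP = phosphonate + Pi + ADP
```

Returning `None` for unmatched names mirrors the review step, in which the user also examines transporters for which no assignment was made.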
Transport Inference Parser
- Permits symbolic computation with transport activities:
  - Compute transportable substrates of the cell
  - Compute connectivity among compartments for substrates
  - Facilitate reasoning about transport/metabolism connections
  - Draw transport cartoon in protein pages, cellular overview
Transport Inference Parser
- User reviews all assignments using an interactive tool that allows assignments to be revised
- User also reviews transporters for which no assignment was made
Regulation
Encoding Cellular Regulation in Pathway Tools -- Goals
- Facilitate curation of a wide range of regulatory information within a formal ontology
- Compute with regulatory mechanisms and pathways
  - Summary statistics, complex queries
  - Pattern discovery
  - Visualization of network components
- Provide training sets for inference of regulatory networks
- Interpret gene-expression datasets in the context of known regulatory mechanisms
Regulatory Interactions Supported by Pathway Tools
- Substrate-level regulation of enzyme activity
- Binding to proteins or small molecules (phosphorylation)
- Regulation of transcription initiation
- Attenuation of transcription
- Regulation of translation by proteins and by small RNAs
Regulation in Pathway Tools
- Editing tools
- Transcription factor display window
- Transcription unit display window
- Regulatory Overview / Omics Viewer
Regulatory Interaction Editor
Regulatory Overview and Omics Viewer
- Show regulatory relationships among gene groups
Comparative Analysis
- Via Cellular Overview
- Comparative genome browser
- Comparative pathway table
- Comparative analysis reports
  - Compare reaction complements
  - Compare pathway complements
  - Compare transporter complements
Information Sources
- Pathway Tools User’s Guide
  - aic-export/pathway-tools/ptools/13.0/doc/manuals/userguide.pdf
  - NOTE: Location of the aic-export directory can vary across different computers
- Pathway Tools Web Site
  - http://bioinformatics.ai.sri.com/ptools/
  - Publications, FAQ, programming examples, etc.
- Slides from this tutorial
  - http://www.ai.sri.com/pkarp/talks/
- BioCyc Webinars
  - http://biocyc.org/webinar.shtml
BioCyc and Pathway Tools Availability
- BioCyc.org Web site and database files freely available to all
- Pathway Tools freely available to non-profits
  - Macintosh, PC/Windows, PC/Linux
Symbolic Systems Biology
Definition: Global analyses of biological systems using symbolic computing
Symbolic Systems Biology
- “Symbolic computing is concerned with the representation and manipulation of information in symbolic form. It is often contrasted with numeric representation.” -- R. Cameron
- Examples of symbolic computation:
  - Symbolic algebra programs, e.g., Mathematica, Graphing Calculator
  - Compilers and interpreters for programming languages
  - Database query languages
  - Text analysis programs, e.g., Google
  - String matching for DNA and protein sequences
  - Artificial Intelligence methods, e.g., expert systems, symbolic logic, machine learning, natural language understanding
Symbolic Systems Biology
- Concerned with different questions than quantitative systems biology
- Symbolic analyses can in many cases produce answers when quantitative approaches fail because of lack of parameters or intractable mathematics
- Symbolic computation is intimately dependent on the use of structured ontologies
Pathway Tools Ontology
- 1064 classes
  - Main classes such as: Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
  - Taxonomies for Pathways, Reactions, Compounds
- 205 slots
  - Meta-data: Creator, Creation-Date
  - Comment, Citations, Common-Name, Synonyms
  - Attributes: Molecular-Weight, DNA-Footprint-Size
  - Relationships: Catalyzes, Component-Of, Product
- Classes, instances, slots all stored side by side in DBMS
Critiquing the Parts List
Slide thanks to Hirotada Mori (minus the banana!)
Dead End Metabolites
- A small molecule C is a dead-end if:
  - C is produced only by SMM (small-molecule metabolism) reactions in Compartment, and no transporter acts on C in Compartment, OR
  - C is consumed only by SMM reactions in Compartment, and no transporter acts on C in Compartment
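The dead-end definition reduces to set arithmetic over the reactions of one compartment. The following Python sketch is an illustration, not the Pathway Tools tool described next. It assumes reactions are already restricted to a single compartment and given as (reactants, products, reversible) triples. It also assumes an irreversible reaction runs left to right, while a reversible one both produces and consumes every participant.

```python
from typing import Iterable, Set, Tuple

# (reactants, products, reversible) for one SMM reaction in a compartment
Reaction = Tuple[Set[str], Set[str], bool]


def dead_end_metabolites(reactions: Iterable[Reaction], transported: Set[str]) -> Set[str]:
    """Compounds only produced, or only consumed, by the compartment's SMM
    reactions and not acted on by any transporter in that compartment."""
    produced: Set[str] = set()
    consumed: Set[str] = set()
    for reactants, products, reversible in reactions:
        produced |= products
        consumed |= reactants
        if reversible:  # a reversible reaction can run in either direction
            produced |= reactants
            consumed |= products
    return ((produced - consumed) | (consumed - produced)) - transported


reactions = [({"A"}, {"B"}, False), ({"B"}, {"C"}, False), ({"D"}, {"E"}, True)]
print(dead_end_metabolites(reactions, transported={"A"}))  # {'C'}
```

Here A is consumed but never produced, and it escapes being a dead end only because a transporter supplies it. C is produced but never consumed or exported.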
Dead End Metabolites
- Not yet an official part of Pathway Tools
  - Contact us if you’d like to use it
Reachability Analysis of Metabolic Networks
- Given:
  - A PGDB for an organism
  - A set of initial metabolites
- Infer:
  - What set of products can be synthesized by the small-molecule metabolism of the organism
- Motivations:
  - Quality control for PGDBs
    - Verify that a known growth medium yields known essential compounds
  - Experiment with other growth media
  - Experiment with reaction knock-outs
- Limitations:
  - Cannot properly handle compounds required for their own synthesis
  - Nutrients needed for reachability may be a superset of those required for growth
Romero and Karp, Pacific Symposium on Biocomputing, 2001
Algorithm: Forward Propagation Through Production System
- Each reaction becomes a production rule
- Each of the 21 metabolites in the nutrient set becomes an axiom
[Figure: the nutrient set initializes a metabolite pool; reactions from the PGDB reaction set (e.g., A + B → C) “fire” when all their reactants are in the pool, adding their products to the pool]
Nutrients: A, B, C, E, F
A + B → W
C + D → X
E + F → Y
W + Y → Z
Produced Compounds: W, Y, Z
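Forward propagation can be sketched as a fixed-point loop. Every reaction whose reactants are all in the pool fires and adds its products, and the loop repeats until no new compound appears. This minimal Python sketch reproduces the worked example above. It assumes each reaction runs only left to right, so a reversible reaction would have to be listed in both directions. It also assumes bootstrap compounds such as ATP, NADP, and CoA are simply included in the nutrient set.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)


def forward_propagate(nutrients: Iterable[str], reactions: Iterable[Reaction]) -> Set[str]:
    """Fire reactions whose reactants are all in the pool until nothing new appears."""
    rules: List[Reaction] = list(reactions)
    pool = set(nutrients)
    changed = True
    while changed:
        changed = False
        for reactants, products in rules:
            if reactants <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool


nutrients = {"A", "B", "C", "E", "F"}
reactions = [
    (frozenset({"A", "B"}), frozenset({"W"})),
    (frozenset({"C", "D"}), frozenset({"X"})),
    (frozenset({"E", "F"}), frozenset({"Y"})),
    (frozenset({"W", "Y"}), frozenset({"Z"})),
]
print(sorted(forward_propagate(nutrients, reactions) - nutrients))  # ['W', 'Y', 'Z']
# Knock out E + F -> Y: Z is no longer reachable.
print(sorted(forward_propagate(nutrients, reactions[:2] + reactions[3:]) - nutrients))  # ['W']
```

The second call shows the reaction knock-out experiment from the motivations: removing one reaction and re-running propagation reveals which products depend on it.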
Initial Metabolite Nutrient Set (Total: 21 compounds)
- Nutrients (8) (M61 Minimal growth medium): H+, Fe2+, Mg2+, K+, NH3, SO4^2-, PO4^2-, Glucose
- Nutrients (10) (Environment): Water, Oxygen, Trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)
- Bootstrap Compounds (3): ATP, NADP, CoA
Essential Compounds
E. coli Total: 41 compounds
- Proteins (20)
  - Amino acids
- Nucleic acids (DNA & RNA) (8)
  - Nucleosides
- Cell membrane (3)
  - Phospholipids
- Cell wall (10)
  - Peptidoglycan precursors
  - Outer cell wall precursors (Lipid-A, oligosaccharides)
http://brg.ai.sri.com/ptools09/slides/Tuesday/growth-experiment-Markus-Krummenacker.txt
Flux Balance Modeling
- Generate, store, and update metabolic model within Pathway Tools
  - Fast, accurate generation of metabolic model
  - Close coupling to genome and regulatory information
  - Extensive schema
  - Extensive query and visualization tools
- Debug/validate model using Pathway Tools
- Export to SBML and import to constraint solver for model execution
- Visualize reaction flux and omics data using overviews
- Copy/update multiple PGDBs to reflect alternative strains