SRI International
Bioinformatics
2,500+ databases from multiple institutions
Cover all domains of life with microbial emphasis
All DBs derived from MetaCyc via computational pathway prediction
Common schema
Common controlled vocabularies
Common methodologies
Database   Organism        Organization      Curated From (publications)
MetaCyc    Multiorganism   SRI               34,000
EcoCyc     E. coli         SRI               23,000
HumanCyc   H. sapiens      SRI
AraCyc     A. thaliana     Carnegie Instit.  2,282
YeastCyc   S. cerevisiae   Stanford Univ     565
MouseCyc   M. musculus     Jackson Labs
Pathway/Genome Database (PGDB) – combines information about
Pathways, reactions, substrates
Enzymes, transporters
Genes, replicons
Transcription factors/sites, promoters, operons
Tier 1: Literature-Derived PGDBs
MetaCyc, HumanCyc, YeastCyc
EcoCyc -- Escherichia coli K-12
AraCyc – Arabidopsis thaliana
Tier 2: Computationally-derived DBs, Some Curation -- 34 PGDBs
Bacillus subtilis, Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- The remainder
Pathways
Reactions
Proteins
RNAs
Genes
Chromosomes
Plasmids
Compounds
Sequence Features
Operons
Promoters
DNA Binding Sites
Regulatory Interactions
CELL
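The object classes listed above can be pictured as linked records. A minimal sketch in Python, assuming hypothetical class and field names chosen for illustration (they are not the actual Pathway Tools schema), showing how a pathway can be traced back to its genes:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified stand-ins for PGDB object classes.

@dataclass
class Gene:
    name: str
    replicon: str   # chromosome or plasmid carrying the gene
    start: int
    end: int

@dataclass
class Protein:
    name: str
    gene: Gene

@dataclass
class Reaction:
    name: str
    substrates: List[str]
    products: List[str]
    enzymes: List[Protein] = field(default_factory=list)

@dataclass
class Pathway:
    name: str
    reactions: List[Reaction] = field(default_factory=list)

def genes_of_pathway(pathway: Pathway) -> List[str]:
    """Follow pathway -> reaction -> enzyme -> gene links; return sorted unique gene names."""
    return sorted({enzyme.gene.name
                   for reaction in pathway.reactions
                   for enzyme in reaction.enzymes})

# Toy data with made-up names and coordinates.
gene_a = Gene("geneA", "chromosome", 100, 1000)
gene_b = Gene("geneB", "chromosome", 1100, 2000)
step1 = Reaction("step1", ["A"], ["B"], [Protein("EnzA", gene_a)])
step2 = Reaction("step2", ["B"], ["C"], [Protein("EnzB", gene_b)])
toy_pathway = Pathway("toy pathway", [step1, step2])

print(genes_of_pathway(toy_pathway))
```

Keeping genes, proteins, reactions, and pathways as separate linked objects is what lets one query cross from the genome to the metabolic network.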
3,000+ licensees: 250+ groups applying software to 1,700 organisms
Saccharomyces cerevisiae, SGD project, Stanford University
135 pathways / 565 publications – BioCyc.org
FungiCyc, Broad Institute
Candida albicans, CGD project, Stanford University
dictyBase, Northwestern University
Mouse, MGD, Jackson Laboratory -- BioCyc.org
Drosophila, FlyBase, Harvard University -- BioCyc.org
Under development:
C. elegans, WormBase
Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
288 pathways / 2282 publications – BioCyc.org
ChlamyCyc, GoFORSYS
PlantCyc, Carnegie Institution of Washington
Six Solanaceae species, Cornell University
GrameneDB, Cold Spring Harbor Laboratory
Medicago truncatula, Samuel Roberts Noble Foundation
G. Serres, MBL, Shewanella oneidensis
M. Bibb, John Innes Centre, Streptomyces coelicolor
TBDB Project, Mycobacterium tuberculosis
F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
Genoscope, Acinetobacter
R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
Sergio Encarnacion, UNAM, Sinorhizobium meliloti
Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
Large scale users:
C. Medigue, Genoscope, 500+ PGDBs
J. Zucker, Broad Inst, 94 PGDBs
G. Sutton, J. Craig Venter Institute, 80+ PGDBs
G. Burger, U Montreal, 60+ PGDBs
E. Uberbacher, ORNL, 33 bioenergy-related organisms
Bart Weimer, UC Davis, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
Partial listing of outside PGDBs at http://biocyc.org/otherpgdbs.shtml
Comprehensive software environment spanning computational genomics and systems biology
Create and maintain an organism database integrating genome, pathway, regulatory information
Computational inference tools
Interactive editing tools
Query and visualize that database
Interpret genome-scale datasets
Comparative analysis tools
Generate flux-balance models
[Diagram: Annotated Genome + PathoLogic, Pathway/Genome Database, Pathway/Genome Navigator, Pathway/Genome Editors, Genome-Scale Flux Model]
Briefings in Bioinformatics 11:40-79 2010
Computational creation of new Pathway/Genome Databases
Transforms genome into Pathway Tools schema and layers inferred information above the genome
Predicts operons
Predicts metabolic network
Predicts which genes code for missing enzymes in metabolic pathways
Infers transport reactions from transporter names
Bioinformatics 18:S225 2002
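To make the idea of predicting a metabolic network from an annotated genome concrete, here is a deliberately simplified sketch. It assumes a hypothetical scoring rule (the fraction of a reference pathway's reactions that have an annotated enzyme) and an arbitrary 0.5 threshold; the actual PathoLogic scoring is more elaborate and is not reproduced here. All names and data are invented.

```python
from typing import Dict, List, Set

def score_pathways(reference_pathways: Dict[str, List[str]],
                   annotated_reactions: Set[str]) -> Dict[str, float]:
    """Per pathway, the fraction of its reactions that have an annotated enzyme."""
    return {
        name: sum(r in annotated_reactions for r in reactions) / len(reactions)
        for name, reactions in reference_pathways.items()
        if reactions
    }

def predict_pathways(reference_pathways: Dict[str, List[str]],
                     annotated_reactions: Set[str],
                     threshold: float = 0.5) -> List[str]:
    """Call a pathway present when its score reaches the (assumed) threshold."""
    scores = score_pathways(reference_pathways, annotated_reactions)
    return sorted(name for name, score in scores.items() if score >= threshold)

def pathway_holes(reference_pathways: Dict[str, List[str]],
                  annotated_reactions: Set[str],
                  predicted: List[str]) -> Dict[str, List[str]]:
    """Reactions of predicted pathways that still lack an enzyme (candidate 'holes')."""
    return {name: [r for r in reference_pathways[name] if r not in annotated_reactions]
            for name in predicted}

# Toy reference database and genome annotation (invented identifiers).
reference = {"pathway-1": ["r1", "r2", "r3"], "pathway-2": ["r4", "r5"]}
annotated = {"r1", "r2"}

predicted = predict_pathways(reference, annotated)
holes = pathway_holes(reference, annotated, predicted)
print(predicted, holes)
```

The holes returned for predicted pathways are the reactions for which a tool would then search the genome for candidate enzyme-coding genes.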
Interactively update PGDBs with graphical editors
Support geographically distributed teams of curators with object database system
Gene editor
Protein editor
Reaction editor
Compound editor
Pathway editor
Operon editor
Publication editor
Ongoing updating and refinement of a PGDB
Correcting false-positive and false-negative predictions
Incorporating information from experimental literature
Authoring of comments and citations
Updating database fields
Gene positions, names, synonyms
Protein functions, activators, inhibitors
Addition of new pathways, modification of existing pathways
Defining TF binding sites, promoters, regulation of transcription initiation and other processes
Querying and visualization of:
Pathways
Reactions
Metabolites
Proteins
Genes
Chromosomes
Two modes of operation:
Web mode
Desktop mode
Most functionality shared, but each has unique functionality
Ontology classes: 1621
Datatype classes: Define objects from genomes to pathways
Classification systems for pathways, chemical compounds, enzymatic reactions (EC system)
Protein Feature ontology
Controlled vocabularies:
Cell Component Ontology
Evidence codes
Comprehensive set of 248 attributes and relationships
A connected sequence of biochemical reactions
Occurs in one organism
Conserved through evolution
Regulated as a unit
Starts or stops at one of 13 common intermediate metabolites
KEGG approach: A static collection of reference pathway diagrams is color-coded to produce organism-specific views
KEGG vs MetaCyc: Resource on literature-derived pathways
KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
KEGG maps contain multiple biological pathways
KEGG maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms
KEGG has no literature citations, no comments, less enzyme detail
KEGG vs BioCyc organism-specific PGDBs
KEGG does not curate or customize pathway networks for each organism
Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis
KEGG re-annotates entire genome for each organism
Inference tools
KEGG does not predict presence or absence of pathways
KEGG lacks pathway hole filler, operon predictor
Curation tools
KEGG does not distribute curation tools
No ability to customize pathways to the organism
Pathway Tools schema much more comprehensive
Visualization and analysis
KEGG does not perform automatic pathway layout
No comparative pathway analysis
Allegro Common Lisp
PC/Windows, Linux, Macintosh platforms
Ocelot object database
600,000+ lines of code
Lisp-based WWW server at BioCyc.org
Manages 1,100+ PGDBs
Available in iTunes store
Free
Look up gene information while traveling, at a conference, or in the library
Joint work with Mario Latendresse
Steady state, constraint-based quantitative models of metabolism
Starting information for organism of interest:
Nutrients
Metabolic Reaction List
Biomass
Secretions
[Diagram: example network with metabolites A, B, C, D, and X]
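The steady-state, constraint-based setup can be written as a linear program. The block below is the standard textbook formulation of flux-balance analysis; the symbols S, v, c, l, and u are notation introduced here and do not appear on the slides.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l \le v \le u
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (for example the biomass reaction), and the bounds l and u encode nutrient availability, secretions, and reaction reversibility. The constraint S v = 0 is the steady-state assumption: every internal metabolite is produced as fast as it is consumed.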
Submit to linear optimization package
Optimize biomass production, ATP production, etc
Results
Steady-state reaction fluxes for the metabolic network
Remove reactions from the model to predict knock-out phenotypes
Supply alternative nutrient sets to predict growth phenotypes
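Solving the linear program itself requires an optimization package, so the sketch below swaps in a simpler qualitative technique, network expansion (reachability), to illustrate the knock-out and alternative-nutrient experiments. It is not FBA and yields no fluxes; it only asks whether biomass metabolites are producible at all. The toy reactions and all names are invented for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Each reaction name maps to (substrates, products).
Reactions = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]

TOY_MODEL: Reactions = {
    "r1": (frozenset({"A"}), frozenset({"B"})),
    "r2": (frozenset({"B"}), frozenset({"C"})),
    "r3": (frozenset({"C"}), frozenset({"X", "D"})),  # X: biomass, D: secretion
    "r4": (frozenset({"E"}), frozenset({"C"})),       # alternative route from nutrient E
}

def producible(reactions: Reactions, nutrients: Iterable[str]) -> Set[str]:
    """Expand the set of available metabolites until no reaction adds anything new."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def can_grow(reactions: Reactions, nutrients: Iterable[str],
             biomass: Iterable[str], knocked_out: Iterable[str] = ()) -> bool:
    """True if every biomass metabolite is producible after removing knocked-out reactions."""
    removed = set(knocked_out)
    active = {name: rxn for name, rxn in reactions.items() if name not in removed}
    return set(biomass) <= producible(active, nutrients)

print(can_grow(TOY_MODEL, {"A"}, {"X"}))                      # wild type on nutrient A
print(can_grow(TOY_MODEL, {"A"}, {"X"}, knocked_out={"r2"}))  # knock-out of r2
print(can_grow(TOY_MODEL, {"A", "E"}, {"X"}, knocked_out={"r2"}))  # rescue by nutrient E
```

A real FBA run would replace the reachability test with an optimization of the biomass flux, which also accounts for stoichiometry and flux bounds.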
Store and update metabolic model within Pathway Tools
The PGDB is the model
All query and visualization tools applicable to FBA model
FBA model is tightly coupled to genome and regulatory information
Export to constraint solver for model execution/solving
Reaction balance checking
Dead-end metabolite analysis
Visualize reaction flux using cellular overview
Multiple gap filling
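Two of the checks listed above, reaction balance checking and dead-end metabolite analysis, are simple enough to sketch. The code below is an illustration under stated assumptions, not the Pathway Tools implementation: formulas are plain element-count strings without parentheses or charges, and a dead-end metabolite is taken to be one that is only produced or only consumed and is not a declared boundary (nutrient or secretion) metabolite.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Side = List[Tuple[int, str]]  # (stoichiometric coefficient, chemical formula)

def element_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_total(side: Side) -> Counter:
    """Sum element counts over one side of a reaction, weighted by coefficients."""
    total: Counter = Counter()
    for coefficient, formula in side:
        for element, n in element_counts(formula).items():
            total[element] += coefficient * n
    return total

def is_balanced(left: Side, right: Side) -> bool:
    """A reaction is mass-balanced when both sides carry the same atoms."""
    return side_total(left) == side_total(right)

def dead_end_metabolites(reactions: Dict[str, Tuple[Set[str], Set[str]]],
                         boundary: Iterable[str]) -> Set[str]:
    """Metabolites that are only produced or only consumed, excluding boundary ones."""
    consumed: Set[str] = set()
    produced: Set[str] = set()
    for substrates, products in reactions.values():
        consumed |= set(substrates)
        produced |= set(products)
    return (consumed ^ produced) - set(boundary)

# Glucose oxidation is balanced; H2 + O2 -> H2O is not.
respiration_ok = is_balanced([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")])
water_ok = is_balanced([(1, "H2"), (1, "O2")], [(1, "H2O")])

toy = {"r1": ({"A"}, {"B"}), "r2": ({"B"}, {"C"}), "r3": ({"B"}, {"Z"})}
dead_ends = dead_end_metabolites(toy, boundary={"A", "C"})
print(respiration_ok, water_ok, dead_ends)
```

Unbalanced reactions and dead-end metabolites are common reasons a flux model has no solution, which is why they are worth checking before solving.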
Reaction gap filling (Kumar et al, BMC Bioinf 2007 8:212):
Reverse directionality of selected reactions
Add a minimal number of reactions from MetaCyc to the model to enable a solution
Reaction cost is a function of reaction taxonomic range
Metabolite gap filling: Postulate additional nutrients and secretions
Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
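The reaction gap-filling idea, adding the cheapest set of reference reactions that makes the model work, can be illustrated on a toy scale. The sketch below is an assumption-laden stand-in: it uses brute-force enumeration with a reachability test instead of a real optimization solver, so it only suits a handful of candidates. The per-reaction costs are hypothetical numbers standing in for a cost derived from taxonomic range.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # (substrates, products)

def producible(reactions: List[Reaction], nutrients: Iterable[str]) -> Set[str]:
    """Expand the set of available metabolites until nothing new can be made."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available

def gap_fill(model: Dict[str, Reaction], candidates: Dict[str, Reaction],
             costs: Dict[str, float], nutrients: Iterable[str],
             biomass: Iterable[str]) -> Optional[Tuple[float, Tuple[str, ...]]]:
    """Return (cost, names) of the cheapest candidate subset that makes all
    biomass metabolites producible, or None if no subset works."""
    best: Optional[Tuple[float, Tuple[str, ...]]] = None
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            trial = list(model.values()) + [candidates[n] for n in subset]
            if set(biomass) <= producible(trial, nutrients):
                cost = sum(costs[n] for n in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best

# Toy model that cannot make biomass X, plus invented reference reactions.
model = {"r1": ({"A"}, {"B"})}
candidates = {"c1": ({"B"}, {"X"}), "c2": ({"B"}, {"C"}), "c3": ({"C"}, {"X"})}
costs = {"c1": 5.0, "c2": 1.0, "c3": 1.0}  # hypothetical: c1 is taxonomically distant

solution = gap_fill(model, candidates, costs, nutrients={"A"}, biomass={"X"})
print(solution)
```

Because costs differ per reaction, the cheapest fix here uses two reactions rather than one, which shows why cost and reaction count are not interchangeable objectives.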
Obtain license
http://biocyc.org/download.shtml
Download directory offers several configurations
Choose platform and database configuration
Many combinations of databases available
Loading all databases requires a lot of memory
Use registry to add PGDBs to configuration you downloaded
Pathway Tools User’s Guide
aic-export/pathway-tools/ptools/14.0/doc/manuals/userguide.pdf
NOTE: Location of the aic-export directory can vary across different computers
Pathway Tools Web Site
http://bioinformatics.ai.sri.com/ptools/
Publications, FAQ, programming examples, etc.
Slides from this tutorial
http://bioinformatics.ai.sri.com/ptools/tutorial/sessions/
BioCyc Webinars
http://biocyc.org/webinar.shtml
Desktop vs Web functionality in Pathway Tools
http://biocyc.org/desktop-vs-web-mode.shtml
Publications
“Pathway Tools version 13.0: Integrated Software for Pathway/Genome Informatics and Systems Biology”, Briefings in Bioinformatics 11:40-79 2010
“A survey of metabolic databases emphasizing the MetaCyc family”, Archives of Toxicology 2011
BioCyc Web site: Help Menu
Basic Help
Search Help
BioCyc Glossary
Publications
Website User Guide
PGDB Concepts
Guide to EcoCyc
Guide to MetaCyc