intro-to-ptools-and-biocyc - Bioinformatics Research Group at

Introduction to the Pathway Tools Software and BioCyc Database Collection

MetaCyc Family of Pathway/Genome Databases

SRI International

Bioinformatics

• 2,500+ databases from multiple institutions
• Cover all domains of life with microbial emphasis
• All DBs derived from MetaCyc via computational pathway prediction
• Common schema
• Common controlled vocabularies
• Common methodologies

Curated Databases Within the MetaCyc Family

Database   Organism        Organization       Curated From
MetaCyc    Multiorganism   SRI                34,000
EcoCyc     E. coli         SRI                23,000
HumanCyc   H. sapiens      SRI                (not listed)
AraCyc     A. thaliana     Carnegie Instit.   2,282
YeastCyc   S. cerevisiae   Stanford Univ      565
MouseCyc   M. musculus     Jackson Labs       (not listed)

BioCyc Collection of 1,700 Pathway/Genome Databases
• Pathway/Genome Database (PGDB): combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors/sites, promoters, operons
• Tier 1: Literature-derived PGDBs
  - MetaCyc, HumanCyc, YeastCyc
  - EcoCyc: Escherichia coli K-12
  - AraCyc: Arabidopsis thaliana
• Tier 2: Computationally derived DBs, some curation (34 PGDBs)
  - Bacillus subtilis, Mycobacterium tuberculosis
• Tier 3: Computationally derived DBs, no curation (the remainder)

Pathway/Genome Database
[Diagram of the object types in a PGDB. Labels: Pathways, Reactions, Proteins, RNAs, Genes, Chromosomes, Plasmids, Compounds, Sequence Features, Operons, Promoters, DNA Binding Sites, Regulatory Interactions, CELL]

Pathway Tools Software: PGDBs Created Outside SRI
• 3,000+ licensees: 250+ groups applying software to 1,700 organisms
• Saccharomyces cerevisiae, SGD project, Stanford University
  - 135 pathways / 565 publications (BioCyc.org)
• FungiCyc, Broad Institute
• Candida albicans, CGD project, Stanford University
• dictyBase, Northwestern University
• Mouse, MGD, Jackson Laboratory (BioCyc.org)
• Drosophila, FlyBase, Harvard University (BioCyc.org)
• Under development:
  - C. elegans, WormBase
• Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
  - 288 pathways / 2282 publications (BioCyc.org)
• ChlamyCyc, GoFORSYS
• PlantCyc, Carnegie Institution of Washington
• Six Solanaceae species, Cornell University
• GrameneDB, Cold Spring Harbor Laboratory
• Medicago truncatula, Samuel Roberts Noble Foundation

Pathway Tools Software: PGDBs Created Outside SRI
• G. Serres, MBL, Shewanella oneidensis
• M. Bibb, John Innes Centre, Streptomyces coelicolor
• TBDB Project, Mycobacterium tuberculosis
• F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
• Genoscope, Acinetobacter
• R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
• Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
• Sergio Encarnacion, UNAM, Sinorhizobium meliloti
• Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis

Pathway Tools Software: PGDBs Created Outside SRI
• Large-scale users:
  - C. Medigue, Genoscope, 500+ PGDBs
  - J. Zucker, Broad Inst, 94 PGDBs
  - G. Sutton, J. Craig Venter Institute, 80+ PGDBs
  - G. Burger, U Montreal, 60+ PGDBs
  - E. Uberbacher, ORNL, 33 bioenergy-related organisms
  - Bart Weimer, UC Davis, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
• Partial listing of outside PGDBs at http://biocyc.org/otherpgdbs.shtml

Pathway Tools Software
• Comprehensive software environment spanning computational genomics and systems biology
• Create and maintain an organism database integrating genome, pathway, regulatory information
  - Computational inference tools
  - Interactive editing tools
• Query and visualize that database
• Interpret genome-scale datasets
• Comparative analysis tools
• Generate flux-balance models

Pathway Tools Software
[Diagram components: Annotated Genome + PathoLogic; Pathway/Genome Database; Pathway/Genome Navigator; Pathway/Genome Editors; Genome-Scale Flux Model]
Briefings in Bioinformatics 11:40-79 2010

Pathway Tools Software: PathoLogic
• Computational creation of new Pathway/Genome Databases
• Transforms genome into Pathway Tools schema and layers inferred information above the genome
• Predicts operons
• Predicts metabolic network
• Predicts which genes code for missing enzymes in metabolic pathways
• Infers transport reactions from transporter names
Bioinformatics 18:S225 2002
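
The "missing enzymes" step above starts from pathway holes: reactions in a predicted pathway that have no enzyme assigned in the annotated genome. The sketch below only illustrates how such holes can be listed. It is not PathoLogic's algorithm, and the reaction IDs, gene names, and the function `find_pathway_holes` are hypothetical.

```python
from typing import Dict, List, Set


def find_pathway_holes(pathway_reactions: List[str],
                       enzyme_annotations: Dict[str, Set[str]]) -> List[str]:
    """Return the reactions of a pathway that have no annotated enzyme."""
    return [rxn for rxn in pathway_reactions
            if not enzyme_annotations.get(rxn)]


# Hypothetical toy data: reaction IDs and gene names are invented.
pathway = ["RXN-A", "RXN-B", "RXN-C", "RXN-D"]
annotations = {
    "RXN-A": {"geneA"},
    "RXN-B": set(),                  # reaction listed, but no gene assigned
    "RXN-D": {"geneD1", "geneD2"},
}                                    # RXN-C is missing from the annotation

holes = find_pathway_holes(pathway, annotations)
print(holes)  # ['RXN-B', 'RXN-C']
```

Each hole is then a candidate for a gene search; that search itself is outside the scope of this sketch.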

Pathway Tools Software: Pathway/Genome Editors
• Interactively update PGDBs with graphical editors
• Support geographically distributed teams of curators with object database system
• Gene editor
• Protein editor
• Reaction editor
• Compound editor
• Pathway editor
• Operon editor
• Publication editor

What is Curation?
• Ongoing updating and refinement of a PGDB
• Correcting false-positive and false-negative predictions
• Incorporating information from experimental literature
• Authoring of comments and citations
• Updating database fields
  - Gene positions, names, synonyms
  - Protein functions, activators, inhibitors
• Addition of new pathways, modification of existing pathways
• Defining TF binding sites, promoters, regulation of transcription initiation and other processes

Pathway Tools Software: Pathway/Genome Navigator
• Querying and visualization of:
  - Pathways
  - Reactions
  - Metabolites
  - Proteins
  - Genes
  - Chromosomes
• Two modes of operation:
  - Web mode
  - Desktop mode
  - Most functionality shared, but each has unique functionality

Pathway Tools Ontology / Schema
• Ontology classes: 1621
  - Datatype classes: Define objects from genomes to pathways
  - Classification systems for pathways, chemical compounds, enzymatic reactions (EC system)
  - Protein Feature ontology
  - Controlled vocabularies:
    - Cell Component Ontology
    - Evidence codes
• Comprehensive set of 248 attributes and relationships
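
To picture how datatype classes can link objects from genomes to pathways, the sketch below models a few linked records and follows the pathway → reaction → protein → gene relationships. It is a minimal illustration in Python, not the Pathway Tools schema; every class name, field, and example value is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Reaction:
    rid: str
    substrates: Tuple[str, ...]
    products: Tuple[str, ...]


@dataclass
class Protein:
    name: str
    gene: str                                   # name of the encoding gene
    catalyzes: List[Reaction] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    reactions: List[Reaction] = field(default_factory=list)


def genes_of_pathway(pathway: Pathway, proteins: List[Protein]) -> List[str]:
    """Follow pathway -> reaction -> protein -> gene links."""
    wanted = {r.rid for r in pathway.reactions}
    return sorted({p.gene for p in proteins
                   if any(r.rid in wanted for r in p.catalyzes)})


# Hypothetical toy objects.
r1 = Reaction("RXN-1", ("A",), ("B",))
r2 = Reaction("RXN-2", ("B",), ("C",))
proteins = [
    Protein("EnzX", "geneX", [r1]),
    Protein("EnzY", "geneY", [r2]),
    Protein("EnzZ", "geneZ", []),               # not part of the pathway
]
toy_pathway = Pathway("toy pathway", [r1, r2])

pathway_genes = genes_of_pathway(toy_pathway, proteins)
print(pathway_genes)  # ['geneX', 'geneY']
```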

What is a Pathway?
• A connected sequence of biochemical reactions
• Occurs in one organism
• Conserved through evolution
• Regulated as a unit
• Starts or stops at one of 13 common intermediate metabolites

Comparison of BioCyc to KEGG
• KEGG approach: A static collection of reference pathway diagrams is color-coded to produce organism-specific views
• KEGG vs MetaCyc: Resource on literature-derived pathways
  - KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
  - KEGG maps contain multiple biological pathways
  - KEGG maps are composites of pathways in many organisms; they do not identify which specific pathways were elucidated in which organisms
  - KEGG has no literature citations, no comments, less enzyme detail
• KEGG vs BioCyc organism-specific PGDBs
  - KEGG does not curate or customize pathway networks for each organism
  - Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis
  - KEGG re-annotates entire genome for each organism

Comparison of Pathway Tools to KEGG
• Inference tools
  - KEGG does not predict presence or absence of pathways
  - KEGG lacks pathway hole filler, operon predictor
• Curation tools
  - KEGG does not distribute curation tools
  - No ability to customize pathways to the organism
  - Pathway Tools schema much more comprehensive
• Visualization and analysis
  - KEGG does not perform automatic pathway layout
  - No comparative pathway analysis

Pathway Tools Implementation Details
• Allegro Common Lisp
• PC/Windows, Linux, Macintosh platforms
• Ocelot object database
• 600,000+ lines of code
• Lisp-based WWW server at BioCyc.org
  - Manages 1,100+ PGDBs

EcoCyc iPhone App
• Available in iTunes store
• Free
• Look up gene information while traveling, at a conference, or in the library

Automated Generation of Metabolic Flux Models from PGDBs

Joint work with Mario Latendresse

Flux-Balance Analysis
• Steady-state, constraint-based quantitative models of metabolism
• Starting information for organism of interest:
  - Nutrients
  - Metabolic reaction list
  - Biomass
  - Secretions
[Diagram labels: Nutrients: A; Metabolic Reaction List: A, B, C, D, X; Biomass; Secretions: D]

Flux Balance Models
• Submit to linear optimization package
  - Optimize biomass production, ATP production, etc.
• Results
  - Steady-state reaction fluxes for the metabolic network
• Remove reactions from the model to predict knock-out phenotypes
• Supply alternative nutrient sets to predict growth phenotypes
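
The ideas on this slide can be illustrated on a deliberately tiny case. The sketch below uses a hypothetical unbranched chain (uptake → A → B → C → D → biomass) with irreversible reactions. For such a chain, the steady-state constraint forces every flux to be equal, so the optimum of the linear program is simply the tightest upper bound and no solver is needed. Real models are branched and must go to a linear optimization package. The reaction names, bounds, and functions are assumptions made for illustration.

```python
# Hypothetical toy network: each reaction maps metabolite -> coefficient
# (positive = produced, negative = consumed).
REACTIONS = {
    "uptake_A": {"A": +1},
    "A_to_B":   {"A": -1, "B": +1},
    "B_to_C":   {"B": -1, "C": +1},
    "C_to_D":   {"C": -1, "D": +1},
    "biomass":  {"D": -1},
}


def net_production(fluxes):
    """Net production rate of each metabolite (the product S . v)."""
    net = {}
    for rxn, flux in fluxes.items():
        for met, coeff in REACTIONS[rxn].items():
            net[met] = net.get(met, 0.0) + coeff * flux
    return net


def is_steady_state(fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(fluxes).values())


def max_biomass(upper_bounds):
    """Optimal fluxes for the unbranched chain only.

    Steady state forces all fluxes to be equal, and lower bounds are
    assumed to be 0, so the optimum is the smallest upper bound.
    """
    v = min(upper_bounds[rxn] for rxn in REACTIONS)
    return {rxn: v for rxn in REACTIONS}


bounds = {"uptake_A": 10.0, "A_to_B": 8.0, "B_to_C": 5.0,
          "C_to_D": 7.0, "biomass": 100.0}

wild_type = max_biomass(bounds)
knockout = max_biomass({**bounds, "B_to_C": 0.0})   # remove one reaction
low_nutrient = max_biomass({**bounds, "uptake_A": 2.0})

print(wild_type["biomass"], knockout["biomass"], low_nutrient["biomass"])
# 5.0 0.0 2.0
```

Setting a bound to zero plays the role of a knock-out, and changing the uptake bound plays the role of an alternative nutrient set.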

Approach: Derive FBA Models from PGDBs
• Store and update metabolic model within Pathway Tools
  - The PGDB is the model
  - All query and visualization tools applicable to FBA model
  - FBA model is tightly coupled to genome and regulatory information
• Export to constraint solver for model execution/solving
• Reaction balance checking
• Dead-end metabolite analysis
• Visualize reaction flux using cellular overview
• Multiple gap filling
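
Two of the checks listed above, reaction balance checking and dead-end metabolite analysis, are easy to sketch on toy data. The code below is a simplified illustration and not the Pathway Tools implementation: formulas are assumed to have no parentheses or charges, compartments and transport are ignored, and a dead end is taken to be a metabolite that is only produced or only consumed. All function names are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def is_balanced(reaction, formulas):
    """reaction maps metabolite -> coefficient (negative = consumed)."""
    total = Counter()
    for met, coeff in reaction.items():
        for element, n in parse_formula(formulas[met]).items():
            total[element] += coeff * n
    return all(v == 0 for v in total.values())


def dead_end_metabolites(reactions):
    """Metabolites that are only produced or only consumed."""
    produced, consumed = set(), set()
    for rxn in reactions.values():
        for met, coeff in rxn.items():
            (produced if coeff > 0 else consumed).add(met)
    return sorted(produced ^ consumed)


FORMULAS = {"glucose": "C6H12O6", "ethanol": "C2H6O", "CO2": "CO2"}
unbalanced = {"glucose": -1, "ethanol": 2}            # CO2 forgotten
balanced = {"glucose": -1, "ethanol": 2, "CO2": 2}
print(is_balanced(unbalanced, FORMULAS), is_balanced(balanced, FORMULAS))
# False True

network = {"r1": {"A": -1, "B": 1}, "r2": {"B": -1, "C": 1}}
print(dead_end_metabolites(network))                  # ['A', 'C']

# Adding an uptake for A and a secretion for C removes both dead ends.
with_exchange = {**network, "uptake": {"A": 1}, "secretion": {"C": -1}}
print(dead_end_metabolites(with_exchange))            # []
```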

Multiple Gap Filling of FBA Models
• Reaction gap filling (Kumar et al, BMC Bioinf 2007 8:212):
  - Reverse directionality of selected reactions
  - Add a minimal number of reactions from MetaCyc to the model to enable a solution
  - Reaction cost is a function of reaction taxonomic range
• Metabolite gap filling: Postulate additional nutrients and secretions
• Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
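
The reaction gap-filling idea (add the cheapest set of candidate reactions that makes the model work) can be illustrated with a brute-force search. This sketch is a stand-in, not the method cited on the slide: it checks producibility by simple reachability from the nutrients rather than by solving a flux model, and it enumerates subsets exhaustively, which only works for a handful of candidates. The reactions, costs, and function names are hypothetical; the higher cost of one candidate stands in for a reaction outside the expected taxonomic range.

```python
from itertools import combinations


def producible(reactions, nutrients):
    """Metabolites reachable from the nutrients.

    Each reaction is a (substrates, products) pair of sets and fires
    once all of its substrates are available.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gap_fill(model, candidates, nutrients, target):
    """Return (cost, names) of the cheapest candidate set, or None.

    candidates maps name -> (substrates, products, cost).
    """
    best = None
    names = sorted(candidates)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            cost = sum(candidates[n][2] for n in subset)
            if best is not None and cost >= best[0]:
                continue                      # cannot improve on best
            added = [candidates[n][:2] for n in subset]
            if target in producible(model + added, nutrients):
                best = (cost, subset)
    return best


# Hypothetical model with a gap between B and C; the target is D.
model = [({"A"}, {"B"}), ({"C"}, {"D"})]
candidates = {
    "cand1": ({"B"}, {"C"}, 1),
    "cand2": ({"A"}, {"X"}, 1),
    "cand3": ({"X"}, {"C"}, 1),
    "cand4": ({"A"}, {"C"}, 5),   # assumed outside the taxonomic range
}

solution = gap_fill(model, candidates, {"A"}, "D")
print(solution)  # (1, ('cand1',))

without_cand1 = {n: c for n, c in candidates.items() if n != "cand1"}
fallback = gap_fill(model, without_cand1, {"A"}, "D")
print(fallback)  # (2, ('cand2', 'cand3'))
```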

Downloading Pathway Tools
• Obtain license
  - http://biocyc.org/download.shtml
• Download directory offers several configurations
• Choose platform and database configuration
  - Many combinations of databases available
  - The configuration with all databases requires a lot of memory
  - Use registry to add PGDBs to the configuration you downloaded

Information Sources
• Pathway Tools User’s Guide
  - aic-export/pathway-tools/ptools/14.0/doc/manuals/userguide.pdf
  - NOTE: Location of the aic-export directory can vary across different computers
• Pathway Tools Web Site
  - http://bioinformatics.ai.sri.com/ptools/
  - Publications, FAQ, programming examples, etc.
• Slides from this tutorial
  - http://bioinformatics.ai.sri.com/ptools/tutorial/sessions/
• BioCyc Webinars
  - http://biocyc.org/webinar.shtml
• Desktop vs Web functionality in Pathway Tools
  - http://biocyc.org/desktop-vs-web-mode.shtml

Information Sources
• Publications
  - “Pathway Tools version 13.0: Integrated Software for Pathway/Genome Informatics and Systems Biology”, Briefings in Bioinformatics 11:40-79 2010
  - “A survey of metabolic databases emphasizing the MetaCyc family”, Archives of Toxicology 2011

Information Sources
• BioCyc Web site: Help Menu
  - Basic Help
  - Search Help
  - BioCyc Glossary
  - Publications
  - Website User Guide
  - PGDB Concepts
  - Guide to EcoCyc
  - Guide to MetaCyc
