Computing with Pathway/Genome Databases
SRI International Bioinformatics

Motivations for Understanding Pathway Tools Schema
• When writing complex queries to PGDBs, those queries must refer to classes and slots within the schema
  • Queries using Lisp, Perl, Java APIs
  • Queries using Query Page
  • Queries using Structured Advanced Query Form

Motivations for Understanding Schema
• Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
• A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity

Pathway Tools Implementation Details
• Platforms: Macintosh, PC/Linux, and PC/Windows
• Same binary can run as desktop app or Web server
• Production-quality software
  • Version control
  • Two regular releases per year
  • Extensive quality assurance
  • Extensive documentation
  • Auto-patch
  • Automatic DB-upgrade
• 420,000 lines of Lisp code

More Information
• Pathway Tools Web Site, Tutorial Slides
  http://bioinformatics.ai.sri.com/ptools/
  http://bioinformatics.ai.sri.com/ptools/examples.lisp
• PerlCyc & JavaCyc API, includes some relationships
  http://www.arabidopsis.org/tools/aracyc/perlcyc/
  http://www.arabidopsis.org/tools/aracyc/javacyc/
• Pathway Tools User's Guide, Appendix: Guide to the Pathway Tools Schema
• Curator's Guide
  http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf
• aic/pathway-tools/nav/12.0/lisp/relationships.lisp

References
• Ontology Papers section of http://biocyc.org/publications.shtml
  • "An Evidence Ontology for use in Pathway/Genome Databases"
  • "An ontology for biological function based on molecular interactions"
  • "Representations of metabolic knowledge: Pathways"
  • "Representations of metabolic knowledge"

Data Exchange
• APIs: Lisp API, Java API, and Perl API: read & modify
• Cyclone
• Export to files
  • BioPAX Export: since Pathway Tools 9.0 (Biopax.org)
  • Export PGDB genome to Genbank format
  • Export entire PGDB as column-delimited and attribute-value file formats
  • Export PGDB reactions as SBML (sbml.org)
• Import/Export of Pathways: between PGDBs
• Import/Export of Selected Frames, for Spreadsheets
• Import/Export of Compounds as Molfile, CML
• Registering/Publishing PGDBs on WWW
• BioWarehouse: Loader for Flatfiles, SQL access
  http://bioinformatics.ai.sri.com/biowarehouse/
  BMC Bioinformatics 7:170 2006

Programmatic Access to BioCyc
• Common LISP
  • Native language of Pathway Tools
  • Interactive & Mature Environment
  • Full Access to the Data & Many Utility Functions
  • Source code is available for academics
• PerlCyc
  • API of Functions, Exposed to Perl
  • Communication through UNIX Socket
• JavaCyc
  • API of Functions, Exposed to Java
  • Communication through UNIX Socket
• Cyclone

Cyclone
• Developed by Schachter and colleagues from Genoscope
  http://nemo-cyclone.sourceforge.net/archi.php
• Cyclone is a Java-based system that:
  • Extracts data from a Pathway Tools PGDB
  • Converts it to an XML schema
  • Maps the data to Java objects and to a relational database
• Changes made to the data on the Java side can be committed back to a Pathway Tools PGDB

Pathway Tools Data Model
• PGDBs are object-oriented databases
• Frame Representation System, named Ocelot
• Frame data model
  • PGDB = Knowledge base = KB = Database = DB
  • Frames
  • Slots
• PGDBs are stored in three possible ways
  • Preloaded into binary executable
  • Ocelot file: single-user
  • RDBMS: MySQL-4 or Oracle-10: multi-user, change-logging
• Query API: GFP (Generic Frame Protocol)

Frames
• Entities with which facts are associated
• Kinds of frames:
  • Classes: Genes, Pathways, Biosynthetic Pathways
  • Instances (objects): trpA, TCA cycle
• Classes: Superclass(es), Subclass(es), Instance(s)
• A symbolic frame name
  (id, key) uniquely identifies each frame
  • Examples: EG10223, TRP, Proteins

Slots
• Encode attributes and properties of a frame
• Represent relationships between frames
  • The value of a slot is the identifier of another frame

Slots
• Number of values
  • Single valued
  • Multivalued: sets, bags
• Slot values
  • Any LISP object: integer, real, string, symbol (frame name)
• Every slot is described by a "slot frame" in a KB that defines meta information about that slot
  • Datatype, classes it pertains to, constraints
• Two slots are inverses if they encode opposite relationships
  • Slot Product in class Genes
  • Slot Gene in class Polypeptides

Pathway Tools Ontology / Schema
• Ontology classes: 1621
  • Datatype classes: define objects from genomes to pathways
  • Classification systems for pathways, chemical compounds, enzymatic reactions (EC system)
  • Protein Feature ontology
  • Controlled vocabularies:
    • Cell Component Ontology
    • Evidence codes
• Comprehensive set of 248 attributes and relationships

Root Classes in the Pathway Tools Ontology
• Chemicals -- All molecules
• Polymer-Segments -- Regions of polymers
• Protein-Features -- Features on proteins
• Paralogous-Gene-Groups
• Organisms
• Generalized-Reactions -- Reactions and pathways
• Enzymatic-Reactions -- Link enzymes to reactions they catalyze
• Regulation -- Regulatory interactions
• CCO -- Cell Component Ontology
• Evidence -- Evidence ontology
• Notes -- Timestamped, person-stamped notes
• Organizations
• People
• Publications

Use GKB Editor to Inspect the Pathway Tools Ontology
• GKB Editor = Generic Knowledge Base Editor
• Type in Navigator window: (GKB), or
• [Right-Click] Edit->Ontology Editor
• View->Browse Class Hierarchy
• [Middle-Click] to expand hierarchy
• To view classes or instances, select them and:
  • Frame -> List Frame Contents
  • Frame -> Edit Frame

Schema Overview

Principal Classes
• Class names are capitalized, plural, separated by dashes
• Genetic-Elements, with subclasses:
  • Chromosomes
  • Plasmids
• Genes
• Transcription-Units
• RNAs
  • rRNAs, snRNAs, tRNAs, Charged-tRNAs
• Proteins, with subclasses:
  • Polypeptides
  • Protein-Complexes

Principal Classes
• Reactions, with subclasses:
  • Transport-Reactions
• Enzymatic-Reactions
• Pathways
• Compounds-And-Elements

Principal Classes
• Regulation
  • Regulation-of-Enzyme-Activity
  • Regulation-of-Transcription
    • Regulation-of-Transcription-Initiation
    • Transcriptional-Attenuation

Example of a Single GFP Call
• The General Pattern:
    gfp-function(frame-ID slot-ID value ...)
• LISP:
    (gfp-function frame-ID slot-ID value ...)
    (get-slot-values 'TRYPSYN-RXN 'LEFT)
    ==> (INDOLE-3-GLYCEROL-P SER)

Architecture of the API server: PerlCyc and JavaCyc
• Works on Unix (Solaris or Linux) only
• Start up Pathway Tools with the -api arg
• Pathway Tools listens on a Unix socket; the perl program communicates through this socket
• Supports both querying and editing PGDBs
• Must run perl or java program on the same machine that runs Pathway Tools
  • This is a security measure, as the API server has no built-in security
• Can only handle one connection at a time

Obtaining PerlCyc and JavaCyc
• Download from http://www.sgn.cornell.edu/downloads/
• PerlCyc written and maintained by Lukas Mueller at Boyce Thompson Institute for Plant Research.
• JavaCyc written by Thomas Yan at Carnegie Institute, maintained by Lukas Mueller.
• Easy to extend…

Examples of PerlCyc, JavaCyc Functions
• GFP functions (require knowledge of Pathway Tools schema), listed as PerlCyc name / JavaCyc name:
  • get_slot_values / getSlotValues
  • get_class_all_instances / getClassAllInstances
  • put_slot_values / putSlotValues
• Pathway Tools functions (described at http://bioinformatics.ai.sri.com/ptools/ptools-fns.html):
  • genes_of_reaction / genesOfReaction
  • find_indexed_frame / findIndexedFrame
  • pathways_of_gene / pathwaysOfGene
  • transport_p / transportP

Writing a PerlCyc or JavaCyc program
• Create a PerlCyc, JavaCyc object:
    perlcyc->new("ORGID")
    new Javacyc("ORGID")
• Call PerlCyc, JavaCyc functions on this object:
    my $cyc = perlcyc->new("ECOLI");
    my @pathways = $cyc->all_pathways();

    Javacyc cyc = new Javacyc("ECOLI");
    ArrayList pathways = cyc.allPathways();
• Functions return object IDs, not objects. Must connect to server again to retrieve attributes of an object.
    foreach my $p (@pathways) {
        print $cyc->get_slot_value($p, "COMMON-NAME");
    }

    for (int i = 0; i < pathways.size(); i++) {
        String pwy = (String) pathways.get(i);
        System.out.println(cyc.getSlotValue(pwy, "COMMON-NAME"));
    }

Sample PerlCyc Query
Number of proteins in E. coli
    use perlcyc;
    my $cyc = perlcyc->new("ECOLI");
    my @proteins = $cyc->get_class_all_instances("|Proteins|");
    my $protein_count = scalar(@proteins);
    print "Protein count: $protein_count.\n";

Sample PerlCyc Query
Print IDs of all proteins with molecular weight between 10 and 20 kD and pI between 4 and 5.
    use perlcyc;
    my $cyc = perlcyc->new("ECOLI");
    foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
        my $mw = $cyc->get_slot_value($p, "molecular-weight-kd");
        my $pI = $cyc->get_slot_value($p, "pi");
        if ($mw <= 20 && $mw >= 10 && $pI <= 5 && $pI >= 4) {
            print "$p\n";
        }
    }

Sample PerlCyc Query
List all the transcription factors in E.
coli, and the list of genes that each regulates:
    use perlcyc;
    my $cyc = perlcyc->new("ECOLI");
    foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
        if ($cyc->transcription_factor_p($p)) {
            my $name = $cyc->get_slot_value($p, "common-name");
            my %genes = ();
            foreach my $tu ($cyc->regulon_of_protein($p)) {
                foreach my $g ($cyc->transcription_unit_genes($tu)) {
                    $genes{$g} = $cyc->get_slot_value($g, "common-name");
                }
            }
            print "\n\n$name: ";
            print join " ", values %genes;
        }
    }

Sample Editing Using PerlCyc
Add a link from each gene to the corresponding object in MY-DB (assume ID is same in both cases)
    use perlcyc;
    my $cyc = perlcyc->new("HPY");
    my @genes = $cyc->get_class_all_instances("|Genes|");
    foreach my $g (@genes) {
        $cyc->add_slot_value($g, "DBLINKS", "(MY-DB \"$g\")");
    }
    $cyc->save_kb();

Sample JavaCyc Query: Enzymes for which ATP is a regulator
    import java.util.*;

    public class JavacycSample {
        public static void main(String[] args) {
            Javacyc cyc = new Javacyc("ECOLI");
            ArrayList regframes = cyc.getClassAllInstances("|Regulation-of-Enzyme-Activity|");
            for (int i = 0; i < regframes.size(); i++) {
                String reg = (String) regframes.get(i);
                boolean bool = cyc.memberSlotValueP(reg, "Regulator", "ATP");
                if (bool) {
                    String enzrxn = cyc.getSlotValue(reg, "Regulated-Entity");
                    String enzyme = cyc.getSlotValue(enzrxn, "Enzyme");
                    System.out.println(enzyme);
                }
            }
        }
    }

Simple Lisp Query Example: Enzymes for which ATP is a regulator
    (defun atp-inhibits ()
      (loop for x in (get-class-all-instances '|Regulation-of-Enzyme-Activity|)
            ;; Does the Regulator slot contain the compound ATP, and is the mode
            ;; of regulation negative (inhibition)?
            when (and (member-slot-value-p x 'Regulator 'ATP)
                      (member-slot-value-p x 'Mode "-"))
              ;; Whenever the test is positive, we collect the value of the slot Enzyme
              ;; of the Regulated-Entity of the regulatory interaction frame.
              ;; The collected values are returned as a list, once the loop terminates.
              collect (get-slot-value (get-slot-value x 'Regulated-Entity) 'Enzyme)))

    ;;; invoking the query:
    (select-organism :org-id 'ECOLI)
    (atp-inhibits)

Simple Perl Query Example: Enzymes for which ATP is a regulator
    use perlcyc;
    my $cyc = perlcyc->new("ECOLI");
    my @regs = $cyc->get_class_all_instances("|Regulation-of-Enzyme-Activity|");
    ## We check every instance of the class
    foreach my $reg (@regs) {
        ## We test whether the Regulator slot contains the compound
        ## frame ATP and whether the Mode slot is "-" (inhibition)
        my $bool1 = $cyc->member_slot_value_p($reg, "Regulator", "ATP");
        my $bool2 = $cyc->member_slot_value_p($reg, "Mode", "-");
        if ($bool1 && $bool2) {
            ## Whenever the test is positive, we collect the value of the slot Enzyme.
            ## The results are printed in the terminal.
            my $enzrxn = $cyc->get_slot_value($reg, "Regulated-Entity");
            my $enz = $cyc->get_slot_value($enzrxn, "Enzyme");
            print STDOUT "$enz\n";
        }
    }

Getting started with Lisp
• pathway-tools -lisp
• (load "file")
• (compile-file "file.lisp")
• Emacs is a useful editor
• Pathway Tools source code is available: ask
• Lisp resources: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Query Gotchas
• Study schema carefully
• :test #'fequal
• Cascade of slot-values: check for NIL

Semantic Inference Layer
• relationships.lisp
• Library of functions that encapsulate common query building blocks and intricacies of navigating the schema
  • enzymes-of-gene
  • reactions-of-gene
  • pathways-of-gene
  • genes-of-pathway
  • pathway-hole-p
  • reactions-of-compound
  • top-containers(protein)
  • all-rxns(type) (:metab-smm :metab-all :metab-pathways :enzyme :transport etc.)
    (all-rxns :metab-pathways)

Pathway Tools Schema and Semantic Inference Layer
Genes, Operons, and Replicons

Representing a Genome
[Diagram: ORG points to CHROM1, CHROM2, and PLASMID1 through its genome slot; the components slot links replicons to Gene1, Gene2, and Gene3; the product slot links a gene to Product1]
• Classes:
  • ORG is of class Organisms
  • CHROM1 is of class Chromosomes
  • PLASMID1 is of class Plasmids
  • Gene1 is of class Genes
  • Product1 is of class Polypeptides or RNA

    (defun genes-of-chrom (chrom)
      (loop for x in (get-slot-values chrom 'components)
            when (instance-all-instance-of-p x '|Genes|)
              collect x))

Polynucleotides
• Review slots of COLI and of COLI-K12

Genetic-Elements
• Sequence is stored in a separate file or database table

Polymer-Segments
• Review slots of Genes

Complexities of Gene / Gene-Product Relationships
• The Product of a gene can be an instance of Polypeptides or RNAs
• An instance of Polypeptides can have more than one gene encoding it
• Sequence position:
  • Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position (usually greater, except at origin)
  • Transcription-Direction: + / -
• Alternative splicing:
  • Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position
  • Intron positions specified in Splice-Form-Introns of gene product, e.g. (200 300) (350 400)

Gene Reaction Schematic

Substring Search Example
Find all genes that contain a given substring within their common name or synonym list.
    (defun find-gene-by-substring (substring)
      (let (result)
        (loop for g in (get-class-all-instances '|Genes|)
              do (loop for name in (get-slot-values g 'names)
                       when (search substring name :test #'string-equal)
                         do (pushnew g result)))
        result))

Proteins

Proteins and Protein Complexes
• Polypeptide: the monomer protein product of a gene (may have multiple isoforms, as indicated at gene level)
• Protein complex: proteins consisting of multiple polypeptides or protein complexes
• Example: DNA pol III
  • DnaE is a polypeptide
  • pol III core is DnaE and two other polypeptides
  • pol III holoenzyme is several protein complexes combined

Protein Complex Relationships

Slots of a protein (DnaE)
• catalyzes
  • Is it a regulator/reactant/etc?
• comment
• component-of
• dblinks
• features (edited in feature editor)
• Many other attributes possible

A complex at the frame level (pol III)
• Most of the same attributes as polypeptide frame
• component-of and components
  • note coefficients

Relationships are Defined in Many Places
• component-of comes from creating a complex
• appears-in-left-side-of comes from defining a reaction (as do modified forms)
• regulates comes from an enzymatic reaction or TU
  • can only edit dna-footprint if protein has been associated with a TU

Semantic Inference Layer
• Reactions-of-protein (prot)
  • Returns a list of rxns this protein catalyzes
• Transcription-units-of-proteins (prot)
  • Returns a list of TUs activated/inhibited by the given protein
• Transporter? (prot)
  • Is this protein a transporter?
• Polypeptide-or-homomultimer? (prot)
• Transcription-factor?
  (prot)
• Obtain-protein-stats
  • Returns 5 values
  • Length of: all-polypeptides, complexes, transporters, enzymes, etc…

Example
Find all enzymes that use pyridoxal phosphate as a cofactor or prosthetic group
    (loop for protein in (get-class-all-instances '|Proteins|)
          for enzrxn = (get-slot-value protein 'enzymatic-reaction)
          when (and enzrxn
                    (or (member-slot-value-p enzrxn 'cofactors 'pyridoxal_phosphate)
                        (member-slot-value-p enzrxn 'prosthetic-groups 'pyridoxal_phosphate)))
            collect protein)
• (member-slot-value-p frame slot value): T if Value is one of the values of Slot of Frame.

Sample
Find all proteins without a comment anywhere

RNAs
• PGDBs only represent RNAs that are "terminal gene products"
  • tRNAs
  • rRNAs
  • Regulatory RNAs
  • Miscellaneous small RNAs
• Slots similar to proteins
• tRNAs can have an anticodon

The RNA Ontology

Compounds / Reactions / Pathways
• Think of a three tiered structure:
  • Reactions built on top of compounds
  • Pathways built on top of reactions
• Metabolic network defined by reactions alone; pathways are an additional "optional" structure
• Some reactions not part of a pathway
• Some reactions have no attached enzyme
• Some enzymes have no attached gene

Compounds
• Relatively few aspects of a compound defined within the compound editor
  • MW, formula calculated from edited structure
• Most aspects defined in other editors
  • "Pathway reactions" comes from reaction editing followed by pathway editing
  • Activator, etc come from the protein editor

    (print-frame 'TRP)
    --- Instance TRP ---
    Types: |Amino-Acid|, |Aromatic-Amino-Acids|, |Non-polar-amino-acids|
    APPEARS-IN-LEFT-SIDE-OF: RXN0-287, TRANS-RXN-76, TRYPTOPHAN-RXN, TRYPTOPHAN--TRNA-LIGASE-RXN
    APPEARS-IN-RIGHT-SIDE-OF: RXN0-2382, RXN0-301, TRANS-RXN-76, TRYPSYN-RXN
    CHEMICAL-FORMULA: (C 11), (H 12), (N 2), (O 2)
    COMMON-NAME: "L-tryptophan"
    DBLINKS: (LIGAND-CPD "C00078" NIL |kaipa| 3311532640 NIL NIL), (CAS "6912-86-3"), (CAS "73-22-3")
    NAMES: "L-tryptophan", "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"
    SMILES: "c1(c(CC(N)C(=O)O)c2(c([nH]1)cccc2))"
    SYNONYMS: "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"
    ____________________________________________

Semantic Inference Layer
• Reactions-of-compound (cpd)
• Pathways-of-compound (cpd)
• Is-substrate-an-autocatalytic-enzyme-p (cpd)
• Activated/inhibited-by? (cpds slots)
  • Returns a list of enzrxns for which a cpd in cpds is a modulator (example slots: activators-all, activators-allosteric)
• All-substrates (rxns)
  • All unique substrates specified in the given rxns
• Has-structure-p (cpd)
• Obtain-cpd-stats
  • Returns two values
  • Length of: all-cpds, cpds with structures

Queries with Multiple Answers
• Navigator queries:
  • Example: Substring search for "pyruvate"
  • Selected list is placed on the Answer list
  • Use "Next Answer" button to view each one of them
• Lisp queries:
  • Example: Find reactions involving pyruvate as a substrate
    (get-class-all-instances '|Compounds|)
    (loop for rxn in (get-class-all-instances '|Reactions|)
          when (member 'pyruvate (get-slot-values rxn 'substrates))
            collect rxn)
    (replace-answer-list *)

Reactions
• Represents information about a reaction that is independent of enzymes that catalyze the reaction
• Connected to enzyme(s) via enzymatic reaction frames
• Classified with EC system when possible
• Example: 2.7.7.7, DNA-directed DNA polymerization
  • Carried out by five enzymes in E. coli

Reaction Ontology

Where is 2.7.7.7 in the Ontology?

Slots of Reaction Frames
• Balance-state
• EC-number
• Enzymatic-reaction
  • Generated in protein or reaction editor
• In-pathway
  • Generated in pathway editor
• Left and Right (reactants / products)
  • Can include modified forms of proteins, RNAs, etc here
• Not all reactants/products need to be frames

Enzymatic Reactions (DnaE and 2.7.7.7)
• A necessary bridge between enzymes and "generic" versions of reactions
• Carries information specific to an enzyme/reaction combination:
  • Cofactors and prosthetic groups
  • Alternative substrates
  • Links to regulatory interactions
• Frame is generated when protein is associated with reaction (via protein or reaction editor)

Regulation of Enzyme Activity

Semantic Inference Layer
• Genes-of-reaction (rxn)
• Substrates-of-reaction (rxn)
• Enzymes-of-reaction (rxn)
• Lacking-ec-number (organism)
  • Returns list of rxns with no EC numbers in that database
• Get-reaction-direction-in-pathway (pwy rxn)
• Reaction-type (rxn)
  • Indicates type of rxn as: small molecule rxn, transport rxn, protein-small-molecule rxn (one substrate is protein and one is a small molecule), protein rxn (all substrates are proteins), etc.
• All-rxns (type)
  • Specify the type of reaction (see above for type)
• Obtain-rxn-stats
  • Returns six values
  • Length of: all-rxns, transport, non-transport, etc…

Find all small-molecule reactions that have no enzyme but are not spontaneous ("orphan" reactions)
    (defun orphan-reactions (&optional (verbose? t))
      (loop for r in (all-rxns :small-molecule)
            when (and (not (slot-has-value-p r 'enzymatic-reaction))
                      (not (get-slot-value r 'spontaneous?)))
              collect r))

Reaction Direction
• Left/Right reflect direction of reaction as written by Enzyme Commission
  • Reflects systematic direction for different reaction classes
  • Left/Right do not necessarily correspond to physiological direction of a reaction
• Get-rxn-direction (rxn)
  • Returns :L2R or :R2L or :BOTH or NIL
  • Integrates all available info about direction of this reaction:
    • Direction(s) it occurs in all pathways in the PGDB
    • Direction(s) as specified in Enzymatic-Reactions

Pathways

What is a Pathway?
• An ordered set of interconnected, directed biochemical reactions
• Reactions form a coherent unit, e.g.
  • Regulated as a single unit
  • Evolutionarily conserved across organisms as a single unit
  • When combined, perform a single cellular function
  • Historically grouped together as a unit
• Includes metabolic pathways and signalling pathways
• Evidence for all reactions in a single organism
• Pathways can be linear, cyclical, branched, or some combination

Internal Representation of Pathways
• REACTION-LIST: unordered list of reactions that comprise the pathway
• PREDECESSORS: list of reaction pairs that define ordering relationships between reactions. E.g.
  [Diagram: reactions R1, R2, R3 over compounds A, B, C, D; R1 precedes both R2 and R3]
  • (R2 R1): Predecessor of R2 is R1
  • (R3 R1): Predecessor of R3 is R1
  • (R1): R1 has no predecessor (can be omitted)

What is missing from Pathway Representation?
• Reaction directions
  • Some reactions are unidirectional, but many are reversible. How do we know in which direction to draw the reaction?
• Main vs. side substrates
  [Diagram: pathway over compounds A through F]
  • Main compounds form the backbone of the pathway:
    • substrates shared between connecting reactions
    • major inputs and outputs.
  • Side compounds omitted from pathway diagrams at low detail levels
  • Individual reactions do not necessarily have main and side compounds; a particular substrate may be either a main or a side depending on the pathway context.

Computing Directionality and Mains/Sides
• Our philosophy: Enable curator to specify as little as possible. Compute as much as possible. This reduces redundancy and potential for inconsistencies.
• Example:
  • Reactions (as written, direction not yet assigned):
    R1: A + B <-> C + D
    R2: B <-> E
  • Predecessors: (R2 R1)
  • Only substrate overlap is B
    • B must be a main substrate
    • A must be a side substrate, R1 must proceed from right to left
    • R2 must proceed from left to right
  • Resulting pathway: C + D -> B -> E, with A as a side compound of the first step

But…
• Unfortunately, mains, sides and reaction directions are sometimes ambiguous:
  • At beginnings and ends of pathways
    • Use heuristics to determine main/side substrates at beginnings, ends of pathways
    • Not always what the curator wants
  • Substrate overlap with both sides of a reaction, e.g. A + B <-> C + D followed by C + B <-> E
• Solution: Additional slot PRIMARIES, should only be populated when necessary:
  • PRIMARIES: (R (A B) (C)) says that for reaction R, A and B are both main reactants, and C is a main product.

More Complications…
• ENZYME-USE: a reaction may be catalyzed by multiple enzymes, but not all the enzymes necessarily participate in a given pathway
  • Not present in the same compartment with rest of pathway enzymes
  • Down-regulated or not expressed under conditions in which pathway is active
  • ENZYME-USE slot tells us which enzymes catalyze reaction in pathway, if not all.
• LAYOUT-ADVICE: helps software draw pathway correctly, e.g. in a cyclical pathway, tells which substrate should be at the top.
• HYPOTHETICAL-REACTIONS: list of reactions in the pathway that are considered hypothetical (i.e.
  no direct experimental evidence)

Polymerization Pathways
[Diagram: polymerization chain … X[n], X[n+1], X[10]]
• POLYMERIZATION-LINKS: specifies reactions that should be connected by a polymerization link
    (X R1 R1)
    --- REACTANT-NAME-SLOT: N-NAME
    --- PRODUCT-NAME-SLOT: N+1-NAME
• CLASS-INSTANCE-LINKS: specifies when a link should be drawn between a substrate class and some instance of it (necessary only if instance is not a member of some reaction, so no predecessor relationship can be defined)
    R1
    --- PRODUCT-INSTANCES: X[10]

Super-Pathways
• Collection of pathways that connect to each other via common substrates or reactions, or as part of some larger logical unit
• Can contain both sub-pathways and additional connecting reactions
• Can be nested arbitrarily
• REACTION-LIST: a pathway ID instead of a reaction ID in this slot means include all reactions from the specified pathway
• PREDECESSORS: a pathway ID instead of a tuple in this slot means include all predecessor tuples from the specified pathway

Querying Pathways Programmatically
• See http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
• (all-pathways)
• (base-pathways)
  • Returns list of all pathways that are not super-pathways
• (genes-of-pathway pwy)
• (unique-genes-of-pathway pwy)
  • Returns list of all genes of a pathway that are not also part of other pathways
• (enzymes-of-pathway pwy)
• (substrates-of-pathway pwy)
• (variants-of-pathway pwy)
  • Returns all pathways in the same variant class as a pathway
• (get-predecessors rxn pwy), (get-successors rxn pwy)
• (get-rxn-direction-in-pathway pwy rxn)
• (pathway-inputs pwy), (pathway-outputs pwy)
  • Returns all compounds consumed (produced) but not produced (consumed) by pathway (ignores stoichiometry)

Example Queries
• Find all genes involved in metabolic pathways:
    (remove-duplicates
     (loop for p in (all-pathways)
           append (genes-of-pathway p)))
• Find all compounds that are unique to a
  single pathway:
    (loop for p in (base-pathways)
          append (loop for c in (substrates-of-pathway p)
                       when (null (remove p (pathways-of-compound c)))
                         collect (list c p)))

Regulation

Regulation
• Reorganization and expansion of regulation under way in Pathway Tools
  • Initial application to EcoCyc
• Class Regulation with subclasses that describe different biochemical mechanisms of regulation
• Slots:
  • Regulator
  • Regulated-Entity
  • Mode
  • Mechanism

Regulation of Enzyme Activity
• Class Regulation-of-Enzyme-Activity
• Each instance of the class describes one regulatory interaction
• Slots:
  • Regulator -- usually a small molecule
  • Regulated-Entity -- an Enzymatic-Reaction
  • Mechanism -- One of: Competitive, Uncompetitive, Noncompetitive, Irreversible, Allosteric, Other
  • Mode -- One of: +, -
  • Physiologically-relevant? -- true/false

Transcription Initiation
• Class Regulation-of-Transcription-Initiation
• Transcription factor binds to DNA binding site to regulate transcription initiation from a promoter
• Slots:
  • Regulator -- instance of Proteins or Complexes (a transcription factor)
  • Regulated-Entity -- instance of Promoters
  • Mode -- One of: +, -
  • Associated-binding-site -- a DNA-Binding-Site

Attenuation
• Class Transcriptional-Attenuation
• Several subclasses depending on type of attenuation
• Slots common to all:
  • Regulator -- Depends on subtype of attenuation
  • Regulated-Entity -- instance of Terminators
  • Mode -- One of: +, -

Attenuation Subtypes
• Ribosome-Mediated-Attenuation
  • E.g.
    trp operon: ribosome pauses based on levels of charged tRNA, determines formation of terminator or antiterminator
• RNA-Mediated-Attenuation
  • RNA (tRNA or sRNA) binds to transcript, determines formation of terminator or antiterminator
• Protein-Mediated-Attenuation
  • Protein binds to transcript, determines formation of terminator or antiterminator
• Small-Molecule-Mediated-Attenuation
  • Small molecule binds to transcript, determines formation of terminator or antiterminator
• Rho-Blocking-Antitermination
• RNA-Polymerase-Modification
  • Regulatory protein binds to site in transcription unit and interacts with RNA polymerase to determine termination

Transcriptional Regulation
[Diagram: regulation of the trp operon. Labels: trp, apoTrpR, TrpR*trp, rxn001, reg001, site001, pro001, transcription unit trpLEDCBA with genes trpL, trpE, trpD, trpC, trpB, trpA, term001, reg002, charged-tRNA*trp]

Data Exchange

Data Exchange
• Java API and Perl API: read & modify
• BioPAX Export: since Pathway Tools 9.0 (Biopax.org)
• Export of entire PGDB as Flatfiles
• Export of Reactions as SBML (sbml.org)
• Import/Export of Pathways: between PGDBs
• Import/Export of Selected Frames, for Spreadsheets
• Import/Export of Compounds as Molfile, CML
• Registering/Publishing PGDBs on WWW
• Export PGDB as Genbank
• BioWarehouse: Loader for Flatfiles, SQL access
  http://bioinformatics.ai.sri.com/biowarehouse/

Dump PGDB into Flatfiles
• Export of entire PGDB as Flatfiles
• Format Description: UG v.I section 4.5
  • Column delimited: 1 line per frame
  • Attribute-value: 1 record per frame
• Multiple slot values:
  • Column delimited: several values per column
  • Attribute-value: several lines for several values

Frame Import/Export
• Import/Export of Selected Frames, for Spreadsheets
• Frame selection, Slot selection GUI
• Format Description: UG v.I section 4.6.3
  • Column delimited: 1 line per frame
  • Attribute-value: 1 record per frame
• Multiple slot values:
  • Column delimited:
    several values per column
  • Attribute-value: several lines for several values
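
The attribute-value layout described above (one record per frame, one line per slot value) can be illustrated with a small reader. This is a minimal sketch and not part of Pathway Tools. It assumes that each data line has the form SLOT - value, that a line containing only // closes a record, and that lines starting with # are comments; check the User's Guide section cited above before relying on these details. The class name AttributeValueSketch, the method parse, and the sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AttributeValueSketch {
    // Parses attribute-value text into one map per frame (slot name -> list of values).
    // Assumed layout: "SLOT - value" lines, "//" ends a record, "#" starts a comment line.
    static List<Map<String, List<String>>> parse(String text) {
        List<Map<String, List<String>>> frames = new ArrayList<>();
        Map<String, List<String>> current = new LinkedHashMap<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.equals("//")) {
                // end of record: store the frame and start a new one
                if (!current.isEmpty()) {
                    frames.add(current);
                }
                current = new LinkedHashMap<>();
                continue;
            }
            int sep = line.indexOf(" - ");
            if (sep < 0) {
                continue; // not a "SLOT - value" line; ignored in this sketch
            }
            String slot = line.substring(0, sep);
            String value = line.substring(sep + 3);
            // a repeated slot name adds another value to the same slot
            current.computeIfAbsent(slot, k -> new ArrayList<>()).add(value);
        }
        if (!current.isEmpty()) {
            frames.add(current); // tolerate a missing final "//"
        }
        return frames;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, loosely modelled on the TRP frame shown earlier.
        String sample = String.join("\n",
            "# hypothetical sample file",
            "UNIQUE-ID - TRP",
            "COMMON-NAME - L-tryptophan",
            "SYNONYMS - W",
            "SYNONYMS - trp",
            "//",
            "UNIQUE-ID - SER",
            "//");
        for (Map<String, List<String>> frame : parse(sample)) {
            List<String> synonyms = frame.getOrDefault("SYNONYMS", Collections.<String>emptyList());
            System.out.println(frame.get("UNIQUE-ID").get(0) + " synonyms=" + synonyms);
        }
    }
}
```

Storing every slot as a list mirrors the point made on the slide: a multivalued slot simply appears on several lines, so the reader never needs to know in advance which slots are single valued.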
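
Looking back at the pathway slides, REACTION-LIST is unordered and PREDECESSORS holds (reaction predecessor) pairs. One way to see how a drawing order could be derived from those two slots is a topological sort. The sketch below is an illustration under stated assumptions, not the algorithm Pathway Tools uses: the class PredecessorOrderSketch and the method order are hypothetical, slot values are modelled as plain strings, and reactions caught in a cycle (pathways can be cyclical) are simply appended in REACTION-LIST order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PredecessorOrderSketch {
    // Each pair is {reaction, predecessor}, mirroring tuples such as (R2 R1).
    // A one-element tuple such as (R1) means "no predecessor" and is skipped.
    static List<String> order(List<String> reactionList, List<String[]> predecessors) {
        Map<String, Integer> pending = new LinkedHashMap<>();      // unmet predecessor count
        Map<String, List<String>> successors = new HashMap<>();   // predecessor -> reactions after it
        for (String r : reactionList) {
            pending.put(r, 0);
        }
        for (String[] pair : predecessors) {
            if (pair.length < 2) {
                continue;
            }
            String rxn = pair[0];
            String pred = pair[1];
            if (!pending.containsKey(rxn) || !pending.containsKey(pred)) {
                continue; // ignore pairs that mention reactions outside the list
            }
            pending.merge(rxn, 1, Integer::sum);
            successors.computeIfAbsent(pred, k -> new ArrayList<>()).add(rxn);
        }
        // Kahn's algorithm: start from reactions with no predecessor.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            String r = ready.remove();
            ordered.add(r);
            for (String next : successors.getOrDefault(r, Collections.<String>emptyList())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Reactions on a cycle never reach zero; append them in list order.
        for (String r : reactionList) {
            if (!ordered.contains(r)) {
                ordered.add(r);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // The example from the slide: (R2 R1), (R3 R1), (R1).
        List<String> reactions = Arrays.asList("R3", "R2", "R1");
        List<String[]> preds = Arrays.asList(
            new String[] {"R2", "R1"},
            new String[] {"R3", "R1"},
            new String[] {"R1"});
        System.out.println(order(reactions, preds));
    }
}
```

For the slide's example the sketch places R1 first and the two branches R2 and R3 after it. It says nothing about reaction direction or main versus side substrates, which the slides describe as separately computed.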