schema-api - SRI International

Computing with Pathway/Genome Databases
SRI International Bioinformatics
Overview
• Summary of Pathway Tools data access mechanisms and formats
• Pathway Tools APIs
• Overview of Pathway Tools schema
Writing Complex PGDB Queries
• Complex queries to PGDBs must refer to classes and slots within the schema
  • Queries using the Lisp, Perl, and Java APIs
  • Queries using the Structured Advanced Query Form
  • Queries using BioVelo
More Information
• Pathway Tools Web Site, Tutorial Slides
  • http://bioinformatics.ai.sri.com/ptools/
  • http://bioinformatics.ai.sri.com/ptools/examples.lisp
• PerlCyc & JavaCyc API, includes some relationships
  • http://www.arabidopsis.org/tools/aracyc/perlcyc/
  • http://www.arabidopsis.org/tools/aracyc/javacyc/
• Pathway Tools User's Guide
  • Appendix: Guide to the Pathway Tools Schema
• Curator's Guide
  • http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf
• aic/pathway-tools/nav/12.0/lisp/relationships.lisp
References
• Ontology Papers section of http://biocyc.org/publications.shtml
  • "An Evidence Ontology for use in Pathway/Genome Databases"
  • "An ontology for biological function based on molecular interactions"
  • "Representations of metabolic knowledge: Pathways"
  • "Representations of metabolic knowledge"
Data Exchange
• APIs: Lisp API, Java API, and Perl API
  • Read and modify access
• Cyclone
• Export to files
  • BioPAX Export (biopax.org)
  • Export PGDB genome to GenBank format
  • Export entire PGDB as column-delimited and attribute-value file formats
  • Export PGDB reactions as SBML (sbml.org)
  • Import/Export of Pathways: between PGDBs
  • Import/Export of Selected Frames, for Spreadsheets
  • Import/Export of Compounds as Molfile, CML
• BioWarehouse: Loader for Flatfiles, SQL access
  • http://bioinformatics.ai.sri.com/biowarehouse/
  • BMC Bioinformatics 7:170 2006
Programmatic Access to BioCyc
• Common LISP
  • Native language of Pathway Tools
  • Interactive & Mature Environment
  • Full Access to the Data & Many Utility Functions
  • Source code is available for academics
• PerlCyc
  • API of Functions, Exposed to Perl
  • Communication through UNIX Socket
• JavaCyc
  • API of Functions, Exposed to Java
  • Communication through UNIX Socket
• Cyclone
Cyclone
• Developed by Schachter and colleagues from Genoscope
  • http://nemo-cyclone.sourceforge.net/archi.php
• Cyclone is a Java-based system that:
  • Extracts data from a Pathway Tools PGDB
  • Converts it to an XML schema
  • Maps the data to Java objects and to a relational database
• Changes made to the data on the Java side can be committed back to a Pathway Tools PGDB
Lisp API
• Accessible whenever you start Pathway Tools with the -lisp argument
• Lisp queries evaluate against the running Pathway Tools binary and execute very fast
Generic Frame Protocol (GFP)
• A library of procedures for accessing Ocelot DBs
• GFP specification:
  • http://www.ai.sri.com/~gfp/spec/paper/paper.html
• A small number of GFP functions are sufficient for most complex queries
Example of a Single GFP Call
• The general pattern:
gfp-function(frame-ID slot-ID value ...)
• In Lisp:
(gfp-function frame-ID slot-ID value ...)
(get-slot-values 'TRYPSYN-RXN 'LEFT)
==> (INDOLE-3-GLYCEROL-P SER)
Generic Frame Protocol
• get-class-all-instances (Class)
  • Returns the instances of Class
• coercible-to-frame-p (Thing)
  • Is Thing a frame? Returns True if Thing is the name of a frame, or a frame object; else False
Generic Frame Protocol
• Notation Frame.Slot means a specified slot of a specified frame
• get-slot-value(Frame Slot)
  • Returns first value of Frame.Slot
• get-slot-values(Frame Slot)
  • Returns all values of Frame.Slot as a list
• slot-has-value-p(Frame Slot)
  • Returns True if Frame.Slot has at least one value; else False
• member-slot-value-p(Frame Slot Value)
  • Returns True if Value is one of the values of Frame.Slot; else False
• print-frame(Frame)
  • Prints the contents of Frame
• Note: Frame and Slot must be symbols!
Generic Frame Protocol: Update Operations
• put-slot-value(Frame Slot Value)
  • Replace the current value(s) of Frame.Slot with Value
• put-slot-values(Frame Slot Value-List)
  • Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
• add-slot-value(Frame Slot Value)
  • Add Value to the current value(s) of Frame.Slot, if any
• remove-slot-value(Frame Slot Value)
  • Remove Value from the current value(s) of Frame.Slot
• replace-slot-value(Frame Slot Old-Value New-Value)
  • In Frame.Slot, replace Old-Value with New-Value
• remove-local-slot-values(Frame Slot)
  • Remove all of the values of Frame.Slot
• save-kb
  • Saves the current KB
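The read and update operations listed above can be pictured with a small in-memory model. The sketch below is only an illustration in Python: `ToyFrameStore` is a hypothetical stand-in, not Pathway Tools or its API, and it assumes that a slot holds an ordered list of values and that `add-slot-value` simply appends. The value `L-SER` is made up for the example.

```python
class ToyFrameStore:
    """Hypothetical in-memory stand-in for a frame KB (not Pathway Tools itself)."""

    def __init__(self):
        self._slots = {}  # maps (frame, slot) -> list of values

    # --- read operations ---
    def get_slot_values(self, frame, slot):
        return list(self._slots.get((frame, slot), []))

    def get_slot_value(self, frame, slot):
        values = self._slots.get((frame, slot), [])
        return values[0] if values else None  # first value, or None if empty

    def slot_has_value_p(self, frame, slot):
        return bool(self._slots.get((frame, slot)))

    def member_slot_value_p(self, frame, slot, value):
        return value in self._slots.get((frame, slot), [])

    # --- update operations ---
    def put_slot_values(self, frame, slot, values):
        self._slots[(frame, slot)] = list(values)  # replaces all current values

    def put_slot_value(self, frame, slot, value):
        self.put_slot_values(frame, slot, [value])

    def add_slot_value(self, frame, slot, value):
        # Assumption: plain append; the real KB may treat duplicates differently.
        self._slots.setdefault((frame, slot), []).append(value)

    def remove_slot_value(self, frame, slot, value):
        current = self._slots.get((frame, slot), [])
        self._slots[(frame, slot)] = [v for v in current if v != value]

    def replace_slot_value(self, frame, slot, old_value, new_value):
        current = self._slots.get((frame, slot), [])
        self._slots[(frame, slot)] = [new_value if v == old_value else v for v in current]

    def remove_local_slot_values(self, frame, slot):
        self._slots.pop((frame, slot), None)


kb = ToyFrameStore()
kb.put_slot_values("TRYPSYN-RXN", "LEFT", ["INDOLE-3-GLYCEROL-P", "SER"])
kb.add_slot_value("TRYPSYN-RXN", "RIGHT", "TRP")
kb.replace_slot_value("TRYPSYN-RXN", "LEFT", "SER", "L-SER")  # "L-SER" is a made-up value
print(kb.get_slot_values("TRYPSYN-RXN", "LEFT"))
```

In the real system the changes only persist after `save-kb`; the toy model has no persistence at all.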
Additional Pathway Tools Functions: Semantic Inference Layer
• Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
  • http://bioinformatics.ai.sri.com/ptools/ptools-fns.html
PerlCyc and JavaCyc
• Work on Unix (Solaris or Linux) only
• Start up Pathway Tools with the -api arg
• Pathway Tools listens on a Unix socket; the Perl or Java program communicates through this socket
• Supports both querying and editing PGDBs
• Must run the Perl or Java program on the same machine that runs Pathway Tools
  • This is a security measure, as the API server has no built-in security
• Can only handle one connection at a time
Obtaining PerlCyc and JavaCyc
• Download from http://www.sgn.cornell.edu/downloads/
• PerlCyc written and maintained by Lukas Mueller at Boyce Thompson Institute for Plant Research.
• JavaCyc written by Thomas Yan at Carnegie Institute, maintained by Lukas Mueller.
• Easy to extend…
Examples of PerlCyc, JavaCyc Functions
• GFP functions (require knowledge of Pathway Tools schema), listed as JavaCyc name / PerlCyc name:
  • getSlotValues / get_slot_values
  • getClassAllInstances / get_class_all_instances
  • putSlotValues / put_slot_values
• Pathway Tools functions (described at http://bioinformatics.ai.sri.com/ptools/ptools-fns.html), listed as PerlCyc name / JavaCyc name:
  • genes_of_reaction / genesOfReaction
  • find_indexed_frame / findIndexedFrame
  • pathways_of_gene / pathwaysOfGene
  • transport_p / transportP
Writing a PerlCyc or JavaCyc program
• Create a PerlCyc, JavaCyc object:
perlcyc -> new ("ORGID")
new Javacyc ("ORGID")
• Call PerlCyc, JavaCyc functions on this object:
my $cyc = perlcyc -> new ("ECOLI");
my @pathways = $cyc -> all_pathways ();
Javacyc cyc = new Javacyc("ECOLI");
ArrayList pathways = cyc.allPathways ();
• Functions return object IDs, not objects.
  • Must connect to server again to retrieve attributes of an object.
foreach my $p (@pathways) {
  print $cyc -> get_slot_value ($p, "COMMON-NAME");
}
for (int i = 0; i < pathways.size(); i++) {
  String pwy = (String) pathways.get(i);
  System.out.println (cyc.getSlotValue (pwy, "COMMON-NAME"));
}
Sample PerlCyc Query
• Number of proteins in E. coli
use perlcyc;
my $cyc = perlcyc -> new ("ECOLI");
my @proteins = $cyc->get_class_all_instances("|Proteins|");
my $protein_count = scalar(@proteins);
print "Protein count: $protein_count.\n";
Sample PerlCyc Query
• Print IDs of all proteins with molecular weight between 10 and 20 kD and pI between 4 and 5.
use perlcyc;
my $cyc = perlcyc -> new ("ECOLI");
foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
  my $mw = $cyc->get_slot_value($p, "molecular-weight-kd");
  my $pI = $cyc->get_slot_value($p, "pi");
  if ($mw <= 20 && $mw >= 10 && $pI <= 5 && $pI >= 4) {
    print "$p\n";
  }
}
Sample PerlCyc Query
• List all the transcription factors in E. coli, and the list of genes that each regulates:
use perlcyc;
my $cyc = perlcyc -> new ("ECOLI");
foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
  if ($cyc->transcription_factor_p($p)) {
    my $name = $cyc->get_slot_value($p, "common-name");
    # Keyed by gene ID so that each regulated gene is listed once
    my %genes = ();
    foreach my $tu ($cyc->regulon_of_protein($p)) {
      foreach my $g ($cyc->transcription_unit_genes($tu)) {
        $genes{$g} = $cyc->get_slot_value($g, "common-name");
      }
    }
    print "\n\n$name: ";
    print join " ", values %genes;
  }
}
Sample Editing Using PerlCyc
• Add a link from each gene to the corresponding object in MY-DB (assume ID is same in both cases)
use perlcyc;
my $cyc = perlcyc -> new ("HPY");
my @genes = $cyc->get_class_all_instances ("|Genes|");
foreach my $g (@genes) {
  $cyc->add_slot_value ($g, "DBLINKS", "(MY-DB \"$g\")");
}
$cyc->save_kb();
Sample JavaCyc Query: Enzymes for which ATP is a regulator
import java.util.*;
public class JavacycSample {
  public static void main(String[] args) {
    Javacyc cyc = new Javacyc("ECOLI");
    ArrayList regframes =
        cyc.getClassAllInstances("|Regulation-of-Enzyme-Activity|");
    for (int i = 0; i < regframes.size(); i++) {
      String reg = (String) regframes.get(i);
      boolean bool = cyc.memberSlotValueP(reg, "Regulator", "ATP");
      if (bool) {
        String enzrxn = cyc.getSlotValue(reg, "Regulated-Entity");
        String enzyme = cyc.getSlotValue(enzrxn, "Enzyme");
        System.out.println(enzyme);
      }
    }
  }
}
Simple Lisp Query Example: Enzymes for which ATP is a regulator
(defun atp-inhibits ()
  (loop for x in (get-class-all-instances '|Regulation-of-Enzyme-Activity|)
        ;; Does the Regulator slot contain the compound ATP, and is the mode
        ;; of regulation negative (inhibition)?
        when (and (member-slot-value-p x 'Regulator 'ATP)
                  (member-slot-value-p x 'Mode "-"))
        ;; Whenever the test is positive, we collect the value of the slot Enzyme
        ;; of the Regulated-Entity of the regulatory interaction frame.
        ;; The collected values are returned as a list, once the loop terminates.
        collect (get-slot-value (get-slot-value x 'Regulated-Entity) 'Enzyme)))
;;; invoking the query:
(select-organism :org-id 'ECOLI)
(atp-inhibits)
(get-slot-values 'TRYPSYN-RXN 'LEFT)
==> (INDOLE-3-GLYCEROL-P SER)
Simple Perl Query Example: Enzymes for which ATP is a regulator
use perlcyc;
my $cyc = perlcyc -> new("ECOLI");
my @regs = $cyc -> get_class_all_instances("|Regulation-of-Enzyme-Activity|");
## We check every instance of the class
foreach my $reg (@regs) {
  ## We test whether the Regulator slot contains the compound frame ATP
  ## and whether the Mode slot is "-" (inhibition)
  my $bool1 = $cyc -> member_slot_value_p($reg, "Regulator", "ATP");
  my $bool2 = $cyc -> member_slot_value_p($reg, "Mode", "-");
  if ($bool1 && $bool2) {
    ## Whenever the test is positive, we fetch the value of the slot Enzyme
    ## of the Regulated-Entity. The results are printed in the terminal.
    my $enzrxn = $cyc -> get_slot_value($reg, "Regulated-Entity");
    my $enz = $cyc -> get_slot_value($enzrxn, "Enzyme");
    print STDOUT "$enz\n";
  }
}
Getting started with Lisp
• pathway-tools -lisp
• (load "file") (compile-file "file.lisp")
• Emacs is a useful editor
• Pathway Tools source code is available: ask
• Lisp resources: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Viewing Results via the Answer List
• (replace-answer-list (query))
Query Gotchas
• Study schema carefully
• :test #'fequal
• Cascade of slot-values: check for NIL
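The "cascade of slot-values" gotcha arises when one slot lookup feeds the next, as in taking the Enzyme of the Regulated-Entity of a regulation frame: if an intermediate slot is empty, the next lookup receives NIL. A minimal Python sketch of the defensive pattern follows. The dictionary-based `toy_kb`, the helper names, and the frame IDs are all hypothetical, standing in for a real PGDB.

```python
def get_slot_value(kb, frame, slot):
    """Return the first value of frame.slot in the toy KB, or None if empty."""
    values = kb.get(frame, {}).get(slot, [])
    return values[0] if values else None


def follow_slots(kb, frame, *slots):
    """Follow a chain of slots, stopping with None as soon as a link is missing."""
    current = frame
    for slot in slots:
        if current is None:  # the NIL check between cascaded lookups
            return None
        current = get_slot_value(kb, current, slot)
    return current


# Hypothetical frames: REG-2 has no Regulated-Entity value.
toy_kb = {
    "REG-1": {"Regulated-Entity": ["ENZRXN-1"]},
    "ENZRXN-1": {"Enzyme": ["PROTEIN-1"]},
    "REG-2": {},
}

print(follow_slots(toy_kb, "REG-1", "Regulated-Entity", "Enzyme"))
print(follow_slots(toy_kb, "REG-2", "Regulated-Entity", "Enzyme"))
```

In Lisp the same idea is usually written by binding the intermediate value and testing it before the second lookup.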
Semantic Inference Layer
• relationships.lisp
• Library of functions that encapsulate common query building blocks and intricacies of navigating the schema
  • enzymes-of-gene
  • reactions-of-gene
  • pathways-of-gene
  • genes-of-pathway
  • pathway-hole-p
  • reactions-of-compound
  • top-containers(protein)
  • all-rxns(type), where type is one of :metab-smm, :metab-all, :metab-pathways, :enzyme, :transport, etc.
    (all-rxns :metab-pathways)
Pathway Tools Schema and Semantic Inference Layer
Pathway Tools Ontology / Schema
• Ontology classes: 1621
  • Datatype classes: Define objects from genomes to pathways
  • Classification systems / controlled vocabularies
    • Pathways, chemical compounds, enzymatic reactions (EC system)
    • Protein Feature ontology
    • Cell Component Ontology
    • Evidence Ontology
• Comprehensive set of 279 attributes and relationships
Polynucleotides
Use GKB Editor to Inspect the Pathway Tools Ontology
• GKB Editor = Generic Knowledge Base Editor
• Type in Navigator window: (GKB), or
• [Right-Click] Edit->Ontology Editor
• View->Browse Class Hierarchy
  • [Middle-Click] to expand hierarchy
  • To view classes or instances, select them and:
    • Frame -> List Frame Contents
    • Frame -> Edit Frame
Use the SAQP to Inspect the Schema
Pathway Tools Schema
• Appendix of Pathway Tools User's Guide
• Schema overview diagram
Root Classes in the Pathway Tools Ontology
• Chemicals -- All molecules
• Polymer-Segments -- Regions of polymers
• Protein-Features -- Features on proteins
• Paralogous-Gene-Groups
• Organisms
• Generalized-Reactions -- Reactions and pathways
• Enzymatic-Reactions -- Link enzymes to reactions they catalyze
• Regulation -- Regulatory interactions
• CCO -- Cell Component Ontology
• Evidence -- Evidence ontology
• Notes -- Timestamped, person-stamped notes
• Organizations
• People
• Publications
Principal Classes
• Class names are usually capitalized, plural, separated by dashes
• Genetic-Elements, with subclasses:
  • Chromosomes
  • Plasmids
• Genes
• Transcription-Units
• RNAs
  • rRNAs, snRNAs, tRNAs, Charged-tRNAs
• Proteins, with subclasses:
  • Polypeptides
  • Protein-Complexes
• Reactions
• Enzymatic-Reactions
• Pathways
• Compounds-And-Elements
• Regulation
Semantic Network Diagrams
[Figure: semantic network linking the TCA Cycle (in-pathway) to the reaction Succinate + FAD = fumarate + FADH2, its Enzymatic-reaction (reaction, catalyzes), the enzyme Succinate dehydrogenase, its components (component-of) Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, and Sdh-membrane-2, and the genes (product) sdhA, sdhB, sdhC, and sdhD]
Pathway Tools Schema and Semantic Inference Layer: Genes, Operons, and Replicons
Representing a Genome
[Figure: ORG linked by the genome slot to CHROM1, CHROM2, and PLASMID1; replicons linked by the components slot to Gene1, Gene2, and Gene3; a gene linked by the product slot to Product1]
Classes:
• ORG is of class Organisms
• CHROM1 is of class Chromosomes
• PLASMID1 is of class Plasmids
• Gene1 is of class Genes
• Product1 is of class Polypeptides or RNAs
(defun genes-of-chrom (chrom)
  (loop for x in (get-slot-values chrom 'components)
        when (instance-all-instance-of-p x '|Genes|)
        collect x))
Polynucleotides
• Review slots of COLI and of COLI-K12
Genetic-Elements
• Sequence is stored in
  • File PGDB: A separate file
  • Relational DBMS PGDB: A relational database table
Polymer-Segments
• Review slots of Genes
Complexities of Gene / Gene-Product Relationships
• The Product of a gene can be an instance of Polypeptides or RNAs
• An instance of Polypeptides can have more than one gene encoding it
• Sequence position:
  • Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position (Right-End-Position is usually the greater, except at the origin)
  • Transcription-Direction: + / -
• Alternative splicing:
  • Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position
  • Intron positions specified in Splice-Form-Introns of gene product, e.g. (200 300) (350 400)
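To make the intron representation concrete, the sketch below derives exon intervals from a gene's end positions and a Splice-Form-Introns value such as (200 300) (350 400). It is a Python illustration under stated assumptions: coordinates are taken to be inclusive and on the same scale as Left-End-Position and Right-End-Position, and the gene endpoints 100 and 500 are invented for the example. The function name `exon_intervals` is hypothetical, not a Pathway Tools function.

```python
def exon_intervals(left_end, right_end, introns):
    """Return inclusive (start, end) exon intervals between the given introns.

    Assumes inclusive coordinates and introns lying inside [left_end, right_end].
    """
    exons = []
    start = left_end
    for intron_start, intron_end in sorted(introns):
        if intron_start > start:
            exons.append((start, intron_start - 1))  # exon ends just before the intron
        start = intron_end + 1                       # next exon begins just after it
    if start <= right_end:
        exons.append((start, right_end))
    return exons


# Hypothetical gene spanning 100..500 with the two introns from the slide.
print(exon_intervals(100, 500, [(200, 300), (350, 400)]))
```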
Gene Reaction Schematic
Proteins
Proteins and Protein Complexes
• Polypeptide: the monomer protein product of a gene (may have multiple isoforms, as indicated at gene level)
• Protein complex: proteins consisting of multiple polypeptides or protein complexes
• Example: DNA pol III
  • DnaE is a polypeptide
  • pol III core enzyme contains DnaE, DnaQ, HolE
  • pol III holoenzyme contains pol III core enzyme plus three other complexes
Protein Complex Relationships
Slots of a protein (DnaE)
• catalyzes
  • Is it an activator/reactant/etc?
• comments
• component-of
• dblinks
• features (edited in feature editor)
• Many other features possible
A complex at the frame level (pol III)
• Same features as polypeptide frame, different use
  • comment
  • component-of and components
    • note coefficients
Relationships are Defined in Many Places
• component-of comes from creating a complex
• appears-in-left-side-of comes from defining a reaction (as do modified forms)
• inhibitor-of comes from an enzymatic reaction
• can only edit dna-footprint if protein has been associated with a TU
Semantic Inference Layer
• Reactions-of-protein (prot)
  • Returns a list of rxns this protein catalyzes
• Transcription-units-of-proteins (prot)
  • Returns a list of TUs activated/inhibited by the given protein
• Transporter? (prot)
  • Is this protein a transporter?
• Polypeptide-or-homomultimer? (prot)
• Transcription-factor? (prot)
• Obtain-protein-stats
  • Returns 5 values
  • Length of: all-polypeptides, complexes, transporters, enzymes, etc.
Example
• Find all enzymes that use pyridoxal phosphate as a cofactor or prosthetic group
(loop for protein in (get-class-all-instances '|Proteins|)
      for enzrxn = (get-slot-value protein 'enzymatic-reaction)
      when (and enzrxn
                (or (member-slot-value-p enzrxn 'cofactors 'pyridoxal_phosphate)
                    (member-slot-value-p enzrxn 'prosthetic-groups 'pyridoxal_phosphate)))
      collect protein)
(member-slot-value-p frame slot value): T if Value is one of the values of Slot of Frame.
Sample
• Find all proteins without a comment anywhere
Compounds / Reactions / Pathways
• Think of a three-tiered structure:
  • Reactions built on top of compounds
  • Pathways built on top of reactions
• Metabolic network defined by reactions alone; pathways are an additional "optional" structure
• Some reactions not part of a pathway
• Some reactions have no attached enzyme
• Some enzymes have no attached gene
Compounds
• Relatively few aspects of a compound defined within the compound editor
  • MW, formula calculated from edited structure
• Most aspects defined in other editors
  • "Pathway reactions" comes from reaction editing followed by pathway editing
  • Activator, etc. come from the enzymatic reaction editor
-- Instance TRP --
Types: |Amino-Acid|, |Aromatic-Amino-Acids|, |Non-polar-amino-acids|
APPEARS-IN-LEFT-SIDE-OF: RXN0-287, TRANS-RXN-76, TRYPTOPHAN-RXN, TRYPTOPHAN--TRNA-LIGASE-RXN
APPEARS-IN-RIGHT-SIDE-OF: RXN0-2382, RXN0-301, TRANS-RXN-76, TRYPSYN-RXN
CHEMICAL-FORMULA: (C 11), (H 12), (N 2), (O 2)
COMMON-NAME: "L-tryptophan"
DBLINKS: (LIGAND-CPD "C00078" NIL |kaipa| 3311532640 NIL NIL), (CAS "6912-86-3"), (CAS "73-22-3")
NAMES: "L-tryptophan", "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"
SMILES: "c1(c(CC(N)C(=O)O)c2(c([nH]1)cccc2))"
SYNONYMS: "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"
Where is diphosphate in the ontology?
Semantic Inference Layer
• Reactions-of-compound (cpd)
• Pathways-of-compound (cpd)
• Is-substrate-an-autocatalytic-enzyme-p (cpd)
• Activated/inhibited-by? (cpds slots)
  • Returns a list of enzrxns for which a cpd in cpds is a modulator (example slots: activators-all, activators-allosteric)
• All-substrates (rxns)
  • All unique substrates specified in the given rxns
• Has-structure-p (cpd)
• Obtain-cpd-stats
  • Returns two values
  • Length of: all-cpds, cpds with structures
Miscellaneous things…
• History List
  • Back/Forward and History buttons
  • Default list is 50 items
• Show frame
  • (print-frame 'frame)
Queries with Multiple Answers
• Navigator queries:
  • Example: Substring search for "pyruvate"
  • Selected list is placed on the Answer list
  • Use "Next Answer" button to view each one of them
• Lisp queries:
  • Example: Find reactions involving pyruvate as a substrate
(get-class-all-instances '|Compounds|)
(loop for rxn in (get-class-all-instances '|Reactions|)
      when (member 'pyruvate (get-slot-values rxn 'substrates))
      collect rxn)
(replace-answer-list *)
Reactions
Enzymatic Reactions (DnaE and 2.7.7.7)
• A necessary bridge between enzymes and "generic" versions of reactions
• Carries information specific to an enzyme/reaction combination:
  • Cofactors and prosthetic groups
  • Alternative substrates
  • Links to regulatory interactions
• Frame is generated when protein is associated with reaction (via protein or reaction editor)
Regulation of Enzyme Activity
Reactions
• Represents information about a reaction that is independent of enzymes that catalyze the reaction
• Connected to enzyme(s) via enzymatic reaction frames
• Classified with EC system when possible
• Example: 2.7.7.7, DNA-directed DNA polymerization
  • Carried out by five enzymes in E. coli
Reaction Ontology
Where is 2.7.7.7 in the Ontology?
Slots of Reaction Frames
• Balance-state
• EC-number
• Enzymatic-reaction
  • Generated in protein or reaction editor
• In-pathway
  • Generated in pathway editor
• Left and Right (reactants / products)
  • Can include modified forms of proteins, RNAs, etc. here
  • Not all reactants/products need to be frames
Reaction relationships
Semantic Inference Layer
• Genes-of-reaction (rxn)
• Substrates-of-reaction (rxn)
• Enzymes-of-reaction (rxn)
• Lacking-ec-number (organism)
  • Returns list of rxns with no EC numbers in that database
• Get-reaction-direction-in-pathway (pwy rxn)
• Reaction-type (rxn)
  • Indicates type of rxn as: small-molecule rxn, transport rxn, protein-small-molecule rxn (one substrate is a protein and one is a small molecule), protein rxn (all substrates are proteins), etc.
• All-rxns (type)
  • Specify the type of reaction (see above for type)
• Obtain-rxn-stats
  • Returns six values
  • Length of: all-rxns, transport, non-transport, etc.
Find all small-molecule reactions that have no enzyme but are not spontaneous ("orphan" reactions)
(defun orphan-reactions (&optional (verbose? t))
  (loop for r in (all-rxns :small-molecule)
        when (and (not (slot-has-value-p r 'enzymatic-reaction))
                  (not (get-slot-value r 'spontaneous?)))
        collect r))
Reaction Direction
• Left/Right reflect direction of reaction as written by Enzyme Commission
  • Reflects systematic direction for different reaction classes
• Left/Right do not necessarily correspond to physiological direction of a reaction
• Get-rxn-direction (rxn)
  • Returns :L2R or :R2L or :BOTH or NIL
  • Integrates all available info about direction of this reaction:
    • Direction(s) it occurs in all pathways in the PGDB
    • Direction(s) as specified in Enzymatic-Reactions
RNAs
• PGDBs only represent RNAs that are "terminal gene products"
  • tRNAs
  • rRNAs
  • Regulatory RNAs
  • Miscellaneous small RNAs
• Slots similar to proteins
• tRNAs can have an anticodon
The RNA Ontology
Pathway Tools Schema and Semantic Inference Layer: Pathways and the Overview
Outline
• Pathways
  • Representation of Pathways
  • Querying Pathways Programmatically
  • How Pathway Diagrams are Generated
  • Future Work: Signalling Pathways
• Cellular Overview Diagram
  • New Functionality
  • Under the Hood
  • How Overview Diagram is Generated
  • Using Overview Diagram for Global Queries
What is a Pathway?
• An ordered set of interconnected, directed biochemical reactions
• Reactions form a coherent unit, e.g.
  • Regulated as a single unit
  • Evolutionarily conserved across organisms as a single unit
  • When combined, perform a single cellular function
  • Historically grouped together as a unit
• Includes metabolic pathways and signalling pathways
• Evidence for all reactions in a single organism
• Pathways can be linear, cyclical, branched, or some combination
Internal Representation of Pathways
• REACTION-LIST: unordered list of reactions that comprise the pathway
• PREDECESSORS: list of reaction pairs that define ordering relationships between reactions
[Figure: branched pathway over compounds A, B, C, and D in which reactions R2 and R3 both follow R1]
(R2 R1) : Predecessor of R2 is R1
(R3 R1) : Predecessor of R3 is R1
(R1) : R1 has no predecessor (can be omitted)
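One way to read the PREDECESSORS slot is as the edge list of a directed graph over the reactions in REACTION-LIST. The Python sketch below, which is an illustration and not how Pathway Tools lays out pathways, turns such pairs into an order in which every reaction appears after its predecessor. It relies on the standard-library `graphlib` module (Python 3.9+); the function name `reaction_order` is hypothetical. A cyclical pathway has no such order, and `graphlib` raises `CycleError` in that case.

```python
from graphlib import TopologicalSorter


def reaction_order(reaction_list, predecessors):
    """Order reactions so that each one follows its predecessor(s).

    predecessors holds tuples shaped like the slot values on the slide:
    (reaction, predecessor), or (reaction,) for a reaction with no predecessor.
    """
    graph = {rxn: set() for rxn in reaction_list}
    for entry in predecessors:
        if len(entry) == 2:
            rxn, pred = entry
            graph.setdefault(rxn, set()).add(pred)
    # static_order() yields predecessors before the reactions that depend on them
    return list(TopologicalSorter(graph).static_order())


order = reaction_order(["R1", "R2", "R3"], [("R2", "R1"), ("R3", "R1"), ("R1",)])
print(order)  # R1 comes first; the relative order of R2 and R3 is not significant
```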
Main vs Side Substrates
[Figure: pathway diagram with compounds A through F]
• Main compounds form the backbone of the pathway:
  • substrates shared between connecting reactions
  • major inputs and outputs
• Side compounds are omitted from pathway diagrams at low detail levels
• Individual reactions do not necessarily have main and side compounds; a particular substrate may be either a main or a side depending on the pathway context.
Computing Directionality and Mains/Sides
Our philosophy: Enable curator to specify as little as possible. Compute as much as possible. This reduces redundancy and potential for inconsistencies.
Example:
Reactions: R1: A + B = C + D
           R2: B = E
Predecessors: (R2 R1)
• Only substrate overlap is B
• B must be a main substrate
• A must be a side substrate
• R1 must proceed from right to left
• R2 must proceed from left to right, because R1 is the predecessor of R2: R1 must produce B and R2 must consume it
[Figure: C + D converted to B, then B converted to E, with A shown as a side compound]
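The reasoning in this example can be sketched in a few lines of Python. This is an illustration of the idea only, under the assumption that a reaction is given as `left` and `right` sets of compound names; it is not the algorithm Pathway Tools actually uses, and the function name `infer_link` is hypothetical. The direction keywords mirror the :L2R and :R2L values mentioned elsewhere in these slides.

```python
def infer_link(pred, succ):
    """Infer mains, sides, and directions for one predecessor pair (succ pred).

    pred and succ are dicts with 'left' and 'right' sets of compound names.
    Returns None for a direction when the overlap spans both sides (ambiguous).
    """
    mains = (pred["left"] | pred["right"]) & (succ["left"] | succ["right"])
    if not mains:
        return {"mains": set(), "sides": set(), "pred_dir": None, "succ_dir": None}

    def side_holding(rxn, cpds):
        if cpds <= rxn["left"]:
            return "left"
        if cpds <= rxn["right"]:
            return "right"
        return None  # overlap touches both sides

    pred_side = side_holding(pred, mains)
    succ_side = side_holding(succ, mains)
    # The predecessor must PRODUCE the mains; the successor must CONSUME them.
    pred_dir = {"left": ":R2L", "right": ":L2R"}.get(pred_side)
    succ_dir = {"left": ":L2R", "right": ":R2L"}.get(succ_side)
    # Compounds sharing the predecessor's product side with the mains are sides.
    sides = (pred[pred_side] - mains) if pred_side else set()
    return {"mains": mains, "sides": sides, "pred_dir": pred_dir, "succ_dir": succ_dir}


r1 = {"left": {"A", "B"}, "right": {"C", "D"}}
r2 = {"left": {"B"}, "right": {"E"}}
print(infer_link(r1, r2))
```

When the overlap touches both sides of a reaction, the sketch returns no direction rather than guessing.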
But…
Unfortunately, mains, sides and reaction directions are sometimes ambiguous:
• At beginnings and ends of pathways
  • Use heuristics to determine main/side substrates at beginnings, ends of pathways
  • Not always what the curator wants
• Substrate overlap with both sides of a reaction, e.g.
  A + B = C + D
  C + B = E
• Solution: Additional slot PRIMARIES, should only be populated when necessary:
  PRIMARIES: (R (A B) (C)) says that for reaction R, A and B are both main reactants, and C is a main product.
More Complications…
• ENZYME-USE: a reaction may be catalyzed by multiple enzymes, but not all the enzymes necessarily participate in a given pathway
  • Not present in the same compartment with rest of pathway enzymes
  • Down-regulated or not expressed under conditions in which pathway is active
  • ENZYME-USE slot tells us which enzymes catalyze reaction in pathway, if not all.
• LAYOUT-ADVICE: helps software draw pathway correctly, e.g. in a cyclical pathway, tells which substrate should be at the top.
• HYPOTHETICAL-REACTIONS: list of reactions in the pathway that are considered hypothetical (i.e. no direct experimental evidence)
Polymerization Pathways
[Figure: polymerization chain from X[n] to X[n+1], with the instance X[10]]
• POLYMERIZATION-LINKS: specifies reactions that should be connected by a polymerization link
  (X R1 R1) --- REACTANT-NAME-SLOT: N-NAME
            --- PRODUCT-NAME-SLOT: N+1-NAME
• CLASS-INSTANCE-LINKS: specifies when a link should be drawn between a substrate class and some instance of it (necessary only if instance is not a member of some reaction, so no predecessor relationship can be defined)
  R1 --- PRODUCT-INSTANCES: X[10]
Super-Pathways
• Collection of pathways that connect to each other via common substrates or reactions, or as part of some larger logical unit
• Can contain both sub-pathways and additional connecting reactions
• Can be nested arbitrarily
• REACTION-LIST: a pathway ID instead of a reaction ID in this slot means include all reactions from the specified pathway
• PREDECESSORS: a pathway ID instead of a tuple in this slot means include all predecessor tuples from the specified pathway
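Because super-pathways can be nested arbitrarily, resolving a REACTION-LIST into plain reactions is naturally recursive. The Python sketch below illustrates that expansion on a toy dictionary; the pathway and reaction IDs are invented, the helper `expand_reaction_list` is hypothetical, and the sketch assumes that any ID present as a key of the dictionary is a pathway.

```python
def expand_reaction_list(pathways, pwy_id, _seen=None):
    """Flatten a (possibly nested) super-pathway into its reactions, in first-seen order.

    pathways maps a pathway ID to its REACTION-LIST, whose items are either
    reaction IDs or IDs of sub-pathways.
    """
    seen = set() if _seen is None else _seen
    if pwy_id in seen:  # guard against revisiting a pathway
        return []
    seen.add(pwy_id)
    reactions = []
    for item in pathways[pwy_id]:
        if item in pathways:  # a pathway ID: include all of its reactions
            expanded = expand_reaction_list(pathways, item, seen)
        else:                 # an ordinary reaction ID
            expanded = [item]
        for rxn in expanded:
            if rxn not in reactions:
                reactions.append(rxn)
    return reactions


# Invented IDs for illustration only.
toy_pathways = {
    "PWY-A": ["RXN-1", "RXN-2"],
    "PWY-B": ["RXN-2", "RXN-3"],
    "SUPER-PWY": ["PWY-A", "PWY-B", "RXN-LINK"],
    "NESTED-SUPER": ["SUPER-PWY", "RXN-4"],
}
print(expand_reaction_list(toy_pathways, "NESTED-SUPER"))
```

The same recursion would apply to PREDECESSORS, with tuples in place of reaction IDs.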
Querying Pathways Programmatically
• See http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
• (all-pathways)
• (base-pathways)
  • Returns list of all pathways that are not super-pathways
• (genes-of-pathway pwy)
• (unique-genes-of-pathway pwy)
  • Returns list of all genes of a pathway that are not also part of other pathways
• (enzymes-of-pathway pwy)
• (substrates-of-pathway pwy)
• (variants-of-pathway pwy)
  • Returns all pathways in the same variant class as a pathway
• (get-predecessors rxn pwy), (get-successors rxn pwy)
• (get-rxn-direction-in-pathway pwy rxn)
• (pathway-inputs pwy), (pathway-outputs pwy)
  • Returns all compounds consumed (produced) but not produced (consumed) by pathway (ignores stoichiometry)
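The description of pathway-inputs and pathway-outputs amounts to two set differences. The Python sketch below shows that computation on made-up data; it assumes each reaction has already been oriented in its pathway direction and ignores any main/side distinction, so it should be read as an illustration of the definition rather than of the actual Lisp functions.

```python
def pathway_inputs_outputs(directed_reactions):
    """Return (inputs, outputs) for a list of (reactants, products) pairs.

    inputs  = compounds consumed but never produced by the pathway
    outputs = compounds produced but never consumed by the pathway
    Stoichiometry is ignored, as on the slide.
    """
    consumed, produced = set(), set()
    for reactants, products in directed_reactions:
        consumed |= set(reactants)
        produced |= set(products)
    return consumed - produced, produced - consumed


# Made-up two-step pathway: A -> B, then B + ATP -> C + ADP
toy_reactions = [(["A"], ["B"]), (["B", "ATP"], ["C", "ADP"])]
inputs, outputs = pathway_inputs_outputs(toy_reactions)
print(sorted(inputs), sorted(outputs))
```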
Example Queries
• Find all genes involved in metabolic pathways:
(remove-duplicates
  (loop for p in (all-pathways)
        append (genes-of-pathway p)))
• Find all compounds that are unique to a single pathway:
(loop for p in (base-pathways)
      append
      (loop for c in (substrates-of-pathway p)
            when (null (remove p (pathways-of-compound c)))
            collect (list c p)))
Regulation
• Significant recent expansion of regulation in Pathway Tools
• Class Regulation with subclasses that describe different biochemical mechanisms of regulation
• Slots:
  • Regulator
  • Regulated-Entity
  • Mode
  • Mechanism
Regulation of Enzyme Activity
• Class Regulation-of-Enzyme-Activity
  • Each instance of the class describes one regulatory interaction
• Slots:
  • Regulator -- usually a small molecule
  • Regulated-Entity -- an Enzymatic-Reaction
  • Mechanism -- One of: Competitive, Uncompetitive, Noncompetitive, Irreversible, Allosteric, Unkmech, Other
  • Mode -- One of: + , -
Transcription Initiation
• Class Regulation-of-Transcription-Initiation
• Slots:
  • Regulator -- instance of Proteins or Complexes (a transcription-factor)
  • Regulated-Entity -- instance of Promoters or Transcription-Units or Genes
  • Mode -- One of: + , -
Attenuation
• Class Transcriptional-Attenuation
  • Several subclasses depending on type of attenuation
• Slots common to all:
  • Regulator -- Depends on subtype of attenuation
  • Regulated-Entity -- instance of Terminators or Genes or Transcription-Units
  • Mode -- One of: + , -
Attenuation Subtypes
 Small-Molecule-Mediated-Attenuation
    Regulator = A small molecule
    Leader transcript binds small molecule and determines formation of terminator or antiterminator
 RNA-Polymerase-Modification
    Regulator = instance of Proteins or Complexes
    Regulatory protein binds to site in transcription unit and interacts with RNA polymerase to determine termination
 RNA-Mediated-Attenuation
 Ribosome-Mediated-Attenuation
 Rho-Blocking-Antitermination
 Protein-Mediated-Attenuation
BioWarehouse: A Bioinformatics Database Warehouse
Peter D. Karp, Thomas J. Lee, Valerie Wagner
[Diagram: source databases (BioCyc, BioPAX, ENZYME, CMR, Genbank, GO, Eco2DBase, KEGG, UniProt, Taxonomy, MAGE-ML) feeding BioWarehouse, which runs on Oracle (10g) or MySQL (4.1.11)]
Motivations
 Hundreds of bioinformatics DBs exist
 Important problems involve queries across multiple DBs
Technical Approach
 Multi-platform support: Oracle (10g) and MySQL
 Schema support for multitude of bioinformatics datatypes
 Create loaders for public bioinformatics DBs
    Parse file format of the source DB
    Semantic transformations
    Insert DB contents into warehouse tables
 Provide Warehouse query access mechanisms
    SQL queries via ODBC, JDBC, OAA
 Operate public BioWarehouse server: publichouse
BMC Bioinformatics 7:170 2006
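The first loader step, parsing the source file format, can be sketched in Common Lisp. This is not the BioWarehouse loader code (the loaders are written in C and Java). It is a minimal illustration that assumes a simplified attribute-value layout in which each line reads `ATTRIBUTE - value`, a line containing only `//` ends a record, and lines starting with `#` are comments. The function name `parse-attribute-value` and the sample records are hypothetical. The semantic transformation and table insertion steps are not shown.

```lisp
(defun parse-attribute-value (text)
  "Parse TEXT into a list of records; each record is an alist of
(attribute . value) string pairs. Assumes the simplified layout described above."
  (let ((records '())
        (current '()))
    (with-input-from-string (in text)
      (loop for line = (read-line in nil nil)
            while line
            do (let ((sep (search " - " line)))
                 (cond ((string= line "//")
                        ;; Record terminator: store the finished record.
                        (when current
                          (push (nreverse current) records)
                          (setf current '())))
                       ((and sep (char/= (char line 0) #\#))
                        ;; Attribute line: split at the first " - ".
                        (push (cons (subseq line 0 sep)
                                    (subseq line (+ sep 3)))
                              current))))))
    ;; Keep a trailing record that lacks a terminator.
    (when current
      (push (nreverse current) records))
    (nreverse records)))

;; Invented sample input with two records.
(defparameter *sample-text*
  (format nil "# toy file~%UNIQUE-ID - RXN-1~%COMMON-NAME - toy reaction~%//~%UNIQUE-ID - RXN-2~%//~%"))
```

Each parsed record is a plain alist, so a later transformation step could map attribute names to warehouse columns before insertion.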
BioWarehouse Schema
 Manages many bioinformatics datatypes simultaneously
    Pathways, Reactions, Chemicals
    Proteins, Genes, Replicons
    Sequences, Sequence Features
    Organisms, Taxonomic relationships
    Computations (sequence matches)
    Citations, Controlled vocabularies
    Links to external databases
 Each type of warehouse object implemented through one or more relational tables (currently 43)
Warehouse Schema
 Manages multiple datasets simultaneously
    Dataset = Single version of a database
 Version comparison
 Multiple software tools or experiments that require access to different versions
 Each dataset is a warehouse entity
 Every warehouse object is registered in a dataset
BioWarehouse Loaders
Database        Loader Language   Input Format              Comments
--------------  ----------------  ------------------------  ------------------------------------------------------
BioCyc          C                 BioCyc attribute-value    Pathway/Genome Databases
BioPAX          Java              BioPAX format             Protein interactions data
CMR             C                 CMR column-delimited      Comprehensive Microbial Resource: 350+ microbial genomes
Eco2DBase       Java              Relational table dumps    E. coli 2-D gel data
ENZYME          Java              ENZYME attribute-value    Enzyme Commission set of reactions
Genbank         Java              XML derived from ASN.1    Bacterial subset of Genbank
Gene Ontology   Java              OBO XML                   Hierarchical controlled vocabulary
KEGG            C                 KEGG format               Metabolic pathway data
MAGE-ML         Java              MAGE-ML format            Microarray gene expression data
NCBI Taxonomy   C                 Taxonomy format           Organism taxonomy
UniProt         Java              UniProt XML               SWISS-PROT and TrEMBL
Acknowledgements
 SRI
    Michelle Green, Ron Caspi, Ingrid Keseler, John Pick, Carol Fulcher, Markus Krummenacker, Alex Shearer
 EcoCyc Collaborators
    Julio Collado-Vides, John Ingraham, Ian Paulsen
 MetaCyc Collaborators
    Sue Rhee, Peifen Zhang, Hartmut Foerster, Chris Tissier
 BioCyc Collaborators
    Christos Ouzounis and EBI CGG
 Funding sources:
    NIH National Center for Research Resources
    NIH National Institute of General Medical Sciences
    NIH National Human Genome Research Institute
    Department of Energy Microbial Cell Project
    DARPA BioSpice
BioCyc.org
Learn more from BioCyc webinars: biocyc.org/webinar.shtml
Chokepoint Example
For Antibiotic Target Development
 Find strategic essential weak links in metabolism
 Many compounds have just one producing and one consuming reaction

(defun chokepoint-1 ()
  ;; Collect the reactions around every compound that has exactly one
  ;; consuming and exactly one producing reaction, without duplicates.
  (remove-duplicates
    (loop for cpd in (remove-if-not #'coercible-to-frame-p
                                    (all-substrates (all-rxns)))
          ;; (= 1 a b) is true only when both lengths equal 1
          when (= 1 (length (get-slot-values cpd 'APPEARS-IN-LEFT-SIDE-OF))
                    (length (get-slot-values cpd 'APPEARS-IN-RIGHT-SIDE-OF)))
            collect (get-slot-value cpd 'APPEARS-IN-LEFT-SIDE-OF)
            and
            collect (get-slot-value cpd 'APPEARS-IN-RIGHT-SIDE-OF))
    :test #'fequal))

;;; invoking the query:
(length (chokepoint-1)) ==> 348
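The selection logic of chokepoint-1 can be tried without a PGDB. The sketch below is a self-contained Common Lisp approximation on invented data: `*toy-rxns*`, `rxns-with`, and `toy-chokepoints` are hypothetical names, and each toy reaction is simply a list of an id, its left-side compounds, and its right-side compounds. It is meant to show the "exactly one consumer and one producer" test, not to reproduce the Pathway Tools slots.

```lisp
;;; Toy reaction table: (reaction-id left-side-compounds right-side-compounds).
(defparameter *toy-rxns*
  '((r1 (a) (b))
    (r2 (b) (c))
    (r3 (c) (d))
    (r4 (c) (e))))

(defun rxns-with (cpd side)
  "Reactions in which CPD appears on SIDE (:left or :right)."
  (loop for (rxn left right) in *toy-rxns*
        when (member cpd (if (eq side :left) left right))
          collect rxn))

(defun toy-chokepoints ()
  "Reactions adjacent to compounds with exactly one consuming
and exactly one producing reaction."
  (let ((cpds (remove-duplicates
               (loop for (nil left right) in *toy-rxns*
                     append left
                     append right))))
    (remove-duplicates
     (loop for cpd in cpds
           for consumers = (rxns-with cpd :left)
           for producers = (rxns-with cpd :right)
           when (= 1 (length consumers) (length producers))
             append (append consumers producers)))))
```

In this toy network only compound b qualifies: it is produced by r1 alone and consumed by r2 alone, so those two reactions are reported. Compound c is excluded because two reactions consume it.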
Substring Search Example

 Find all genes that contain a given substring within their common name or synonym list.

(defun find-gene-by-substring (substring)
  (let (result)
    (loop for g in (get-class-all-instances '|Genes|)
          do
          ;; the names slot holds the common name and synonyms of the gene
          (loop for name in (get-slot-values g 'names)
                ;; case-insensitive substring match
                when (search substring name :test #'string-equal)
                  do (pushnew g result)))
    result))
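The matching behavior of the substring search can be checked on invented data. In this self-contained Common Lisp sketch, `*toy-genes*` and `toy-find-gene-by-substring` are hypothetical, and each toy gene is a list of an id followed by its names. It uses `char-equal` as the element test for `search`, which gives the same case-insensitive matching on strings.

```lisp
;;; Toy gene table: (gene-id name synonym ...). All entries are invented.
(defparameter *toy-genes*
  '((gene-1 "trpA" "tryptophan synthase alpha")
    (gene-2 "trpB")
    (gene-3 "lacZ" "beta-galactosidase")))

(defun toy-find-gene-by-substring (substring)
  "Return ids of toy genes with any name containing SUBSTRING, ignoring case."
  (loop for (gene . names) in *toy-genes*
        when (some (lambda (name)
                     (search substring name :test #'char-equal))
                   names)
          collect gene))
```

Because the comparison ignores case, a query for "TRP" matches both trpA and trpB, while "gal" matches only through the synonym of lacZ.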