Bioinformatics Research Group at SRI International
Computing with Pathway/Genome Databases
SRI International Bioinformatics
Motivations for Understanding Pathway Tools Schema
• When writing complex queries to PGDBs, those queries must refer to classes and slots within the schema
  - Queries using Lisp, Perl, Java APIs
  - Queries using Query Page
  - Queries using Structured Advanced Query Form
Motivations for Understanding Schema
• Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
• A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
Pathway Tools Implementation Details
• Platforms: Macintosh, PC/Linux, and PC/Windows
• Same binary can run as desktop app or Web server
• Production-quality software
  - Version control
  - Two regular releases per year
  - Extensive quality assurance
  - Extensive documentation
  - Auto-patch
  - Automatic DB-upgrade
• 420,000 lines of Lisp code
More Information
• Pathway Tools Web Site, Tutorial Slides
  - http://bioinformatics.ai.sri.com/ptools/
  - http://bioinformatics.ai.sri.com/ptools/examples.lisp
• PerlCyc & JavaCyc API, includes some relationships
  - http://www.arabidopsis.org/tools/aracyc/perlcyc/
  - http://www.arabidopsis.org/tools/aracyc/javacyc/
• Pathway Tools User's Guide
  - Appendix: Guide to the Pathway Tools Schema
• Curator's Guide
  - http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf
• aic/pathway-tools/nav/12.0/lisp/relationships.lisp
References
• Ontology Papers section of http://biocyc.org/publications.shtml
  - "An Evidence Ontology for use in Pathway/Genome Databases"
  - "An ontology for biological function based on molecular interactions"
  - "Representations of metabolic knowledge: Pathways"
  - "Representations of metabolic knowledge"
Data Exchange
• APIs: Lisp API, Java API, and Perl API: read & modify
• Cyclone
• Export to files
  - BioPAX export: since Pathway Tools 9.0 (biopax.org)
  - Export PGDB genome to GenBank format
  - Export entire PGDB as column-delimited and attribute-value file formats
  - Export PGDB reactions as SBML: sbml.org
  - Import/export of pathways between PGDBs
  - Import/export of selected frames, for spreadsheets
  - Import/export of compounds as Molfile, CML
  - Registering/publishing PGDBs on WWW
• BioWarehouse: loader for flatfiles, SQL access
  - http://bioinformatics.ai.sri.com/biowarehouse/
  - BMC Bioinformatics 7:170 2006
Programmatic Access to BioCyc
• Common Lisp
  - Native language of Pathway Tools
  - Interactive & mature environment
  - Full access to the data & many utility functions
  - Source code is available for academics
• PerlCyc
  - API of functions, exposed to Perl
  - Communication through UNIX socket
• JavaCyc
  - API of functions, exposed to Java
  - Communication through UNIX socket
• Cyclone
Cyclone
• Developed by Schachter and colleagues from Genoscope
  - http://nemo-cyclone.sourceforge.net/archi.php
• Cyclone is a Java-based system that:
  - Extracts data from a Pathway Tools PGDB
  - Converts it to an XML schema
  - Maps the data to Java objects and to a relational database
  - Changes made to the data on the Java side can be committed back to a Pathway Tools PGDB
Pathway Tools Data Model
• PGDBs are object-oriented databases
• Frame Representation System, named Ocelot
• Frame data model
  - PGDB = Knowledge base = KB = Database = DB
  - Frames
  - Slots
• PGDBs are stored in three possible ways:
  - Preloaded into binary executable
  - Ocelot file: single-user
  - RDBMS (MySQL 4 or Oracle 10): multi-user, change-logging
• Query API: GFP (Generic Frame Protocol)
Frames
• Entities with which facts are associated
• Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
• Classes: Superclass(es), Subclass(es), Instance(s)
• A symbolic frame name (id, key) uniquely identifies each frame
  - Examples: EG10223, TRP, Proteins
Slots
• Encode attributes and properties of a frame
• Represent relationships between frames
  - The value of a slot is the identifier of another frame
Slots
• Number of values
  - Single-valued
  - Multivalued: sets, bags
• Slot values
  - Any Lisp object: integer, real, string, symbol (frame name)
• Every slot is described by a "slot frame" in a KB that defines meta-information about that slot
  - Datatype, classes it pertains to, constraints
  - Two slots are inverses if they encode opposite relationships, e.g.:
    Slot Product in class Genes
    Slot Gene in class Polypeptides
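The inverse-slot rule can be illustrated with a small in-memory model. The Java sketch below is hypothetical: `FrameStore`, its methods and the frame IDs in the usage note are invented for illustration and are not part of Ocelot, GFP or JavaCyc. It assumes multivalued slots behave as sets of frame IDs, and shows that writing Product on a gene frame also records Gene on the polypeptide frame.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a KB: frame ID -> slot name -> set of values. */
final class FrameStore {
    private final Map<String, Map<String, Set<String>>> frames = new HashMap<>();
    private final Map<String, String> inverses = new HashMap<>();

    /** Declare two slots as inverses of each other, e.g. Product and Gene. */
    void declareInverse(String slotA, String slotB) {
        inverses.put(slotA, slotB);
        inverses.put(slotB, slotA);
    }

    /** Add a value to a multivalued (set) slot and keep the inverse slot in sync. */
    void addSlotValue(String frame, String slot, String value) {
        slotValues(frame, slot).add(value);
        String inverse = inverses.get(slot);
        if (inverse != null) {
            // The value is itself a frame ID, so record the opposite link on that frame.
            slotValues(value, inverse).add(frame);
        }
    }

    /** Read-only view of a slot's values; empty if the frame or slot is absent. */
    Set<String> getSlotValues(String frame, String slot) {
        Map<String, Set<String>> slots = frames.get(frame);
        if (slots == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(slots.getOrDefault(slot, Collections.emptySet()));
    }

    private Set<String> slotValues(String frame, String slot) {
        return frames.computeIfAbsent(frame, f -> new HashMap<>())
                     .computeIfAbsent(slot, s -> new LinkedHashSet<>());
    }
}
```

For example, after `declareInverse("Product", "Gene")` and `addSlotValue("GENE-1", "Product", "POLYPEPTIDE-1")`, `getSlotValues("POLYPEPTIDE-1", "Gene")` contains `GENE-1`. In the slide's terms, the inverse declaration belongs in the slot frames rather than in a call like `declareInverse`.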
Pathway Tools Ontology / Schema
• Ontology classes: 1621
  - Datatype classes: define objects from genomes to pathways
  - Classification systems for pathways, chemical compounds, enzymatic reactions (EC system)
  - Protein Feature ontology
  - Controlled vocabularies:
    Cell Component Ontology
    Evidence codes
• Comprehensive set of 248 attributes and relationships
Root Classes in the Pathway Tools Ontology
• Chemicals -- All molecules
• Polymer-Segments -- Regions of polymers
• Protein-Features -- Features on proteins
• Paralogous-Gene-Groups
• Organisms
• Generalized-Reactions -- Reactions and pathways
• Enzymatic-Reactions -- Link enzymes to reactions they catalyze
• Regulation -- Regulatory interactions
• CCO -- Cell Component Ontology
• Evidence -- Evidence ontology
• Notes -- Timestamped, person-stamped notes
• Organizations
• People
• Publications
Use GKB Editor to Inspect the Pathway Tools Ontology
• GKB Editor = Generic Knowledge Base Editor
• Type in Navigator window: (GKB)
  or
  [Right-Click] Edit -> Ontology Editor
• View -> Browse Class Hierarchy
• [Middle-Click] to expand hierarchy
• To view classes or instances, select them and:
  - Frame -> List Frame Contents
  - Frame -> Edit Frame
Schema Overview
Principal Classes
• Class names are capitalized, plural, separated by dashes
• Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
• Genes
• Transcription-Units
• RNAs
  - rRNAs, snRNAs, tRNAs, Charged-tRNAs
• Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes
Principal Classes
• Reactions, with subclasses:
  - Transport-Reactions
• Enzymatic-Reactions
• Pathways
• Compounds-And-Elements
Principal Classes
• Regulation
  - Regulation-of-Enzyme-Activity
  - Regulation-of-Transcription
    Regulation-of-Transcription-Initiation
    Transcriptional-Attenuation
Example of a Single GFP Call
The general pattern:
gfp-function(frame-ID slot-ID value ...)
(gfp-function frame-ID slot-ID value …)

Lisp:
(get-slot-values 'TRYPSYN-RXN 'LEFT)
==> (INDOLE-3-GLYCEROL-P SER)
Architecture of the API Server: PerlCyc and JavaCyc
• Works on Unix (Solaris or Linux) only
• Start up Pathway Tools with the -api argument
• Pathway Tools listens on a Unix socket; the Perl or Java program communicates through this socket
• Supports both querying and editing PGDBs
• Must run the Perl or Java program on the same machine that runs Pathway Tools
  - This is a security measure, as the API server has no built-in security
• Can only handle one connection at a time
Obtaining PerlCyc and JavaCyc
• Download from http://www.sgn.cornell.edu/downloads/
• PerlCyc written and maintained by Lukas Mueller at Boyce Thompson Institute for Plant Research.
• JavaCyc written by Thomas Yan at Carnegie Institute, maintained by Lukas Mueller.
• Easy to extend…
Examples of PerlCyc, JavaCyc Functions
• GFP functions (require knowledge of Pathway Tools schema):
  - getSlotValues / get_slot_values
  - getClassAllInstances / get_class_all_instances
  - putSlotValues / put_slot_values
• Pathway Tools functions (described at http://bioinformatics.ai.sri.com/ptools/ptools-fns.html):
  - genes_of_reaction / genesOfReaction
  - find_indexed_frame / findIndexedFrame
  - pathways_of_gene / pathwaysOfGene
  - transport_p / transportP
Writing a PerlCyc or JavaCyc program
• Create a PerlCyc or JavaCyc object:
perlcyc->new("ORGID")
new Javacyc("ORGID")
• Call PerlCyc or JavaCyc functions on this object:
my $cyc = perlcyc->new("ECOLI");
my @pathways = $cyc->all_pathways();

Javacyc cyc = new Javacyc("ECOLI");
ArrayList pathways = cyc.allPathways();
• Functions return object IDs, not objects.
  - Must connect to the server again to retrieve attributes of an object:
foreach my $p (@pathways) {
    print $cyc->get_slot_value($p, "COMMON-NAME"), "\n";
}

for (int i = 0; i < pathways.size(); i++) {
    String pwy = (String) pathways.get(i);
    System.out.println(cyc.getSlotValue(pwy, "COMMON-NAME"));
}
Sample PerlCyc Query
• Number of proteins in E. coli:
use perlcyc;
my $cyc = perlcyc->new("ECOLI");
my @proteins = $cyc->get_class_all_instances("|Proteins|");
my $protein_count = scalar(@proteins);
print "Protein count: $protein_count.\n";
Sample PerlCyc Query
• Print IDs of all proteins with molecular weight between 10 and 20 kD and pI between 4 and 5:
use perlcyc;
my $cyc = perlcyc->new("ECOLI");
foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
    my $mw = $cyc->get_slot_value($p, "molecular-weight-kd");
    my $pI = $cyc->get_slot_value($p, "pi");
    next unless $mw && $pI;    # skip proteins missing either value
    if ($mw <= 20 && $mw >= 10 && $pI <= 5 && $pI >= 4) {
        print "$p\n";
    }
}
Sample PerlCyc Query
• List all the transcription factors in E. coli, and the list of genes that each regulates:
use perlcyc;
my $cyc = perlcyc->new("ECOLI");
foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
    if ($cyc->transcription_factor_p($p)) {
        my $name = $cyc->get_slot_value($p, "common-name");
        my %genes = ();
        foreach my $tu ($cyc->regulon_of_protein($p)) {
            foreach my $g ($cyc->transcription_unit_genes($tu)) {
                $genes{$g} = $cyc->get_slot_value($g, "common-name");
            }
        }
        print "\n\n$name: ";
        print join " ", values %genes;
    }
}
Sample Editing Using PerlCyc
• Add a link from each gene to the corresponding object in MY-DB (assume the ID is the same in both cases):
use perlcyc;
my $cyc = perlcyc->new("HPY");
my @genes = $cyc->get_class_all_instances("|Genes|");
foreach my $g (@genes) {
    $cyc->add_slot_value($g, "DBLINKS", "(MY-DB \"$g\")");
}
$cyc->save_kb();
Sample JavaCyc Query: Enzymes for which ATP is a regulator
import java.util.*;

public class JavacycSample {
    public static void main(String[] args) {
        Javacyc cyc = new Javacyc("ECOLI");
        ArrayList regframes =
            cyc.getClassAllInstances("|Regulation-of-Enzyme-Activity|");
        for (int i = 0; i < regframes.size(); i++) {
            String reg = (String) regframes.get(i);
            boolean bool = cyc.memberSlotValueP(reg, "Regulator", "ATP");
            if (bool) {
                String enzrxn = cyc.getSlotValue(reg, "Regulated-Entity");
                String enzyme = cyc.getSlotValue(enzrxn, "Enzyme");
                System.out.println(enzyme);
            }
        }
    }
}
Simple Lisp Query Example: Enzymes for which ATP is a regulator
(defun atp-inhibits ()
  (loop for x in (get-class-all-instances '|Regulation-of-Enzyme-Activity|)
        ;; Does the Regulator slot contain the compound ATP, and is the mode
        ;; of regulation negative (inhibition)?
        when (and (member-slot-value-p x 'Regulator 'ATP)
                  (member-slot-value-p x 'Mode "-"))
          ;; Whenever the test is positive, we collect the value of the slot Enzyme
          ;; of the Regulated-Entity of the regulatory interaction frame.
          ;; The collected values are returned as a list, once the loop terminates.
          collect (get-slot-value (get-slot-value x 'Regulated-Entity) 'Enzyme)))

;;; Invoking the query:
(select-organism :org-id 'ECOLI)
(atp-inhibits)
Simple Perl Query Example: Enzymes for which ATP is a regulator
use perlcyc;
my $cyc = perlcyc->new("ECOLI");
my @regs = $cyc->get_class_all_instances("|Regulation-of-Enzyme-Activity|");
## We check every instance of the class
foreach my $reg (@regs) {
    ## We test whether the Regulator slot contains the compound frame ATP
    ## and whether the Mode of regulation is "-" (inhibition)
    my $bool1 = $cyc->member_slot_value_p($reg, "Regulator", "ATP");
    my $bool2 = $cyc->member_slot_value_p($reg, "Mode", "-");
    if ($bool1 && $bool2) {
        ## Whenever the test is positive, we collect the value of the slot ENZYME
        ## of the Regulated-Entity. The results are printed in the terminal.
        my $enzrxn = $cyc->get_slot_value($reg, "Regulated-Entity");
        my $enz = $cyc->get_slot_value($enzrxn, "Enzyme");
        print STDOUT "$enz\n";
    }
}
Getting started with Lisp
• pathway-tools -lisp
• (load "file") (compile-file "file.lisp")
• Emacs is a useful editor
• Pathway Tools source code is available: ask
• Lisp resources: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Query Gotchas
• Study the schema carefully
• Compare frames with :test #'fequal
• Cascade of slot-values: check for NIL
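A NIL check matters whenever one slot value is used to fetch another, as in (get-slot-value (get-slot-value x 'Regulated-Entity) 'Enzyme) in the ATP query. The Java sketch below shows the same cascade with a check at every step; the map-backed `SlotCascade` class and the frame IDs used with it are hypothetical stand-ins for a PGDB, and `Optional.empty()` plays the role of NIL.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical map-backed KB: frame ID -> single-valued slot -> value. */
final class SlotCascade {
    private final Map<String, Map<String, String>> kb;

    SlotCascade(Map<String, Map<String, String>> kb) {
        this.kb = kb;
    }

    /** Like get-slot-value, but an absent frame or slot yields Optional.empty() (NIL). */
    Optional<String> getSlotValue(String frame, String slot) {
        return Optional.ofNullable(kb.get(frame)).map(slots -> slots.get(slot));
    }

    /** Follow a chain of slots, stopping at the first NIL instead of querying it. */
    Optional<String> follow(String frame, String... slotChain) {
        Optional<String> current = Optional.of(frame);
        for (String slot : slotChain) {
            current = current.flatMap(f -> getSlotValue(f, slot));
        }
        return current;
    }
}
```

Calling `follow("REG-1", "Regulated-Entity", "Enzyme")` returns the enzyme when both links exist and an empty `Optional` when either is missing, so the second lookup is never attempted on a NIL frame.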
Semantic Inference Layer: relationships.lisp
• Library of functions that encapsulate common query building blocks and intricacies of navigating the schema
  - enzymes-of-gene
  - reactions-of-gene
  - pathways-of-gene
  - genes-of-pathway
  - pathway-hole-p
  - reactions-of-compound
  - top-containers (protein)
  - all-rxns (type): :metab-smm, :metab-all, :metab-pathways, :enzyme, :transport, etc.
    Example: (all-rxns :metab-pathways)
Pathway Tools Schema and Semantic Inference Layer: Genes, Operons, and Replicons
Representing a Genome
[Figure: ORG, CHROM1, CHROM2, PLASMID1, Gene1, Gene2, Gene3 and Product1, connected by the genome, components and product slots]
Classes:
• ORG is of class Organisms
• CHROM1 is of class Chromosomes
• PLASMID1 is of class Plasmids
• Gene1 is of class Genes
• Product1 is of class Polypeptides or RNAs
(defun genes-of-chrom (chrom)
  ;; Collect the components of CHROM that are instances of Genes
  (loop for x in (get-slot-values chrom 'components)
        when (instance-all-instance-of-p x '|Genes|)
          collect x))
Polynucleotides
Review slots of COLI and of COLI-K12
Genetic-Elements
• Sequence is stored in a separate file or database table
Polymer-Segments
Review slots of Genes
Complexities of Gene / Gene-Product Relationships
• The Product of a gene can be an instance of Polypeptides or RNAs
• An instance of Polypeptides can have more than one gene encoding it
• Sequence position:
  - Nucleotide positions of starting and ending codons are specified in Left-End-Position and Right-End-Position (Right-End-Position is usually the greater, except for genes spanning the origin)
  - Transcription-Direction: + / -
• Alternative splicing:
  - Nucleotide positions of starting and ending codons are specified in Left-End-Position and Right-End-Position
  - Intron positions are specified in Splice-Form-Introns of the gene product, e.g. (200 300) (350 400)
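For the alternative-splicing case, the exons of a splice form are the parts of the gene span not covered by its introns. The Java sketch below is a hypothetical helper, not a Pathway Tools function: it assumes inclusive coordinates, introns that are sorted, non-overlapping and inside the gene span, and it ignores genes spanning the origin.

```java
import java.util.ArrayList;
import java.util.List;

final class SpliceForms {
    /** An inclusive nucleotide interval [left, right]. */
    record Interval(int left, int right) {}

    /**
     * Exons are the parts of the gene span [left, right] not covered by introns.
     * Assumes inclusive coordinates and sorted, non-overlapping introns inside the span.
     */
    static List<Interval> exons(int left, int right, List<Interval> introns) {
        List<Interval> result = new ArrayList<>();
        int start = left;                      // first base of the next exon
        for (Interval intron : introns) {
            if (intron.left() > start) {
                result.add(new Interval(start, intron.left() - 1));
            }
            start = intron.right() + 1;        // resume after the intron
        }
        if (start <= right) {
            result.add(new Interval(start, right));
        }
        return result;
    }
}
```

With the introns (200 300) and (350 400) and a hypothetical gene span of 100 to 500, `exons` returns [100, 199], [301, 349] and [401, 500].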
Gene Reaction Schematic
Substring Search Example
• Find all genes that contain a given substring within their common name or synonym list.
(defun find-gene-by-substring (substring)
  (let (result)
    (loop for g in (get-class-all-instances '|Genes|)
          do (loop for name in (get-slot-values g 'names)
                   when (search substring name :test #'string-equal)
                     do (pushnew g result)))
    result))
Proteins
Proteins and Protein Complexes
• Polypeptide: the monomer protein product of a gene (may have multiple isoforms, as indicated at the gene level)
• Protein complex: proteins consisting of multiple polypeptides or protein complexes
• Example: DNA pol III
  - DnaE is a polypeptide
  - pol III core is DnaE and two other polypeptides
  - pol III holoenzyme is several protein complexes combined
Protein Complex Relationships
Slots of a protein (DnaE)
• catalyzes
• Is it a regulator/reactant/etc?
• comment
• component-of
• dblinks
• features (edited in feature editor)
• Many other attributes possible
A complex at the frame level (pol III)
• Most of the same attributes as a polypeptide frame
• component-of and components
  - note coefficients
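Because components may themselves be complexes, coefficients multiply down the hierarchy, as in pol III holoenzyme built from complexes built from polypeptides. The Java sketch below computes how many copies of each polypeptide a complex contains. It is hypothetical: it assumes each complex stores a positive integer coefficient per direct component, that the component graph has no cycles, and it uses made-up frame IDs rather than real pol III stoichiometry.

```java
import java.util.HashMap;
import java.util.Map;

final class ComplexComposition {
    /**
     * components maps a complex ID to its direct components and their coefficients.
     * Any ID that is not a key is treated as a polypeptide (a leaf).
     */
    static Map<String, Integer> polypeptideCounts(String frame,
                                                  Map<String, Map<String, Integer>> components) {
        Map<String, Integer> counts = new HashMap<>();
        accumulate(frame, 1, components, counts);
        return counts;
    }

    private static void accumulate(String frame, int multiplier,
                                   Map<String, Map<String, Integer>> components,
                                   Map<String, Integer> counts) {
        Map<String, Integer> parts = components.get(frame);
        if (parts == null) {
            // Leaf polypeptide: contributes `multiplier` copies.
            counts.merge(frame, multiplier, Integer::sum);
            return;
        }
        for (Map.Entry<String, Integer> part : parts.entrySet()) {
            // A component's coefficient multiplies everything beneath it.
            accumulate(part.getKey(), multiplier * part.getValue(), components, counts);
        }
    }
}
```

If a hypothetical CPLX-HOLO contains two copies of CPLX-CORE (one each of POLY-A, POLY-B, POLY-C) plus four of POLY-D, the result is 2, 2, 2 and 4 copies respectively.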
Protein Complex Relationships
Relationships are Defined in Many Places
• component-of comes from creating a complex
• appears-in-left-side-of comes from defining a reaction (as do modified forms)
• regulates comes from an enzymatic reaction or TU
• dna-footprint can only be edited if the protein has been associated with a TU
Semantic Inference Layer
• Reactions-of-protein (prot)
  - Returns a list of rxns this protein catalyzes
• Transcription-units-of-proteins (prot)
  - Returns a list of TUs activated/inhibited by the given protein
• Transporter? (prot)
  - Is this protein a transporter?
• Polypeptide-or-homomultimer? (prot)
• Transcription-factor? (prot)
• Obtain-protein-stats
  - Returns 5 values
  - Length of: all-polypeptides, complexes, transporters, enzymes, etc…
Example
• Find all enzymes that use pyridoxal phosphate as a cofactor or prosthetic group
(loop for protein in (get-class-all-instances '|Proteins|)
      ;; A protein may catalyze several enzymatic reactions (slot catalyzes)
      when (loop for enzrxn in (get-slot-values protein 'catalyzes)
                 thereis (or (member-slot-value-p enzrxn 'cofactors 'pyridoxal_phosphate)
                             (member-slot-value-p enzrxn 'prosthetic-groups 'pyridoxal_phosphate)))
        collect protein)
(member-slot-value-p frame slot value): T if Value is one of the values of Slot of Frame.
Sample
• Find all proteins without a comment anywhere
RNAs
RNAs
• PGDBs only represent RNAs that are "terminal gene products"
  - tRNAs
  - rRNAs
  - Regulatory RNAs
  - Miscellaneous small RNAs
• Slots similar to proteins
  - tRNAs can have an anticodon
The RNA Ontology
Compounds / Reactions / Pathways
Compounds / Reactions / Pathways
• Think of a three-tiered structure:
  - Reactions built on top of compounds
  - Pathways built on top of reactions
• Metabolic network defined by reactions alone; pathways are an additional "optional" structure
  - Some reactions not part of a pathway
  - Some reactions have no attached enzyme
  - Some enzymes have no attached gene
Compounds
• Relatively few aspects of a compound are defined within the compound editor
  - MW, formula calculated from edited structure
• Most aspects defined in other editors
  - "Pathway reactions" comes from reaction editing followed by pathway editing
  - Activator, etc. come from the protein editor
(print-frame 'TRP)
-- Instance TRP --Types: |Amino-Acid|, |Aromatic-Amino-Acids|, |Non-polar-amino-acids|
APPEARS-IN-LEFT-SIDE-OF: RXN0-287, TRANS-RXN-76, TRYPTOPHAN-RXN,
TRYPTOPHAN--TRNA-LIGASE-RXN
APPEARS-IN-RIGHT-SIDE-OF: RXN0-2382, RXN0-301, TRANS-RXN-76, TRYPSYN-RXN
CHEMICAL-FORMULA: (C 11), (H 12), (N 2), (O 2)
COMMON-NAME: "L-tryptophan"
DBLINKS: (LIGAND-CPD "C00078" NIL |kaipa| 3311532640 NIL NIL),
(CAS "6912-86-3"), (CAS "73-22-3")
NAMES: "L-tryptophan", "W", "tryptacin", "trofan", "trp", "tryptophan",
"2-amino-3-indolylpropanic acid"
SMILES: "c1(c(CC(N)C(=O)O)c2(c([nH]1)cccc2))"
SYNONYMS: "W", "tryptacin", "trofan", "trp", "tryptophan",
"2-amino-3-indolylpropanic acid"
____________________________________________
Semantic Inference Layer
• Reactions-of-compound (cpd)
• Pathways-of-compound (cpd)
• Is-substrate-an-autocatalytic-enzyme-p (cpd)
• Activated/inhibited-by? (cpds slots)
  - Returns a list of enzrxns for which a cpd in cpds is a modulator (example slots: activators-all, activators-allosteric)
• All-substrates (rxns)
  - All unique substrates specified in the given rxns
• Has-structure-p (cpd)
• Obtain-cpd-stats
  - Returns two values: length of all-cpds, cpds with structures
Queries with Multiple Answers
• Navigator queries:
  - Example: substring search for "pyruvate"
  - Selected list is placed on the Answer list
  - Use the "Next Answer" button to view each one of them
• Lisp queries:
  - Example: find reactions involving pyruvate as a substrate
(get-class-all-instances '|Compounds|)
(loop for rxn in (get-class-all-instances '|Reactions|)
      when (member 'pyruvate (get-slot-values rxn 'substrates) :test #'fequal)
        collect rxn)
(replace-answer-list *)
Reactions
Reactions
• Represents information about a reaction that is independent of the enzymes that catalyze the reaction
• Connected to enzyme(s) via enzymatic reaction frames
• Classified with EC system when possible
• Example: 2.7.7.7, DNA-directed DNA polymerization
  - Carried out by five enzymes in E. coli
Reaction Ontology
Where is 2.7.7.7 in the Ontology?
Slots of Reaction Frames
• Balance-state
• EC-number
• Enzymatic-reaction
  - Generated in protein or reaction editor
• In-pathway
  - Generated in pathway editor
• Left and Right (reactants / products)
  - Can include modified forms of proteins, RNAs, etc. here
  - Not all reactants/products need to be frames
Enzymatic Reactions (DnaE and 2.7.7.7)
• A necessary bridge between enzymes and "generic" versions of reactions
• Carries information specific to an enzyme/reaction combination:
  - Cofactors and prosthetic groups
  - Alternative substrates
  - Links to regulatory interactions
• Frame is generated when a protein is associated with a reaction (via protein or reaction editor)
Regulation of Enzyme Activity
Semantic Inference Layer
• Genes-of-reaction (rxn)
• Substrates-of-reaction (rxn)
• Enzymes-of-reaction (rxn)
• Lacking-ec-number (organism)
  - Returns list of rxns with no EC numbers in that database
• Get-reaction-direction-in-pathway (pwy rxn)
• Reaction-type (rxn)
  - Indicates the type of rxn as: small-molecule rxn, transport rxn, protein-small-molecule rxn (one substrate is a protein and one is a small molecule), protein rxn (all substrates are proteins), etc.
• All-rxns (type)
  - Specify the type of reaction (see above for type)
• Obtain-rxn-stats
  - Returns six values
  - Length of: all-rxns, transport, non-transport, etc…
Find all small-molecule reactions that have no enzyme but are not spontaneous ("orphan" reactions):
(defun orphan-reactions (&optional (verbose? t))
  (declare (ignorable verbose?))   ; parameter currently unused
  (loop for r in (all-rxns :small-molecule)
        when (and (not (slot-has-value-p r 'enzymatic-reaction))
                  (not (get-slot-value r 'spontaneous?)))
          collect r))
Reaction Direction
• Left/Right reflect the direction of the reaction as written by the Enzyme Commission
  - Reflects systematic direction for different reaction classes
• Left/Right do not necessarily correspond to the physiological direction of a reaction
• Get-rxn-direction (rxn)
  - Returns :L2R or :R2L or :BOTH or NIL
  - Integrates all available info about the direction of this reaction:
    Direction(s) it occurs in all pathways in the PGDB
    Direction(s) as specified in Enzymatic-Reactions
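Get-rxn-direction is described only as integrating several sources of direction information. The Java sketch below shows one plausible way to merge such evidence into the four possible results. The merge rule (agreeing evidence keeps its direction, conflicting or reversible evidence gives BOTH, no evidence gives NIL) is an assumption for illustration, not the documented Pathway Tools algorithm, and the class name is hypothetical.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.Optional;

final class ReactionDirection {
    enum Direction { L2R, R2L, BOTH }

    /**
     * Assumed merge rule: no evidence -> Optional.empty() (NIL); a single consistent
     * direction -> that direction; conflicting directions or any BOTH -> BOTH.
     */
    static Optional<Direction> combine(Collection<Direction> observed) {
        if (observed.isEmpty()) {
            return Optional.empty();
        }
        EnumSet<Direction> seen = EnumSet.copyOf(observed);
        if (seen.contains(Direction.BOTH) || seen.size() > 1) {
            return Optional.of(Direction.BOTH);
        }
        return Optional.of(seen.iterator().next());
    }
}
```

Here `observed` would hold one entry per pathway occurrence and per Enzymatic-Reaction that states a direction.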
Pathways
What is a Pathway?
• An ordered set of interconnected, directed biochemical reactions
• Reactions form a coherent unit, e.g.
  - Regulated as a single unit
  - Evolutionarily conserved across organisms as a single unit
  - When combined, perform a single cellular function
  - Historically grouped together as a unit
• Includes metabolic pathways and signalling pathways
• Evidence for all reactions in a single organism
• Pathways can be linear, cyclical, branched, or some combination
Internal Representation of Pathways
• REACTION-LIST: unordered list of reactions that comprise the pathway
• PREDECESSORS: list of reaction pairs that define ordering relationships between reactions, e.g.
  [Figure: compounds A, B, C, D connected by reactions R1, R2, R3]
  (R2 R1): predecessor of R2 is R1
  (R3 R1): predecessor of R3 is R1
  (R1): R1 has no predecessor (can be omitted)
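Because REACTION-LIST is unordered, software has to derive an order from PREDECESSORS. The Java sketch below does this with Kahn's topological sort. It is an illustration under assumptions, not Pathway Tools' own layout code: pairs are written {later, earlier} like (R2 R1), every reaction named in a pair also appears in REACTION-LIST, and reaction IDs are unique. A cyclical pathway has no topological order, so the sketch rejects it rather than guessing a starting point.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PathwayOrder {
    static List<String> order(List<String> reactionList, List<String[]> predecessors) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        Map<String, List<String>> successors = new HashMap<>();
        for (String rxn : reactionList) {
            inDegree.put(rxn, 0);
            successors.put(rxn, new ArrayList<>());
        }
        for (String[] pair : predecessors) {
            if (pair.length < 2) {
                continue; // (R1) alone: no predecessor to record
            }
            String later = pair[0], earlier = pair[1];
            successors.get(earlier).add(later);
            inDegree.merge(later, 1, Integer::sum);
        }
        // Start from reactions with no predecessor, then release successors one by one.
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((rxn, d) -> { if (d == 0) ready.add(rxn); });
        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            String rxn = ready.poll();
            ordered.add(rxn);
            for (String next : successors.get(rxn)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (ordered.size() != reactionList.size()) {
            throw new IllegalArgumentException("PREDECESSORS contain a cycle");
        }
        return ordered;
    }
}
```

For REACTION-LIST (R3 R2 R1) with predecessors (R2 R1), (R3 R1) and (R1), the sketch returns R1, R2, R3.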
What is Missing from the Pathway Representation?
• Reaction directions
  - Some reactions are unidirectional, but many are reversible: how do we know in which direction to draw the reaction?
• Main vs. side substrates
  [Figure: example pathway with compounds A, B, C, D, E, F]
  - Main compounds form the backbone of the pathway: substrates shared between connecting reactions, and major inputs and outputs
  - Side compounds are omitted from pathway diagrams at low detail levels
  - Individual reactions do not necessarily have main and side compounds: a particular substrate may be either a main or a side depending on the pathway context.
Computing Directionality and Mains/Sides
Our philosophy: enable the curator to specify as little as possible, and compute as much as possible. This reduces redundancy and the potential for inconsistencies.
Example:
  Reactions: R1: A + B = C + D
             R2: B = E
  Predecessors: (R2 R1)
• The only substrate overlap is B, therefore:
  - B must be a main substrate
  - A must be a side substrate
  - R1 must proceed from right to left
  - R2 must proceed from left to right
[Figure: C + D → B → E, with A drawn as a side substrate of R1]
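The worked example follows a simple rule: in a predecessor pair (later earlier), the earlier reaction must produce the shared compound and the later reaction must consume it. The Java sketch below encodes just that rule under stated assumptions (exactly one shared compound, appearing on only one side of each reaction); it illustrates the reasoning and is not the heuristics Pathway Tools actually uses. The record and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class DirectionInference {
    /** A reaction as written: its Left and Right compound IDs. */
    record Reaction(String id, Set<String> left, Set<String> right) {}

    enum Direction { L2R, R2L }

    /** The shared main compound and the inferred direction of each reaction. */
    record Inference(String mainCompound, Direction earlier, Direction later) {}

    /**
     * For a predecessor pair (later earlier): the earlier reaction must produce the
     * shared compound and the later one must consume it. Any other overlap is treated
     * as ambiguous (the case the PRIMARIES slot exists for).
     */
    static Inference infer(Reaction later, Reaction earlier) {
        Set<String> shared = new HashSet<>(earlier.left());
        shared.addAll(earlier.right());
        Set<String> laterAll = new HashSet<>(later.left());
        laterAll.addAll(later.right());
        shared.retainAll(laterAll);
        if (shared.size() != 1) {
            throw new IllegalArgumentException("ambiguous overlap: " + shared);
        }
        String main = shared.iterator().next();
        if ((earlier.left().contains(main) && earlier.right().contains(main))
                || (later.left().contains(main) && later.right().contains(main))) {
            throw new IllegalArgumentException("ambiguous: " + main + " on both sides");
        }
        // Earlier reaction produces the main compound: if it is on the Left, run R2L.
        Direction earlierDir = earlier.left().contains(main) ? Direction.R2L : Direction.L2R;
        // Later reaction consumes the main compound: if it is on the Left, run L2R.
        Direction laterDir = later.left().contains(main) ? Direction.L2R : Direction.R2L;
        return new Inference(main, earlierDir, laterDir);
    }
}
```

On R1: A + B = C + D and R2: B = E, `infer(R2, R1)` gives main compound B, R1 right to left and R2 left to right; A, the other compound on R1's product side, is then a side substrate. An R2 of C + B = E, which overlaps both sides of R1, is rejected as ambiguous.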
But…
• Unfortunately, mains, sides and reaction directions are sometimes ambiguous:
  - At beginnings and ends of pathways
    Use heuristics to determine main/side substrates at beginnings and ends of pathways
    Not always what the curator wants
  - Substrate overlap with both sides of a reaction, e.g. R1: A + B = C + D followed by R2: C + B = E
• Solution: an additional slot, PRIMARIES, which should only be populated when necessary:
  PRIMARIES: (R (A B) (C)) says that for reaction R, A and B are both main reactants, and C is a main product.
More Complications…
• ENZYME-USE: a reaction may be catalyzed by multiple enzymes, but not all the enzymes necessarily participate in a given pathway:
  - Not present in the same compartment as the rest of the pathway enzymes
  - Down-regulated or not expressed under conditions in which the pathway is active
  - The ENZYME-USE slot tells us which enzymes catalyze the reaction in the pathway, if not all.
• LAYOUT-ADVICE: helps software draw the pathway correctly, e.g. in a cyclical pathway, tells which substrate should be at the top.
• HYPOTHETICAL-REACTIONS: list of reactions in the pathway that are considered hypothetical (i.e. no direct experimental evidence)
Polymerization Pathways
[Figure: polymerization pathway with substrates … X[n], X[n+1], and instance X[10]]
• POLYMERIZATION-LINKS: specifies reactions that should be connected by a polymerization link
  (X R1 R1) --- REACTANT-NAME-SLOT: N-NAME
            --- PRODUCT-NAME-SLOT: N+1-NAME
• CLASS-INSTANCE-LINKS: specifies when a link should be drawn between a substrate class and some instance of it (necessary only if the instance is not a member of some reaction, so no predecessor relationship can be defined)
  R1 --- PRODUCT-INSTANCES: X[10]
Super-Pathways
• Collection of pathways that connect to each other via common substrates or reactions, or as part of some larger logical unit
  - Can contain both sub-pathways and additional connecting reactions
  - Can be nested arbitrarily
• REACTION-LIST: a pathway ID instead of a reaction ID in this slot means include all reactions from the specified pathway
• PREDECESSORS: a pathway ID instead of a tuple in this slot means include all predecessor tuples from the specified pathway
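Since super-pathways can be nested arbitrarily, collecting their reactions means expanding pathway IDs in REACTION-LIST recursively. The Java sketch below is a hypothetical in-memory version of that expansion (not a Pathway Tools function): it assumes a map from each pathway ID to its REACTION-LIST and treats any entry that is itself a key as a sub-pathway.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SuperPathways {
    /** Collect every reaction of a (super-)pathway, expanding nested pathway IDs. */
    static Set<String> allReactions(String pathway, Map<String, List<String>> reactionLists) {
        Set<String> reactions = new LinkedHashSet<>();
        expand(pathway, reactionLists, reactions, new HashSet<>());
        return reactions;
    }

    private static void expand(String pathway, Map<String, List<String>> reactionLists,
                               Set<String> reactions, Set<String> visited) {
        if (!visited.add(pathway)) {
            return; // already expanded (shared sub-pathway, or an accidental cycle)
        }
        for (String entry : reactionLists.getOrDefault(pathway, List.of())) {
            if (reactionLists.containsKey(entry)) {
                expand(entry, reactionLists, reactions, visited); // nested pathway
            } else {
                reactions.add(entry); // plain reaction ID; the set drops duplicates
            }
        }
    }
}
```

Reactions shared by two sub-pathways appear once, and insertion order follows the REACTION-LIST entries.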
Querying Pathways Programmatically
See http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
• (all-pathways)
• (base-pathways)
  - Returns list of all pathways that are not super-pathways
• (genes-of-pathway pwy)
• (unique-genes-of-pathway pwy)
  - Returns list of all genes of a pathway that are not also part of other pathways
• (enzymes-of-pathway pwy)
• (substrates-of-pathway pwy)
• (variants-of-pathway pwy)
  - Returns all pathways in the same variant class as a pathway
• (get-predecessors rxn pwy), (get-successors rxn pwy)
• (get-rxn-direction-in-pathway pwy rxn)
• (pathway-inputs pwy), (pathway-outputs pwy)
  - Returns all compounds consumed (produced) but not produced (consumed) by the pathway (ignores stoichiometry)
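The definition of pathway-inputs and pathway-outputs is a set difference. The Java sketch below restates that definition; it is not the Pathway Tools implementation, the names are hypothetical, and it assumes each reaction has already been oriented in its pathway direction (for example with get-rxn-direction-in-pathway), so each step knows what it consumes and produces.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PathwayIO {
    /** A reaction oriented in the direction it runs within the pathway. */
    record Step(Set<String> consumed, Set<String> produced) {}

    /** Compounds consumed by some step but produced by none (ignores stoichiometry). */
    static Set<String> inputs(List<Step> steps) {
        return difference(steps, true);
    }

    /** Compounds produced by some step but consumed by none (ignores stoichiometry). */
    static Set<String> outputs(List<Step> steps) {
        return difference(steps, false);
    }

    private static Set<String> difference(List<Step> steps, boolean wantInputs) {
        Set<String> consumed = new LinkedHashSet<>();
        Set<String> produced = new LinkedHashSet<>();
        for (Step s : steps) {
            consumed.addAll(s.consumed());
            produced.addAll(s.produced());
        }
        Set<String> result = new LinkedHashSet<>(wantInputs ? consumed : produced);
        result.removeAll(wantInputs ? produced : consumed);
        return result;
    }
}
```

Using the oriented reactions from the directionality example (R1 consuming C and D and producing A and B, then R2 consuming B and producing E), the inputs are C and D and the outputs are A and E.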
Example Queries
• Find all genes involved in metabolic pathways:
(remove-duplicates
  (loop for p in (all-pathways)
        append (genes-of-pathway p))
  :test #'fequal)
• Find all compounds that are unique to a single pathway:
(loop for p in (base-pathways)
      append (loop for c in (substrates-of-pathway p)
                   ;; c is unique to p if no other pathway contains it
                   when (null (remove p (pathways-of-compound c) :test #'fequal))
                     collect (list c p)))
Regulation
Regulation
• Reorganization and expansion of regulation under way in Pathway Tools
  - Initial application to EcoCyc
• Class Regulation with subclasses that describe different biochemical mechanisms of regulation
• Slots:
  - Regulator
  - Regulated-Entity
  - Mode
  - Mechanism
Regulation of Enzyme Activity
• Class Regulation-of-Enzyme-Activity
  - Each instance of the class describes one regulatory interaction
• Slots:
  - Regulator -- usually a small molecule
  - Regulated-Entity -- an Enzymatic-Reaction
  - Mechanism -- one of: Competitive, Uncompetitive, Noncompetitive, Irreversible, Allosteric, Other
  - Mode -- one of: +, -
  - Physiologically-relevant? -- true/false
Transcription Initiation
• Class Regulation-of-Transcription-Initiation
  - Transcription factor binds to DNA binding site to regulate transcription initiation from a promoter
• Slots:
  - Regulator -- instance of Proteins or Complexes (a transcription factor)
  - Regulated-Entity -- instance of Promoters
  - Mode -- one of: +, -
  - Associated-binding-site -- a DNA-Binding-Site
Attenuation
• Class Transcriptional-Attenuation
  - Several subclasses depending on type of attenuation
• Slots common to all:
  - Regulator -- depends on subtype of attenuation
  - Regulated-Entity -- instance of Terminators
  - Mode -- one of: +, -
Attenuation Subtypes
• Ribosome-Mediated-Attenuation
  - E.g. trp operon: ribosome pauses based on levels of charged tRNA, determines formation of terminator or antiterminator
• RNA-Mediated-Attenuation
  - RNA (tRNA or sRNA) binds to transcript, determines formation of terminator or antiterminator
• Protein-Mediated-Attenuation
  - Protein binds to transcript, determines formation of terminator or antiterminator
• Small-Molecule-Mediated-Attenuation
  - Small molecule binds to transcript, determines formation of terminator or antiterminator
• Rho-Blocking-Antitermination
• RNA-Polymerase-Modification
  - Regulatory protein binds to site in transcription unit and interacts with RNA polymerase to determine termination
Transcriptional Regulation
[Figure: transcriptional regulation of the trp operon; labeled frames include trp, apoTrpR, TrpR*trp, rxn001, reg001, reg002, site001, pro001, term001, charged-tRNA*trp, and transcription unit trpLEDCBA with genes trpL, trpE, trpD, trpC, trpB, trpA]
Data Exchange
Data Exchange
• Java API and Perl API: read & modify
• BioPAX export: since Pathway Tools 9.0
  - biopax.org
• Export of entire PGDB as flatfiles
• Export of reactions as SBML: sbml.org
• Import/export of pathways between PGDBs
• Import/export of selected frames, for spreadsheets
• Import/export of compounds as Molfile, CML
• Registering/publishing PGDBs on WWW
• Export PGDB as GenBank
• BioWarehouse: loader for flatfiles, SQL access
  - http://bioinformatics.ai.sri.com/biowarehouse/
Dump PGDB into Flatfiles
• Export of entire PGDB as flatfiles
• Format description: UG v.I section 4.5
  - Column delimited: 1 line per frame
  - Attribute-value: 1 record per frame
• Multiple slot values:
  - Column delimited: several values per column
  - Attribute-value: several lines for several values
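The difference between the two layouts is easiest to see for a multivalued slot. The Java sketch below renders one frame in an attribute-value style, writing one line per value. The `ATTRIBUTE - value` line format and the `//` record terminator are assumptions about the layout (UG v.I section 4.5 is the authoritative description), and `AttributeValueWriter` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

final class AttributeValueWriter {
    /**
     * Render one frame as an attribute-value record: one "ATTRIBUTE - value" line per
     * slot value, so a multivalued slot yields several lines, and "//" ends the record.
     * Slots are written in the map's iteration order (use a LinkedHashMap for stability).
     */
    static String record(String frameId, Map<String, List<String>> slots) {
        StringBuilder sb = new StringBuilder();
        sb.append("UNIQUE-ID - ").append(frameId).append('\n');
        for (Map.Entry<String, List<String>> slot : slots.entrySet()) {
            for (String value : slot.getValue()) {
                sb.append(slot.getKey()).append(" - ").append(value).append('\n');
            }
        }
        sb.append("//\n");
        return sb.toString();
    }
}
```

For TRP with COMMON-NAME "L-tryptophan" and SYNONYMS "W" and "trp", the record has two SYNONYMS lines, matching "several lines for several values".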
Frame Import/Export
• Import/export of selected frames, for spreadsheets
• Frame selection, slot selection GUI
• Format description: UG v.I section 4.6.3
  - Column delimited: 1 line per frame
  - Attribute-value: 1 record per frame
• Multiple slot values:
  - Column delimited: several values per column
  - Attribute-value: several lines for several values
Download