schema-api-part2 - Bioinformatics Research Group at SRI

Reactions

1 SRI International Bioinformatics

Reactions

• Represents information about a reaction that is independent of the enzymes that catalyze the reaction
• Connected to enzyme(s) via enzymatic reaction frames
• Classified with the EC system when possible
• Example: 2.7.7.7 – DNA-directed DNA polymerization
  - Carried out by five enzymes in E. coli


Enzymatic Reactions (DnaE and 2.7.7.7)

• A necessary bridge between enzymes and “generic” versions of reactions
• Carries information specific to an enzyme/reaction combination:
  - Cofactors and prosthetic groups
  - Alternative substrates
  - Links to regulatory interactions
  - Kinetic data (Km, Vmax, etc.)
• Frame is generated when a protein is associated with a reaction (via the protein or reaction editor)


Reaction Ontology


Where is 2.7.7.7 in the Ontology?


Reaction Direction

• Left/Right reflect the direction of the reaction as written by the Enzyme Commission
  - Reflects the systematic direction for different reaction classes
• Left/Right do not necessarily correspond to the physiological direction of a reaction
• Get-rxn-direction(rxn)
  - Returns :L2R, :R2L, :BOTH, or NIL
  - Integrates all available information about the direction of this reaction:
    - Direction explicitly curated for the reaction
    - Direction(s) in which it occurs in all pathways in the PGDB
    - Direction(s) as specified in Enzymatic-Reactions
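
The slide does not say how the three sources are combined. As a rough sketch only, assuming each source contributes zero or more of :L2R, :R2L, or :BOTH and that the result is simply their union, the merge might look like the following. The helper name combine-directions is hypothetical and is not part of the Pathway Tools API.

```lisp
;; Hypothetical helper, NOT a Pathway Tools function.
;; EVIDENCE is a list of direction keywords gathered from any source
;; (curated slot, pathways, enzymatic reactions); NIL entries are ignored.
;; Assumption: the merged direction is the union of the evidence.
(defun combine-directions (evidence)
  (let ((l2r (or (member :l2r evidence) (member :both evidence)))
        (r2l (or (member :r2l evidence) (member :both evidence))))
    (cond ((and l2r r2l) :both)
          (l2r :l2r)
          (r2l :r2l)
          (t nil))))
```

With this rule, a reaction seen left-to-right in one pathway and right-to-left in another would be reported as :BOTH, and a reaction with no evidence at all as NIL.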


Slots of Reaction Frames

• Balance-state
• EC-number
• Enzymatic-reaction
  - Generated in protein or reaction editor
• In-pathway
  - Generated in pathway editor
• Left and Right
• Reaction-Direction
  - :left-to-right, :right-to-left, :reversible
• Spontaneous?


Slots of Enzymatic-Reaction Frames

• Enzyme
• Reaction
• Regulated-By
• Cofactors
• Kinetic slots:
  - Vmax, Km, Kcat, Specific-Activity, Temperature-Opt, pH-Opt
• Reaction-Direction
  - The reaction may be reversible, but this enzyme effectively catalyzes it in only one direction.


Reaction relationships


Semantic Inference Layer

• Genes-of-reaction (rxn)
• Substrates-of-reaction (rxn)
• Enzymes-of-reaction (rxn)
• Lacking-ec-number (organism)
  - Returns the list of reactions with no EC numbers in that database
• Get-rxn-direction-in-pathway (pwy rxn)
• Reaction-type (rxn)
  - Indicates the type of reaction: small-molecule reaction, transport reaction, protein-small-molecule reaction (one substrate is a protein and one is a small molecule), protein reaction (all substrates are proteins), etc.
• All-rxns (type)
  - Specify the type of reaction (see Reaction-type above)
• Obtain-rxn-stats
  - Returns six values: the lengths of all-rxns, transport, non-transport, etc.


Exercises

• Find all small-molecule reactions that have no enzyme but are not spontaneous (“orphan” reactions)
• Find all reactions that consume a given compound


Solutions to Exercises

• Find all small-molecule reactions that have no enzyme but are not spontaneous (“orphan” reactions):

(loop for r in (all-rxns :small-molecule)
      ;; keep reactions with no enzymatic-reaction frame that are not spontaneous
      when (and (not (slot-has-value-p r 'enzymatic-reaction))
                (not (get-slot-value r 'spontaneous?)))
        collect r)


Solutions to Exercises, cont.

• Find all reactions that consume a given compound:

(defun rxns-consuming-cpd (cpd)
  (append
   ;; cpd is on the left: consumed when the reaction runs left to right
   (loop for r in (get-slot-values cpd 'appears-in-left-side-of)
         for dir = (get-rxn-direction r)
         if (or (eq dir :l2r) (eq dir :both))
           collect r)
   ;; cpd is on the right: consumed when the reaction runs right to left
   (loop for r in (get-slot-values cpd 'appears-in-right-side-of)
         for dir = (get-rxn-direction r)
         if (or (eq dir :r2l) (eq dir :both))
           collect r)))


RNAs


RNAs

• PGDBs only represent RNAs that are “terminal gene products”
  - tRNAs
  - rRNAs
  - Regulatory RNAs
  - Miscellaneous small RNAs
• Slots similar to proteins
• tRNAs can have an anticodon


The RNA Ontology


Pathway Tools Schema and Semantic Inference Layer: Pathways


What is a Pathway?

• An ordered set of interconnected, directed biochemical reactions
• Reactions form a coherent unit, e.g.
  - Regulated as a single unit
  - Evolutionarily conserved across organisms as a single unit
  - When combined, perform a single cellular function
  - Historically grouped together as a unit
• Includes metabolic pathways and signaling pathways
• Evidence for all reactions in a single organism
• Pathways can be linear, cyclical, branched, or some combination


Internal Representation of Metabolic Pathways

• REACTION-LIST: unordered list of reactions that comprise the pathway
• PREDECESSORS: list of reaction pairs that define ordering relationships between reactions.

E.g.   A --R1--> B --R2--> C
                 B --R3--> D

  (R2 R1) : Predecessor of R2 is R1
  (R3 R1) : Predecessor of R3 is R1
  (R1)    : R1 has no predecessor (can be omitted)
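
To make the pair convention concrete, here is a small self-contained sketch that treats the PREDECESSORS value as a plain Lisp list and derives successors and starting reactions from it. The helper names successors-of and start-reactions are illustrative only and are not Pathway Tools functions.

```lisp
;; Predecessor pairs from the example above: (reaction predecessor).
(defparameter *predecessors* '((r2 r1) (r3 r1) (r1)))

;; Hypothetical helper: reactions whose predecessor is RXN.
(defun successors-of (rxn predecessors)
  (loop for (r pred) in predecessors
        when (eq pred rxn) collect r))

;; Hypothetical helper: reactions that have no predecessor,
;; i.e. the entry points of the pathway.
(defun start-reactions (reaction-list predecessors)
  (loop for r in reaction-list
        unless (loop for (x pred) in predecessors
                     thereis (and (eq x r) pred))
          collect r))
```

Here (successors-of 'r1 *predecessors*) gives R2 and R3, and R1 is the only starting reaction whether or not the (R1) entry is present.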


What is missing from Pathway Representation?

• Reaction directions
  - Some reactions are unidirectional, but many are reversible. How do we know in which direction to draw the reaction?
• Main vs. side substrates

  [Figure: pathway diagram with compounds A, B, C and D, E, F illustrating main and side substrates]

  - Main compounds form the backbone of the pathway: substrates shared between connecting reactions, plus major inputs and outputs
  - Side compounds are omitted from pathway diagrams at low detail levels
  - Individual reactions do not necessarily have main and side compounds; a particular substrate may be either a main or a side depending on the pathway context.


Computing Directionality and Mains/Sides

Our philosophy: enable the curator to specify as little as possible and compute as much as possible. This reduces redundancy and the potential for inconsistencies.

Example:
  Reactions:    R1: A + B <=> C + D
                R2: B <=> E
  Predecessors: (R2 R1)

• The only substrate overlap is B
• B must be a main substrate
• A must be a side substrate
• R1 must proceed from right to left
• R2 must proceed from left to right

Result: C + D → B → E, with A as a side product of R1
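
The reasoning in this example can be sketched in a few lines of plain Lisp. This is an illustration under stated assumptions, not the Pathway Tools implementation: reactions are modeled as (id left right) lists, and the helper names side-of and infer-directions are made up for this sketch.

```lisp
;; Each reaction is (id left-substrates right-substrates), as written.
(defparameter *rxns* '((r1 (a b) (c d)) (r2 (b) (e))))

(defun rxn-left (id rxns) (second (assoc id rxns)))
(defun rxn-right (id rxns) (third (assoc id rxns)))

;; Which side of a reaction holds ALL of SUBSTRATES? NIL if neither does.
(defun side-of (substrates left right)
  (cond ((null substrates) nil)
        ((subsetp substrates left) :left)
        ((subsetp substrates right) :right)
        (t nil)))

;; For a predecessor pair (RXN PRED), return
;; (shared-substrates pred-direction rxn-direction),
;; or NIL when the overlap is empty or spans both sides of a reaction.
(defun infer-directions (rxn pred rxns)
  (let* ((pl (rxn-left pred rxns)) (pr (rxn-right pred rxns))
         (rl (rxn-left rxn rxns))  (rr (rxn-right rxn rxns))
         (shared (intersection (append pl pr) (append rl rr)))
         (pred-side (side-of shared pl pr))
         (rxn-side (side-of shared rl rr)))
    (when (and pred-side rxn-side)
      (list shared
            ;; the predecessor must produce the shared substrate
            (if (eq pred-side :right) :l2r :r2l)
            ;; the successor must consume it
            (if (eq rxn-side :left) :l2r :r2l)))))
```

Calling (infer-directions 'r2 'r1 *rxns*) identifies B as the shared (main) substrate, R1 as right-to-left, and R2 as left-to-right, matching the bullets above. When the overlap touches both sides of a reaction, the sketch returns NIL rather than guessing.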


But…

Unfortunately, mains, sides, and reaction directions are sometimes ambiguous:

• At beginnings and ends of pathways
  - Heuristics are used to determine main/side substrates at the beginnings and ends of pathways
  - The result is not always what the curator wants
• Substrate overlap with both sides of a reaction, e.g.
    A + B <=> C + D
    C + B <=> E
• Solution: an additional slot, PRIMARIES, which should be populated only when necessary:
  - PRIMARIES: (R (A B) (C)) says that for reaction R, A and B are both main reactants, and C is a main product.


More Complications…

• ENZYMES-NOT-USED: a reaction may be catalyzed by multiple enzymes, but not all the enzymes necessarily participate in a given pathway
  - Not present in the same compartment as the rest of the pathway enzymes
  - Down-regulated or not expressed under conditions in which the pathway is active
  - The ENZYMES-NOT-USED slot lists enzymes that are not involved in the pathway even though they catalyze one of its reactions.
• LAYOUT-ADVICE: helps the software draw the pathway correctly, e.g. in a cyclical pathway, tells which substrate should be at the top.
• HYPOTHETICAL-REACTIONS: list of reactions in the pathway that are considered hypothetical (i.e. no direct experimental evidence)


Polymerization Pathways


[Diagram: … → X[n] → X[n+1], with a link to the instance X[10]]

• POLYMERIZATION-LINKS: specifies reactions that should be connected by a polymerization link
    (X R1 R1) --- REACTANT-NAME-SLOT: N-NAME
              --- PRODUCT-NAME-SLOT: N+1-NAME
• CLASS-INSTANCE-LINKS: specifies when a link should be drawn between a substrate class and some instance of it (necessary only if the instance is not a member of some reaction, so no predecessor relationship can be defined)
    R1 --- PRODUCT-INSTANCES: X[10]


Pathway Links

• Can be used as an alternative or in addition to defining super-pathways
• Link must be to or from some main substrate in the pathway
• Other end of link can be a pathway, a reaction, or an arbitrary text string
• Software automatically computes the direction of the link, but the curator can override it


Super-Pathways

• Collection of pathways that connect to each other via common substrates or reactions, or as part of some larger logical unit
• Can contain both sub-pathways and additional connecting reactions
• Can be nested arbitrarily
• REACTION-LIST: a pathway ID instead of a reaction ID in this slot means include all reactions from the specified pathway
• PREDECESSORS: a pathway ID instead of a tuple in this slot means include all predecessor tuples from the specified pathway
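
Because super-pathways can nest arbitrarily, expanding a REACTION-LIST is naturally recursive. The sketch below uses a toy association list in place of a PGDB; the pathway and reaction IDs and the function expand-reaction-list are invented for illustration and are not part of Pathway Tools.

```lisp
;; Toy frame store: pathway ID -> contents of its REACTION-LIST slot.
;; All IDs here are made up.
(defparameter *reaction-lists*
  '((pwy-a . (r1 r2))
    (pwy-b . (r3))
    (super-1 . (pwy-a r9 pwy-b))    ; two sub-pathways plus a connecting reaction
    (super-2 . (super-1 r10))))     ; nested super-pathway

(defun expand-reaction-list (pwy)
  "Return the reactions of PWY, recursively expanding sub-pathway IDs."
  (remove-duplicates
   (loop for item in (cdr (assoc pwy *reaction-lists*))
         if (assoc item *reaction-lists*)
           append (expand-reaction-list item)   ; item is a pathway ID
         else
           collect item)                        ; item is a reaction ID
   :from-end t))
```

The same recursion would apply to PREDECESSORS, with sub-pathway IDs replaced by that pathway's predecessor tuples.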


Signaling Pathways

• Signaling pathways have different layout conventions than metabolic pathways
• Layout is done manually by the curator using a specialized editor
  - Curator has more control
  - Lack of an automatic layout algorithm means that the diagram won’t update automatically when data changes.


Querying Pathways Programmatically

See http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

• (all-pathways)
• (base-pathways)
  - Returns the list of all pathways that are not super-pathways
• (genes-of-pathway pwy)
• (unique-genes-of-pathway pwy)
  - Returns the list of all genes of a pathway that are not also part of other pathways
• (enzymes-of-pathway pwy)
• (compounds-of-pathway pwy)
• (get-reaction-list pwy)
• (variants-of-pathway pwy)
  - Returns all pathways in the same variant class as a pathway
• (get-predecessors rxn pwy), (get-successors rxn pwy)
• (get-rxn-direction-in-pathway pwy rxn)
• (pathway-inputs pwy), (pathway-outputs pwy)
  - Returns all compounds consumed (produced) but not produced (consumed) by the pathway (ignores stoichiometry)
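
The "consumed but not produced" definition can be illustrated with plain set operations. This is a sketch of the idea only, not the Pathway Tools code: reactions are modeled as (consumed produced) lists already oriented in their pathway direction, and inputs-and-outputs is a hypothetical name. Unlike the real functions, which may treat main and side compounds differently, this sketch counts every compound.

```lisp
;; Each entry is (consumed-compounds produced-compounds) for one reaction,
;; already oriented in the direction it runs in the pathway.
;; Returns two values: pathway inputs and pathway outputs.
(defun inputs-and-outputs (directed-rxns)
  (let ((consumed (remove-duplicates
                   (loop for rxn in directed-rxns append (first rxn))))
        (produced (remove-duplicates
                   (loop for rxn in directed-rxns append (second rxn)))))
    ;; Set membership only: stoichiometry is ignored.
    (values (set-difference consumed produced)
            (set-difference produced consumed))))
```

For the two-reaction example C + D → A + B followed by B → E, the inputs are C and D and the outputs are A and E; B cancels out because it is both produced and consumed.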


Exercises

• Find all genes involved in metabolic pathways
• Find all compounds that are unique to a single pathway
• Find the reactions of a pathway that have multiple isozymes


Solutions to Exercises

• Find all genes involved in metabolic pathways:

(remove-duplicates
 (loop for p in (all-pathways)
       append (genes-of-pathway p)))

• Find all compounds that are unique to a single pathway:

(loop for p in (base-pathways)
      append (loop for c in (compounds-of-pathway p)
                   ;; c is unique if p is the only pathway it appears in
                   when (null (remove p (pathways-of-compound c)))
                     collect (list c p)))


Solutions to Exercises, cont.

• Find the reactions of a pathway that have multiple isozymes:

(defun rxns-w-multiple-isozymes (pwy)
  (loop for rxn in (get-reaction-list pwy)
        for enzymes = (enzymes-of-reaction rxn)
        when (> (length enzymes) 1)
          collect rxn))


Regulation

• Class Regulation, with subclasses that describe different biochemical mechanisms of regulation
• Slots:
  - Regulator
  - Regulated-Entity
  - Mode
  - Mechanism


Regulation Class Taxonomy


Regulation of Enzyme Activity

• Class Regulation-of-Enzyme-Activity
• Each instance of the class describes one regulatory interaction
• Slots:
  - Regulator -- usually a small molecule
  - Regulated-Entity -- an Enzymatic-Reaction
  - Mechanism -- One of: Competitive, Uncompetitive, Noncompetitive, Irreversible, Allosteric, Unkmech, Other
  - Mode -- One of: +, -
  - Physiologically-Relevant?


Transcription-Units, Promoters, Terminators, Binding-Sites

• Transcription-Unit
  - One or more genes, transcribed as a unit
  - Any gene, promoter, etc. can belong to multiple transcription-units
  - One or zero promoters
  - Zero or more terminators, binding-sites
  - Transcription-Direction: +, -
• Promoter
  - Absolute-plus-1-pos = transcription start site
  - Binds-Sigma-Factor
• Terminator
  - Rho-Dependent or Rho-Independent
  - Left-End-Position, Right-End-Position
• Binding-Site
  - DNA-Binding-Site or mRNA-Binding-Site
  - Left-End-Position, Right-End-Position
  - Involved-in-Regulation


Transcription Initiation

• Class Transcription-Factor-Binding
• Slots:
  - Regulator -- instance of Proteins or Complexes (a transcription-factor)
  - Regulated-Entity -- instance of Promoters or Transcription-Units or Genes
  - Mode -- One of: +, -
  - Associated-Binding-Site


Attenuation

• Class Transcriptional-Attenuation
• Several subclasses depending on type of attenuation
• Slots common to all:
  - Regulator -- Depends on subtype of attenuation
  - Regulated-Entity -- instance of Terminators or Genes or Transcription-Units
  - Mode -- One of: +, -
• Slots particular to one or more subclasses:
  - Associated-Binding-Site
  - Antiterminator-Start-Pos, Antiterminator-End-Pos
  - Pause-Start-Pos, Pause-End-Pos


Attenuation Subtypes

• Protein-Mediated-Attenuation, RNA-Mediated-Attenuation, Small-Molecule-Mediated-Attenuation
  - Regulator = a protein/RNA/small molecule
  - Leader transcript binds the protein/RNA/small molecule, which determines formation of the terminator or antiterminator
• Ribosome-Mediated-Attenuation
  - Regulator = charged tRNA
  - Ribosome pauses, determining whether the terminator or antiterminator forms
• RNA-Polymerase-Modification
  - Regulator = instance of Proteins or Complexes
  - Regulatory protein binds to a site in the transcription unit and interacts with RNA polymerase to determine termination
• Rho-Blocking-Antitermination


Regulation of Translation

• Class Regulation-of-Translation
• Several subclasses depending on type of regulation
  - Compound-Mediated (i.e. riboswitch)
  - RNA-Mediated
  - Protein-Mediated
• Slots:
  - Regulator -- Depends on subtype of regulation
  - Regulated-Entity -- Transcription-Unit or Gene
  - Mode -- One of: +, -
  - Mechanism -- Translation-Blocking, mRNA-Degradation, and/or Translation-Attenuation
  - Associated-Binding-Site


Transcription-Unit API functions

• (transcription-unit-genes tu)
• (transcription-unit-promoter tu)
• (transcription-unit-binding-sites tu)
• (transcription-unit-mrna-binding-sites tu)
• (transcription-unit-terminators tu)
• (transcription-unit-transcription-factors tu)
• (terminators-affecting-gene gene)
• (containing-tus frame)
• (binding-site-transcription-factors bs)


Regulation API Functions

• (genes-regulating-gene gene)
• (genes-regulated-by-gene gene)
  - Includes all transcriptional or translational regulation
• (direct-activators frame)
• (direct-inhibitors frame)
  - Frame will be an enzymatic-reaction, transcription-unit, promoter, etc.; this is a low-level function
• (transcription-unit-activators tu)
• (transcription-unit-inhibitors tu)
  - Includes transcriptional and translational regulators, and regulators of the promoter as well as direct regulators of the tu


Exercises

• Find all DNA-binding-sites that regulate a gene
• Find only those substrate-level regulators of an enzyme that are considered physiologically relevant
• Find all inhibitors of an enzyme, including transcriptional, translational and substrate-level inhibition.


Solution to Exercises

• Find all DNA-binding-sites that regulate a gene:

(defun gene-binding-sites (gene)
  (remove-duplicates
   (loop for tu in (containing-tus gene)
         append (transcription-unit-binding-sites tu))))

• Find only those substrate-level regulators of an enzyme that are considered physiologically relevant:

(defun relevant-regulators (enz)
  (loop for enzrxn in (get-slot-values enz 'catalyzes)
        append (loop for regframe in (get-slot-values enzrxn 'regulated-by)
                     when (get-slot-value regframe 'physiologically-relevant?)
                       collect (get-slot-value regframe 'regulator))))


Solution to Exercises, cont.

• Find all inhibitors of an enzyme, including transcriptional, translational and substrate-level inhibition:

(defun enzyme-inhibitors (enz)
  (let* ((genes (genes-of-enzyme enz))
         (tus (remove-duplicates
               (loop for g in genes append (containing-tus g))))
         (terminators
          (remove-duplicates
           (loop for g in genes append (terminators-affecting-gene g))))
         (enzrxns (get-slot-values enz 'catalyzes)))
    (append (loop for enzrxn in enzrxns append (direct-inhibitors enzrxn))
            (loop for tu in tus append (transcription-unit-inhibitors tu))
            (loop for term in terminators append (direct-inhibitors term)))))
