Lecture 15 - Existing Standards in Systems Biology

Existing Standards in
Systems Biology
Anatoly Sorokin
Computation Systems Biology Group
University of Edinburgh
8-Apr-15
Anatoly Sorokin
Standard
• 2000-2010 was the decade of standards in
biology
– 31 MIBBI standards
– 56 OBO ontologies
– About 80 exchange formats
• Scope of interest
• Language
• Controlled vocabulary
Standards and Languages
• CML – description of chemical structure
• MathML – representation of mathematical
formulas
• PSI – standard description of protein
interaction data
• AnatML – language to describe interactions
at the organ level
• Gene Ontology – standard and ontology to
describe gene function and regulation
Standards for Computational Systems
Biology
• BioPAX – language for exchanging biological
network data between databases
• SBML – language for exchanging biochemical
models
• CellML – language to describe
mathematical models
• SBGN – visual language for biological
model description
MI standards
• Reporting guidelines specify the minimum
amount of metadata and data
required to meet a specific aim
• The aim is to provide enough metadata and data to
enable the unambiguous reproduction and
interpretation of an experiment
• Normally informal, human-readable specifications
that inform the development of formal data
models (e.g. XML or UML) and data exchange
formats
Exchange format
• Strict structure for exchanging data or models
• Mainly XML
• Well defined meta-model, often supported
by software API
Ontologies
• “ontology deals with questions concerning
what entities exist or can be said to exist,
and how such entities can be grouped,
related within a hierarchy, and subdivided
according to similarities and differences”
Wikipedia
• Often used as controlled vocabulary and
description support framework
• GeneOntology
BioPAX
• “Biological PAthway eXchange - A
data exchange ontology and format
for biological pathway integration,
aggregation and inference”
BioPAX Goals
• BioPAX = Biological PAthway eXchange
• Data exchange format for pathway data
• Include support for these pathway types:
– Metabolic pathways
– Signaling pathways
– Protein-protein, molecular interactions
– Gene regulatory pathways
– Genetic interactions
• Accommodate representations used in existing
databases such as BioCyc, BIND, WIT, aMAZE, KEGG,
Reactome, etc.
• PathwayCommons – collection of pathways in BioPAX
– http://www.pathwaycommons.org
BioPAX
• BioPAX ontology and format in OWL (XML)
• Ontology built using GKB Editor and Protégé
• Semantic mapping still an issue
• Level 1 represents metabolic pathway data
• Level 2 adds support for molecular interactions,
post-translational modifications, experimental
description from PSI-MI model (backwards
compatible)
• Level 3 adds support for generics and protein states,
and rearranges the reaction representation
BioPAX Ontology: Top Level
[Diagram: Entity with the classes Pathway, Interaction and Physical Entity; legend: Subclass (is a), Contains (has a)]
• Pathway
– A set of interactions
– E.g. Glycolysis, MAPK, Apoptosis
• Interaction
– A set of entities and some relationship between them
– E.g. Reaction, Molecular Association, Catalysis
• Physical Entity
– A building block of simple interactions
– E.g. Small molecule, Protein, DNA, RNA
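The top-level classes above can be sketched as a small Python class hierarchy. This is an illustration only, not the BioPAX OWL definition: the attribute names `participants` and `interactions` and the example objects are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Root of the hierarchy: everything below is an entity."""
    name: str


@dataclass
class PhysicalEntity(Entity):
    """Building block of interactions, e.g. a small molecule or a protein."""


@dataclass
class Interaction(Entity):
    """A set of entities and some relationship between them."""
    participants: List[Entity] = field(default_factory=list)


@dataclass
class Pathway(Entity):
    """A set of interactions."""
    interactions: List[Interaction] = field(default_factory=list)


glucose = PhysicalEntity("glucose")
g6p = PhysicalEntity("glucose-6-phosphate")
step1 = Interaction("hexokinase reaction", participants=[glucose, g6p])
glycolysis = Pathway("Glycolysis", interactions=[step1])

# "is a" relations are expressed by subclassing, "has a" relations by list attributes.
print(isinstance(glycolysis, Entity), len(glycolysis.interactions))
```

Subclassing mirrors the "is a" arrows of the diagram, while the list attributes mirror the "has a" arrows.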
BioPAX Ontology: Interactions
• Interaction
– Physical Interaction
– Control: Catalysis, Modulation
– Conversion: ComplexAssembly, BiochemicalReaction, Transport, TransportWithBiochemicalReaction
BioPAX Ontology: Physical Entities
• PhysicalEntity
– Complex
– Protein
– RNA
– DNA
– Small Molecule
BioPAX and other standards
[Diagram: scope of BioPAX compared with other standards. Database exchange formats: BioPAX. Interaction networks: PSI-MI 2. Simulation model exchange formats: SBML, CellML. Areas shown: Genetic Interactions; Molecular Interactions (Pro:Pro, All:All); Regulatory Pathways (TF:Gene); Metabolic Pathways; Small Molecules; Biochemical Reactions; Rate Formulas. Each area is marked on a scale from Low Detail to High Detail, and as Molecular or Non-molecular.]
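Simulation-related standards
[Diagram: a grid relating Model, Simulation and Result to three layers: Minimal Requirements, Exchange format and Ontology. The exchange format implements the minimal requirements, and the ontology makes sense of the exchange format. Entries shown: SED-ML, SBRML, and one cell marked "?".]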
SBML
• “The Systems Biology Markup Language
(SBML) is a computer-readable format for
representing models of biochemical
reaction networks. SBML is applicable to
metabolic networks, cell-signaling
pathways, regulatory networks, and many
others. ”
SBML
– Reaction
• container for rate law
– Species
• reactants, products, or modifiers of reaction
– Compartment
• container for species
– Parameter, Rule, Event
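The components listed above can be illustrated by assembling a small SBML-like document with Python's standard library. This is a minimal sketch, not a validated SBML file: the model `toy_model`, the reaction A -> B and all values are made up for the example, the namespace is the one used by SBML Level 2 Version 4, and the MathML content of the rate law is omitted.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4; other levels and versions use other URIs.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

sbml = ET.Element("sbml", {"xmlns": SBML_NS, "level": "2", "version": "4"})
model = ET.SubElement(sbml, "model", {"id": "toy_model"})

# Compartment: container for species.
compartments = ET.SubElement(model, "listOfCompartments")
ET.SubElement(compartments, "compartment", {"id": "cell", "size": "1"})

# Species: the reactants, products or modifiers of reactions.
species = ET.SubElement(model, "listOfSpecies")
for sid, conc in (("A", "10"), ("B", "0")):
    ET.SubElement(species, "species",
                  {"id": sid, "compartment": "cell", "initialConcentration": conc})

# Parameter: a named constant that the rate law can refer to.
parameters = ET.SubElement(model, "listOfParameters")
ET.SubElement(parameters, "parameter", {"id": "k", "value": "0.1"})

# Reaction A -> B; the kineticLaw element is the container for the rate law.
reactions = ET.SubElement(model, "listOfReactions")
reaction = ET.SubElement(reactions, "reaction", {"id": "conversion"})
ET.SubElement(ET.SubElement(reaction, "listOfReactants"),
              "speciesReference", {"species": "A"})
ET.SubElement(ET.SubElement(reaction, "listOfProducts"),
              "speciesReference", {"species": "B"})
ET.SubElement(reaction, "kineticLaw")  # MathML for the rate expression omitted

sbml_text = ET.tostring(sbml, encoding="unicode")
print(sbml_text)
```

In practice a dedicated SBML library would normally be used for reading, writing and validating models; the standard library is used here only to keep the sketch self-contained.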
Characteristics of SBML
• Many top-level types, little nesting
– Units, Compartment, Species, Parameter, Reaction, Rule, Function,
Event
• Non-modular structure
– Next SBML ‘Level’ (3) will introduce modularity
• Emphasis on reactions
• Some math implicit
– Explicit rate equations; implicit integration
– Implicit concentration conversion between compartments
• Compartments are physical containers for species
– Spatial dimensions (volume, surface)
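The split between explicit rate equations and implicit integration can be sketched as follows. The model supplies only the rate law; how the resulting equations are integrated is left to the simulator. The forward Euler method, the reaction A -> B and all numbers are assumptions chosen to keep the example short, not something SBML prescribes.

```python
def rate(k: float, a: float) -> float:
    """Explicit rate law stored in the model: mass action, v = k * [A]."""
    return k * a


def simulate(a0: float, b0: float, k: float, dt: float, steps: int):
    """Integration supplied by the simulator, here the forward Euler method."""
    a, b = a0, b0
    for _ in range(steps):
        v = rate(k, a)
        # Whatever leaves A arrives in B, so the total amount is conserved.
        a, b = a - v * dt, b + v * dt
    return a, b


a_end, b_end = simulate(a0=10.0, b0=0.0, k=0.1, dt=0.01, steps=1000)
print(a_end, b_end)
```

A different simulator could swap `simulate` for another integration scheme without touching `rate`, which is the point of keeping the integration out of the model.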
Structure of SBML
• The notes field of SBase is intended to store information for
humans to read
• The annotation field of SBase provides a container for
software-generated annotations that are not intended to
be seen by humans
• The id field is required for most structures and is
used to identify a component within the model definition
• The name field is optional and provides a human-readable label for the component
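A short sketch of how these fields might be read from a file, using only Python's standard library. The fragment is a made-up, incomplete SBML Level 2 Version 4 snippet (for instance, it declares no compartment), and the identifiers `glc` and `cell` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4",
      "xhtml": "http://www.w3.org/1999/xhtml"}

FRAGMENT = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfSpecies>
      <species id="glc" name="Glucose" compartment="cell">
        <notes>
          <p xmlns="http://www.w3.org/1999/xhtml">Free cytosolic glucose.</p>
        </notes>
      </species>
    </listOfSpecies>
  </model>
</sbml>
"""

root = ET.fromstring(FRAGMENT)
sp = root.find(".//sbml:species", NS)

species_id = sp.get("id")                       # identifier used inside the model
label = sp.get("name", species_id)              # optional label; fall back to the id
note = sp.find("sbml:notes/xhtml:p", NS).text   # XHTML content meant for people

print(species_id, label, note)
```

Falling back to the id when no name is present reflects the rule that id is required while name is optional.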
MIRIAM
• Model description requires extra information
– Biological
• Description of elements of model
– Mathematical
• Definition of math concepts
– Referential
• Author name
• Paper reference etc.
• http://www.ebi.ac.uk/compneur-srv/miriam/
Reference correspondence
• The model must be encoded in a public, standardized, machine-readable format (SBML, CellML, GENESIS ...)
• The model must comply with the standard in which it is encoded!
• The model must be clearly related to a single reference description.
If a model is composed from different parts, there should still be a
description of the derived/combined model.
• The encoded model structure must reflect the biological processes
listed in the reference description.
• The model must be instantiated in a simulation: all quantitative
attributes have to be defined, including initial conditions.
• When instantiated, the model must be able to reproduce all results
given in the reference description within an epsilon (to allow for
differences in algorithms and rounding errors)
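The last requirement, reproducing results "within an epsilon", could be checked along the following lines. MIRIAM does not prescribe a particular tolerance; the function name, the tolerance values and the numbers below are assumptions made for this sketch.

```python
import math
from typing import Sequence


def reproduces(simulated: Sequence[float], reference: Sequence[float],
               rel_eps: float = 1e-3, abs_eps: float = 1e-9) -> bool:
    """True if every simulated value matches its reference value within epsilon."""
    if len(simulated) != len(reference):
        return False
    return all(math.isclose(s, r, rel_tol=rel_eps, abs_tol=abs_eps)
               for s, r in zip(simulated, reference))


reference = [10.0, 6.065, 3.679]     # made-up values standing in for published results
simulated = [10.0, 6.0647, 3.6786]   # made-up output from re-running the encoded model

print(reproduces(simulated, reference))
```

A relative tolerance is combined with a small absolute one so that values close to zero do not fail the check spuriously.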
Attribution annotation
• The model has to be named.
• A citation of the reference description must be attached
(complete citation, unique identifier, unambiguous URL).
The citation should make it possible to identify the authors of the
model.
• The name and contact details of the model creators must be attached.
• The date and time of creation and last modification
should be specified. A history is useful but not required.
• The model should be linked to a precise statement about
the terms of distribution. MIRIAM does not require
“freedom of use” or “no cost”.
External resource annotation
• The annotation must make it possible to relate a
piece of knowledge to a model constituent unambiguously.
• The referenced information should be described using a
triplet {data-type, identifier, qualifier}
– The data-type should be written as a Uniform Resource Identifier
(URI)
– The identifier is analysed within the framework of the data-type.
– Data-type and identifier can be combined in a single URI
http://www.myResource.org/#myIdentifier
urn:lsid:myResource.org:myIdentifier
– Qualifiers (optional) should refine the link between the model
constituent and the piece of knowledge: “has a”, “is version of”,
“is homolog to” etc.
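The triplet and its two combined forms can be sketched as a small data structure. The class and method names are hypothetical; the resource and identifier are the placeholder values used in the bullets above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One {data-type, identifier, qualifier} triplet attached to a model constituent."""
    data_type: str                    # URI of the resource
    identifier: str                   # interpreted within the data-type
    qualifier: Optional[str] = None   # optional, e.g. "is version of"

    def as_url(self) -> str:
        """Combine data-type and identifier in the URL form."""
        return f"{self.data_type}#{self.identifier}"

    def as_lsid(self, authority: str) -> str:
        """Combine them in the URN (LSID) form."""
        return f"urn:lsid:{authority}:{self.identifier}"


ann = Annotation("http://www.myResource.org/", "myIdentifier", "is version of")
print(ann.as_url())                    # http://www.myResource.org/#myIdentifier
print(ann.as_lsid("myResource.org"))   # urn:lsid:myResource.org:myIdentifier
```

Keeping the qualifier optional matches the rule that it only refines the link and is not required.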
SBO
• Part of the OBO Foundry
• Assigns meanings to
mathematical elements of
SBML
• Allows automatic
validation of the semantic
consistency of the mathematical part
of a model
• http://www.ebi.ac.uk/sbo
SBO
• Types and roles of reaction participants, including terms like
“substrate”, “catalyst” etc., but also “macromolecule” or “channel”.
• Parameters used in quantitative models. This vocabulary includes
terms like “Michaelis constant”, “forward unimolecular rate
constant” etc. A term may contain a precise mathematical expression
stored as a MathML lambda function. The variables refer to other
parameters.
• Mathematical expressions. Examples of terms are “mass action
kinetics”, “Henri-Michaelis-Menten equation” etc. A term may
contain a precise mathematical expression stored as a MathML
lambda function. The variables refer to the other vocabularies.
• Modelling framework, which specifies how to interpret the rate law. E.g.
“continuous modelling”, “discrete modelling” etc.
• Event type, such as “catalysis” or “addition of a chemical group”.
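The idea of a term that carries its own mathematical expression can be sketched with plain Python callables standing in for MathML lambda functions. The dictionary, the function `evaluate` and the parameter names are hypothetical, and no real SBO identifiers are used; the mass action entry covers only the first-order case.

```python
from typing import Callable, Dict

# Hypothetical in-memory vocabulary: term name -> mathematical expression.
# Real SBO terms store MathML; Python lambdas stand in for them here.
RATE_LAWS: Dict[str, Callable[..., float]] = {
    # First-order mass action: v = k * [S]
    "mass action kinetics": lambda k, s: k * s,
    # v = Vmax * [S] / (Km + [S]), where Km is the Michaelis constant
    "Henri-Michaelis-Menten equation": lambda vmax, km, s: vmax * s / (km + s),
}


def evaluate(term: str, **values: float) -> float:
    """Look up the expression for a term and evaluate it with named parameters."""
    return RATE_LAWS[term](**values)


v = evaluate("Henri-Michaelis-Menten equation", vmax=2.0, km=0.5, s=0.5)
print(v)  # 1.0: at [S] = Km the rate is half of Vmax
```

Because the variables of each expression are named, a tool can check that a model supplies a parameter with the matching role for every variable, which is the kind of consistency check that semantic annotation makes possible.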
MIASE
• Minimum Information About a Simulation
Experiment
– What base model to use & which modifications to
apply
– What simulation task to run on those models
(algorithms, see KiSAO; simulation parameters)
– How to post-process the numerical results and to
present them
• http://www.ebi.ac.uk/compneur-srv/miase/
• A subset of MIASE could be encoded in
SED-ML
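The three pieces of information listed above can be sketched as a small data structure with a completeness check. This is neither the MIASE specification nor the SED-ML schema: every class, field and value below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelSpec:
    source: str                                               # where the base model lives
    changes: Dict[str, float] = field(default_factory=dict)   # modifications to apply


@dataclass
class Task:
    algorithm: str   # simulation algorithm (KiSAO could supply a term for it)
    start: float
    end: float
    points: int


@dataclass
class Experiment:
    model: ModelSpec
    task: Task
    outputs: List[str]   # post-processing and presentation, e.g. curves to plot

    def missing(self) -> List[str]:
        """List the pieces of information that are still empty."""
        gaps = []
        if not self.model.source:
            gaps.append("base model")
        if not self.task.algorithm:
            gaps.append("simulation algorithm")
        if not self.outputs:
            gaps.append("outputs")
        return gaps


exp = Experiment(
    model=ModelSpec("toy_model.xml", changes={"k": 0.2}),
    task=Task(algorithm="deterministic ODE solver", start=0.0, end=10.0, points=100),
    outputs=[],
)
print(exp.missing())  # ['outputs']
```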
[Figure slides: Description of models; Simulations; Simulation task; Data generation; Production of results]
KiSAO
• Kinetic Simulation Algorithm Ontology
– Classification of simulation algorithms &
methods
– Definition, literature references
– Relations between different simulation
algorithms & methods
• http://www.ebi.ac.uk/compneur-srv/kisao/index.html
KiSAO
http://bioportal.bioontology.org/visualize/40844
SBRML
• Systems Biology Results Markup
Language
• A new markup language for specifying the
results from operations on SBML models
• http://www.comp-sys-bio.org/tiki-index.php?page=SBRML
[Figure slides: SBRML; Dimension example]
TEDDY
• The TErminology for the Description of
DYnamics (TEDDY) project aims to
provide an ontology for dynamical
behaviours, observable dynamical
phenomena, and control elements of
bio-models and biological systems in Systems
Biology and Synthetic Biology.
• http://www.ebi.ac.uk/compneur-srv/teddy/
TEDDY top-level structure
• Temporal Behaviour (concrete behaviours of a model,
more or less the same as trajectories):
– Oscillation, Steady State, Fixed Point, Cycle, ...
• Behaviour Characteristic (properties to characterise
concrete behaviours):
– Period, Amplitude, ...
• Behaviour Diversification (system properties describing
the ability of systems to exhibit different behaviours):
– Bifurcation, Bi-Stability
• Functional Motif (structural features of a system
necessary for specific function):
– Negative Feedback, FFL, ...
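To make the distinction between a temporal behaviour and its characteristics concrete, the sketch below estimates two behaviour characteristics, amplitude and period, from an oscillating trajectory. TEDDY itself only names these concepts; the estimation method, the function name and the synthetic trajectory are assumptions made for this example.

```python
import math
from typing import List, Tuple


def characterise(times: List[float], values: List[float]) -> Tuple[float, float]:
    """Estimate (amplitude, period) of a regularly oscillating trajectory."""
    amplitude = (max(values) - min(values)) / 2.0
    mean = (max(values) + min(values)) / 2.0
    # Times at which the trajectory crosses its mid-level going upwards.
    crossings = [
        times[i] for i in range(1, len(values))
        if values[i - 1] < mean <= values[i]
    ]
    if len(crossings) < 2:
        raise ValueError("fewer than two oscillation cycles in the trajectory")
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return amplitude, period


# Synthetic trajectory 3 + 2*sin(2*pi*t/5): amplitude 2 and period 5 by construction.
ts = [0.01 * i for i in range(2001)]
xs = [3.0 + 2.0 * math.sin(2.0 * math.pi * t / 5.0) for t in ts]
amp, per = characterise(ts, xs)
print(round(amp, 2), round(per, 2))
```

The estimates are only as precise as the sampling step, so the printed period is close to, but not necessarily exactly, 5.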
Questions