Existing Standards in Systems Biology
Anatoly Sorokin
Computation Systems Biology Group, University of Edinburgh
8-Apr-15

Standard
• 2000-2010 was the decade of standards in biology
  – 31 MIBBI standards
  – 56 OBO ontologies
  – About 80 exchange formats
• Scope of interest
• Language
• Controlled vocabulary

Standards and Languages
• CML – description of chemical structure
• MathML – representation of mathematical formulas
• PSI – standard description of protein interaction data
• AnatML – language to describe interactions at organ level
• GeneOntology – standard and ontology to describe gene function and regulation

Standards for Computational System Biology
• BioPAX – language for exchanging databases of biological networks
• SBML – language for biochemical model exchange
• CellML – language to describe mathematical models
• SBGN – visual language for biological model description

MI standards
• Reporting guidelines specify the minimum amount of metadata (information) and data required to meet a specific aim
• The aim is to provide enough metadata and data to enable the unambiguous reproduction and interpretation of an experiment.
• Normally informal human-readable specifications that inform the development of formal data models (e.g.
XML or UML) and data exchange formats

Exchange format
• Strict structure to exchange data or models
• Mainly XML
• Well-defined meta-model, often supported by a software API

Ontologies
• “ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences” (Wikipedia)
• Often used as a controlled vocabulary and description support framework
• GeneOntology

BioPAX
• “Biological PAthway eXchange - A data exchange ontology and format for biological pathway integration, aggregation and inference”

BioPAX Goals
• BioPAX = Biological PAthway eXchange
• Data exchange format for pathway data
• Include support for these pathway types:
  – Metabolic pathways
  – Signaling pathways
  – Protein-protein, molecular interactions
  – Gene regulatory pathways
  – Genetic interactions
• Accommodate representations used in existing databases such as BioCyc, BIND, WIT, aMAZE, KEGG, Reactome, etc.
• PathwayCommons – collection of pathways in BioPAX
  – http://www.pathwaycommons.org

BioPAX
• BioPAX ontology and format in OWL (XML)
• Ontology built using GKB Editor and Protégé
• Semantic mapping still an issue
• Level 1 represents metabolic pathway data
• Level 2 adds support for molecular interactions, post-translational modifications, experimental description from the PSI-MI model (backwards compatible)
• Level 3 adds support for generics and protein states, and rearranges the reaction representation

BioPAX Ontology: Top Level
[Figure: top-level classes Entity, Pathway, Interaction and Physical Entity, linked by "Subclass (is a)" and "Contains (has a)" relations]
• Pathway
  – A set of interactions
  – E.g. Glycolysis, MAPK, Apoptosis
• Interaction
  – A set of entities and some relationship between them
  – E.g. Reaction, Molecular Association, Catalysis
• Physical Entity
  – A building block of simple interactions
  – E.g.
Small molecule, Protein, DNA, RNA

BioPAX Ontology: Interactions
[Figure: interaction class hierarchy with the classes Interaction, Physical Interaction, Control, Conversion, ComplexAssembly, Catalysis, Modulation, BiochemicalReaction, Transport, TransportWithBiochemicalReaction]

BioPAX Ontology: Physical Entities
[Figure: PhysicalEntity with the classes Complex, Protein, RNA, DNA, Small Molecule]

BioPAX and other standards
[Figure: coverage of BioPAX (database exchange format), PSI-MI 2 (molecular interactions) and SBML, CellML (simulation model exchange formats) across genetic interactions, interaction networks, regulatory pathways, metabolic pathways, biochemical reactions and rate formulas, from low to high detail]

Simulation-related standards
[Figure: overview with columns Model, Simulation and Result, and rows Minimal Requirements, Exchange format / Data model, and Ontology. The exchange format "implements" the minimal requirements and the ontology "makes sense of" the exchange format. SED-ML and SBRML appear as exchange formats; one minimal-requirements entry is marked "?"]

SBML
• “The Systems Biology Markup Language (SBML) is a computer-readable format for representing models of biochemical reaction networks. SBML is applicable to metabolic networks, cell-signaling pathways, regulatory networks, and many others.”

SBML
• Reaction – container for rate law
• Species – reactants, products, or modifiers of a reaction
• Compartment – container for species
• Parameter, Rule, Event

Characteristics of SBML
• Many top-level types, little nesting
  – Units, Compartment, Species, Parameter, Reaction, Rule, Function, Event
• Non-modular structure
  – Next SBML ‘Level’ (3) will introduce modularity
• Emphasis on reactions
• Some math implicit
  – Explicit rate equations; implicit integration
  – Implicit concentration conversion between compartments
• Compartments are physical containers for species
  – Spatial dimensions (volume, surface)

Structure of SBML
• The notes field of SBase is intended to store information for humans to read
• The annotation field of SBase provides a container for software-generated annotations that are not intended to be seen by humans
• The id field is required for most structures and is used to identify a component within the model definition.
• The name field is optional and provides a human-readable label for the component.

MIRIAM
• Model description requires extra information
  – Biological: description of the elements of the model
  – Mathematical: definition of math concepts
  – Referential: author name, paper reference, etc.
• http://www.ebi.ac.uk/compneur-srv/miriam/

Reference correspondence
• The model must be encoded in a public, standardized, machine-readable format (SBML, CellML, GENESIS ...)
• The model must comply with the standard in which it is encoded!
• The model must be clearly related to a single reference description. If a model is composed from different parts, there should still be a description of the derived/combined model.
• The encoded model structure must reflect the biological processes listed in the reference description.
• The model must be instantiated in a simulation: all quantitative attributes have to be defined, including initial conditions.
• When instantiated, the model must be able to reproduce all results given in the reference description within an epsilon (algorithms, round-off errors)

Attribution annotation
• The model has to be named.
• A citation of the reference description must be attached (complete citation, unique identifier, unambiguous URL). The citation should make it possible to identify the authors of the model.
• The name and contact details of the model creators must be attached.
• The date and time of creation and last modification should be specified. A history is useful but not required.
• The model should be linked to a precise statement about the terms of distribution. MIRIAM does not require “freedom of use” or “no cost”.

External resource annotation
• The annotation must make it possible to unambiguously relate a piece of knowledge to a model constituent.
• The referenced information should be described using a triplet {data-type, identifier, qualifier}
  – The data-type should be written as a Uniform Resource Identifier (URI)
  – The identifier is analysed within the framework of the data-type.
  – Data-type and identifier can be combined in a single URI:
      http://www.myResource.org/#myIdentifier
      urn:lsid:myResource.org:myIdentifier
  – Qualifiers (optional) should refine the link between the model constituent and the piece of knowledge: “has a”, “is version of”, “is homolog to”, etc.
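
As a minimal sketch of the {data-type, identifier, qualifier} triplet, the Python below models one annotation and combines data-type and identifier into the two single-URI forms given in the MIRIAM external resource annotation rules. The class name `MiriamAnnotation`, its fields and methods, and the example values are hypothetical illustrations; they are not part of MIRIAM or of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MiriamAnnotation:
    """Hypothetical container for a MIRIAM-style annotation triplet."""
    resource: str                    # authority of the data-type, e.g. "myResource.org"
    identifier: str                  # entry identifier within that data-type
    qualifier: Optional[str] = None  # optional relation, e.g. "is version of"

    def as_url(self) -> str:
        # URL form from the slide: http://www.myResource.org/#myIdentifier
        return f"http://www.{self.resource}/#{self.identifier}"

    def as_urn(self) -> str:
        # LSID form from the slide: urn:lsid:myResource.org:myIdentifier
        return f"urn:lsid:{self.resource}:{self.identifier}"


annotation = MiriamAnnotation("myResource.org", "myIdentifier", "is version of")
print(annotation.as_url())
print(annotation.as_urn())
```

Keeping the qualifier optional mirrors the rule that qualifiers only refine the link; the data-type and identifier alone are enough to build either URI.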

SBO
• Part of OBO Foundry
• Assigns meanings to mathematical elements of SBML
• Allows automatic validation of the semantic consistency of the math part of a model
• http://www.ebi.ac.uk/sbo

SBO
• Types and roles of reaction participants, including terms like “substrate”, “catalyst”, etc., but also “macromolecule” or “channel”.
• Parameters used in quantitative models. This vocabulary includes terms like “Michaelis constant”, “forward unimolecular rate constant”, etc. A term may contain a precise mathematical expression stored as a MathML lambda function. The variables refer to other parameters.
• Mathematical expressions. Examples of terms are “mass action kinetics”, “Henri-Michaelis-Menten equation”, etc. A term may contain a precise mathematical expression stored as a MathML lambda function. The variables refer to the other vocabularies.
• Modelling framework, to specify how to interpret the rate law. E.g. “continuous modelling”, “discrete modelling”, etc.
• Event type, such as “catalysis” or “addition of a chemical group”.
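
To illustrate how a vocabulary term can carry a precise mathematical expression, here is a minimal Python sketch. Python functions stand in for the MathML lambda functions, and the dictionary `RATE_LAWS` and the function names are hypothetical; real SBO terms also carry identifiers and definitions, which are omitted here.

```python
def mass_action(k: float, *concentrations: float) -> float:
    """Mass action kinetics: rate constant times the product of reactant concentrations."""
    rate = k
    for c in concentrations:
        rate *= c
    return rate


def henri_michaelis_menten(vmax: float, km: float, s: float) -> float:
    """Henri-Michaelis-Menten equation: v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


# Hypothetical lookup from a term name to its mathematical expression.
RATE_LAWS = {
    "mass action kinetics": mass_action,
    "Henri-Michaelis-Menten equation": henri_michaelis_menten,
}

v = RATE_LAWS["Henri-Michaelis-Menten equation"](vmax=10.0, km=2.0, s=2.0)
print(v)  # 5.0: when [S] equals Km the rate is half of Vmax
```

Because the arguments of each function correspond to terms from the parameter vocabulary (such as the Michaelis constant), a tool could in principle check that a model's rate law and its parameters are annotated consistently.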

MIASE
• Minimum Information About a Simulation Experiment
  – What base model to use & which modifications to apply
  – What simulation task to run on those models (algorithms, see KiSAO; simulation parameters)
  – How to post-process the numerical results and to present them
• http://www.ebi.ac.uk/compneur-srv/miase/
• A subset of MIASE could be encoded in SED-ML
[Slides with figures only: Description of models; Simulations; Simulation task; Data generation; Production of results]

KiSAO
• Kinetic Simulation Algorithm Ontology
  – Classification of simulation algorithms & methods
  – Definitions, literature references
  – Relations between different simulation algorithms & methods
• http://www.ebi.ac.uk/compneur-srv/kisao/index.html
• http://bioportal.bioontology.org/visualize/40844

SBRML
• Systems Biology Results Markup Language
• A new markup language for specifying the results from operations on SBML models
• http://www.comp-sys-bio.org/tiki-index.php?page=SBRML
[Slides with figures only: SBRML; Dimension example]
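
The three MIASE questions listed earlier (which model and modifications, which simulation task, how to post-process) can be sketched in plain Python. Everything here is a hypothetical illustration: the toy decay model dx/dt = -k*x, the dictionary keys, and the forward Euler integrator are assumptions chosen for brevity, and none of it is SED-ML or SBRML syntax.

```python
import math

# Hypothetical experiment description following the three MIASE questions.
experiment = {
    "model": {"x0": 1.0, "k": 0.5},   # base model: dx/dt = -k * x
    "changes": {"k": 1.0},            # modification applied to the base model
    "task": {"algorithm": "forward Euler", "t_end": 1.0, "steps": 1000},
    "output": ["time", "x"],          # what to report after the run
}


def run(exp: dict) -> list:
    """Apply the changes, integrate with forward Euler, return (time, x) pairs."""
    params = {**exp["model"], **exp["changes"]}
    steps = exp["task"]["steps"]
    dt = exp["task"]["t_end"] / steps
    t, x = 0.0, params["x0"]
    trajectory = [(t, x)]
    for _ in range(steps):
        x += dt * (-params["k"] * x)
        t += dt
        trajectory.append((t, x))
    return trajectory


trajectory = run(experiment)
final_time, final_x = trajectory[-1]
# Post-processing: compare with the analytic solution x0 * exp(-k * t).
print(final_x, math.exp(-1.0))
```

Keeping the description (`experiment`) separate from the code that executes it (`run`) is the point of the exercise: the same description could be handed to a different tool or algorithm and the outputs compared.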

TEDDY
• The TErminology for the Description of DYnamics (TEDDY) project aims to provide an ontology for dynamical behaviours, observable dynamical phenomena, and control elements of biomodels and biological systems in Systems Biology and Synthetic Biology.
• http://www.ebi.ac.uk/compneur-srv/teddy/

TEDDY top-level structure
• Temporal Behaviour (concrete behaviours of a model, more or less the same as trajectories):
  – Oscillation, Steady State, Fixed Point, Cycle, ...
• Behaviour Characteristic (properties to characterise concrete behaviours):
  – Period, Amplitude, ...
• Behaviour Diversification (system properties describing the ability of systems to exhibit different behaviours):
  – Bifurcation, Bi-Stability
• Functional Motif (structural features of a system necessary for a specific function):
  – Negative Feedback, FFL, ...

Questions