Developing Standards: Case Studies
www.sys-bio.org  www.sbml.org  www.sbolstandards.org  blog.analogmachine.org
Herbert M Sauro, Dept. of Bioengineering, University of Washington, Seattle, WA
hsauro@u.washington.edu

Importance of Standards
Imagine a world where:
• Each company made its own incompatible nuts, bolts and screws.
• Every town had its own way to measure time.
• Every internet provider used its own protocols for the TCP/IP stack, email, the web, and so on.
Standards are vital for the normal functioning of society.

At least two ways to start a standard:
1. Top-down: institutionalized stick and carrot
2. Grass roots

Two Examples
• SBML: Systems Biology Markup Language
• SBOL: Synthetic Biology Open Language

Simulation of Computational Models
[Figure: model simulation]

Why? Study Perturbations
Apoptosis: change the activity of a protein, e.g. p53, by adding an inhibitor (image: http://www.sapphirebioscience.com). What effect does this have on cell death and/or proliferation? There may be multiple paths or multiple effects.

How It Started: SCAMP and Gepasi (1980s/1990s)

Exchange of Computational Models
In 1999/2000 a project was started at Caltech, with initial funding from Japan, to devise an interchange language: SBML, the Systems Biology Markup Language.

SBML: Systems Biology Markup Language
Used to represent homogeneous, multi-compartmental biochemical systems.

SBML in a Nutshell: "Systems Biology Markup Language"
• A machine-readable format for representing computational models in systems biology
• Domain: systems of biochemical reactions
• Specified using XML
• Components in SBML reflect the natural conceptual constructs of the domain
• Now over 200 tools use SBML

SBML in a Nutshell (continued)
• Simple compartments (well-stirred reactors)
• Internal/external species
• Reaction schemes
• Global parameters
• Arbitrary rate laws
• DAEs (ODEs plus algebraic functions and constraints)
• Physical units and model notes
• Annotation: extension capability
• Events

Model Exchange Standards: SBML and CellML
SBML is primarily a way to describe the biology of cellular networks, from which the mathematical models can be derived automatically. CellML is a math-based description from which the underlying biology can be inferred.

There are many modeling software tools that use SBML (www.sbml.org).

SBML Ecosystem
SBML provides unambiguous model exchange, surrounded by:
• Diagrams
• Databases
• Journals
• Semantic annotations
• SEDML: Simulation Experiment Description Language
• SBGN: Systems Biology Graphical Notation
• Simulator comparison and compliance

Model Repositories
BioModels.net (Nicolas Le Novere): as of Sep 2011, 366 curated models and 398 uncurated models.
http://www.ebi.ac.uk/biomodels/

MIRIAM: Minimum Information Requested in the Annotation of Biochemical Models
MIRIAM is not a file format but a minimum specification of how a model should be made available to the community:
• Reference correspondence: the model is encoded in a recognized, public, standardized, machine-readable format.
• Attribution annotation: the model must provide the citation of its reference description, list its creators, and be attached to some terms of distribution.
• External resource annotation: each component of the model must be annotated to allow its unambiguous identification.

Semantic Annotations
1. SBO: Systems Biology Ontology (quantitative terms)
2. MIASE: Minimum Information About a Simulation Experiment
3. TEDDY: Terminology for the Description of Dynamics
4. KiSAO: Kinetic Simulation Algorithm Ontology
5. Missing: an audit trail of the modeling process

SBO: Systems Biology Ontology
[Term]
id: SBO:0000002
name: quantitative parameter
def: "A number representing a quantity that defines certain characteristics of systems or functions. A parameter may be part of a calculation, but its value is not determined by the form of the equation itself, and may be arbitrarily assigned." []
relationship: part of SBO:0000000 ! Systems Biology Ontology
[Term]
id: SBO:0000012
name: mass action kinetics
def: "The Law of Mass Action, first expressed by Waage and Guldberg in 1864 (Waage, P., Guldberg, C. M. Forhandlinger: Videnskabs-Selskabet i Christiana 1864, 35) states that….." []
is a: SBO:0000001 ! rate law

Terms can be queried programmatically via a web service.

Systems Biology Ontology in SBML
    <reaction sboTerm="SBO:0000062">  <!-- continuous framework -->
      <listOfReactants>
        <speciesReference species="S" sboTerm="SBO:0000015"/>  <!-- substrate -->
      </listOfReactants>
      <listOfProducts>
        <speciesReference species="P" sboTerm="SBO:0000011"/>  <!-- product -->
      </listOfProducts>
      <listOfModifiers>
        <modifierSpeciesReference species="E" sboTerm="SBO:0000014"/>  <!-- enzyme -->
      </listOfModifiers>
      <kineticLaw sboTerm="SBO:0000031">  <!-- Briggs-Haldane equation -->
        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <apply>
            <divide/>
            <apply> <times/> <ci>E</ci> <ci>kp</ci> <ci>S</ci> </apply>
            <apply> <plus/> <ci>Km</ci> <ci>S</ci> </apply>
          </apply>
        </math>
        <listOfParameters>
          <parameter id="Km" sboTerm="SBO:0000027"/>  <!-- Michaelis constant -->
          <parameter id="kp" sboTerm="SBO:0000025"/>  <!-- catalytic rate constant -->
        </listOfParameters>
      </kineticLaw>
    </reaction>

Application: Simulator Compliance
(European Bioinformatics Institute)
SBML compliance: number of simulation results returned for 150 models.
[Figure: bar chart, one bar per tool (VCell, SBToolBox2, SBML ODE Solver, roadRunner, Oscill8, MathSBML, JSim, Jarnac, COPASI, BioUML), axis 0 to 150]

The Results
[Figure: histogram of the number of models (axis 0 to 80) against the % agreement of simulation results, in bins 0% to 20%, 20% to 40%, 40% to 60%, 60% to 80%, 80% to 99%, and 100%]

Other Proposed Standards
Standardizing the diagrammatic notation: http://www.sbgn.org/Main_Page

What we all learned

Fact: Developing a standard has both technical and sociological challenges. The sociological challenges may be greater. :(

Rule #1: There must be a problem (i.e., an actual need) that a particular community wants to solve.
• Clear scope
• Covers what is needed
• Doesn't force you to deal with things that are not needed

Rule #2: Building a community from day one is of the utmost importance.
• Build trust
• Build consensus
• Build enthusiasm
• Build ownership

Rule #3: For a standard to succeed, the central players must provide tools and documentation to help the community use the standard.
• Easy to implement
• Low "buy-in" cost

Rule #4: The process is long and drawn out, far beyond the normal patience of review panels and funding agencies.

Summary
Initial cost of SBML development: the initial version was funded by JST (roughly 250K direct per year for three years); one could probably get by with 150K direct. This funds a core team responsible for:
1. Documentation
2. Organizing two workshops per year
3. Developing the initial source libraries
4. Developing a governance model
5. Following discussions on mailing lists and at workshops to address the needs of the community
6. Maintaining civility during discussions!

Centralized development of supporting software libraries:
1) Prevented the standard from diverging.
2) As extensions or modifications were agreed to by the community, it was relatively easy for platform developers to incorporate the changes into their software.
3) The software was developed in C/C++ to make the library cross-language (Java came later).

Current work of my group: Model Reproducibility
[Diagram: biology, data and a simulation tool connected through SBML and SEDML]
SEDML: what you did with the model.

Synthetic Biology

Synthetic biology: "The design and construction of new biological entities such as enzymes, genetic circuits, cells, and organs or the redesign of existing biological systems." Drew Endy (Stanford)

The Immediate Need
Take any current publication on a synthetic circuit and try to reproduce it; let me know how you get on.
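To make the model-reproducibility pairing concrete (SBML for the model, SEDML for "what you did with the model"), here is a minimal Python sketch, standard library only, that reads a simplified SED-ML-style description and resolves which model file is run with which time course. The element and attribute names follow SED-ML Level 1 as we understand it; the file name `circuit.xml`, the helper `resolve_tasks`, and the time values are hypothetical, the XML namespace is omitted, and the document has not been validated against the SED-ML schema. `KISAO:0000019` is the KiSAO term for the CVODE integrator.

```python
import xml.etree.ElementTree as ET

# Simplified SED-ML-style document (hypothetical values; namespace omitted,
# not validated against the SED-ML schema).
SEDML = """
<sedML level="1" version="1">
  <listOfSimulations>
    <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0"
                       outputEndTime="100" numberOfPoints="1000">
      <algorithm kisaoID="KISAO:0000019"/>
    </uniformTimeCourse>
  </listOfSimulations>
  <listOfModels>
    <model id="model1" language="urn:sedml:language:sbml" source="circuit.xml"/>
  </listOfModels>
  <listOfTasks>
    <task id="task1" modelReference="model1" simulationReference="sim1"/>
  </listOfTasks>
</sedML>
""".strip()


def resolve_tasks(sedml_text):
    """Hypothetical helper: pair each task with its model source and time course."""
    root = ET.fromstring(sedml_text)
    # Index models and simulations by id so that tasks can refer to them.
    models = {m.get("id"): m for m in root.iter("model")}
    sims = {s.get("id"): s for s in root.iter("uniformTimeCourse")}
    plan = []
    for task in root.iter("task"):
        model = models[task.get("modelReference")]
        sim = sims[task.get("simulationReference")]
        algorithm = sim.find("algorithm")
        plan.append({
            "task": task.get("id"),
            "source": model.get("source"),
            "language": model.get("language"),
            "start": float(sim.get("outputStartTime")),
            "end": float(sim.get("outputEndTime")),
            "points": int(sim.get("numberOfPoints")),
            "algorithm": algorithm.get("kisaoID") if algorithm is not None else None,
        })
    return plan


if __name__ == "__main__":
    for step in resolve_tasks(SEDML):
        print(step)
```

Keeping the experiment description separate from the model means the same SBML file can be re-run under a different time course or solver by editing only the SED-ML side, which is exactly the information usually missing when trying to reproduce a published result.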
The Long-Term Vision: Design, Build, Test
[Figure: the cycle Specification, Design, Build, Testing/Analysis, with an example time course of GFP (RFU) against time]

Synthetic Biology Open Language (SBOL): SBOL semantic and SBOL visual
[Diagram: Synthetic Biologist A engineers a new device, DNA component B0015, with sequence annotations 1-80 (B0010, terminator), 81-88 (BioBrick scar) and 89-129 (B0012, terminator). The design is described in SBOL and sent to Synthetic Biologist B, or sent to be fabricated.]

Some History
The synthetic biology standardization effort was started with a grant from Microsoft in 2008 (100K). The first meeting was held in Seattle. The first draft proposal was called PoBoL but has since been renamed SBOL, the Synthetic Biology Open Language. Since then we have (somehow) managed to organize two meetings a year; the next one is in Jan 2012 in Seattle.

Overall Aim of the Standardization Effort
To support the synthetic biology workflow:
1. Laboratory parts management
2. Simulation/analysis
3. Design
4. Codon optimization
5. Assembly
6. Repositories, preferably distributed

Specifically:
• To allow researchers to exchange designs electronically, with round-tripping.
• To send designs to bio-fabrication centers for assembly.
• To allow storage of designs in repositories and for publication purposes.

Synthetic Biology
Synthetic biology is engineering, i.e., it is not biology*
[Cycle: Design, Build, Test, with Verification and Debugging]
* Beware of sending synthetic biology grant proposals to a biology panel.

A Real Network (E. coli)
[Figure: host context and design/construction. Left: simulated relative fluorescence (p3) against p1, shifting with increased repression. Right: experimental relative fluorescence against IPTG (mM), shifting with increased repression.]
Entus et al., Systems and Synthetic Biology, 2007.

Synthetic Networks: Concentration Detector
(image: http://www.agricorner.com/e-coli-outbreak-german-farm-in-uelzen-likely-source/)
Generic design: if we control the level of feed-forward inhibition, we can tune the circuit.
[Figure: simulated p3 against p1]

Synthetic Networks: Concentration Detector
Generic design: input is IPTG, output is GFP.
[Figure: relative fluorescence against IPTG (mM)]

CAD Software: Engineering Cycle
[Figure: the cycle Design, Simulation, Fabrication, Testing, with simulated p3 against p1 and measured relative fluorescence against IPTG (mM)]

Computational tools and information resources support each step
[Diagram: the Specification, Design, Build and Analysis steps, supported by CAD tools (TinkerCell, iBioSim), laboratory information systems (GDice, Clotho), BIOFAB, the ApE sequence editor, GenoCAD, and public data]

Registry of Standard Biological Parts (BioBricks)
http://parts.mit.edu
Provides free access to an open commons of basic biological functions that can be used to program synthetic biological systems. Anybody may contribute, draw upon, or improve the parts maintained within the Registry.
Endy D, 2005. Nature 438: 449-453

SBOL is extensible, allowing us to form community subgroups
[Diagram: Core SBOL (DNA component B0015 with sequence annotations 1-80, 81-88 and 89-129 pointing to the features B0010 and B0012, of type Terminator, and a BioBrick scar, each a subClassOf Sequence Feature; sequence SS002), extended by subgroups for physical and host context (a sample of type Cell, identifiers UW002, MG1655 and pUW4510, plasmid DNA), experimental measurements, computational models, visualization, and assembly methods]

TinkerCell
A project to explore the potential of computer-aided design in synthetic biology. The first prototype, called Athena, was developed by Bergmann and Chandran.

Layered architecture: based on C++/Qt [diagram also shows Octave, …]

Each component in a TinkerCell diagram is associated with one or more tables.

A TinkerCell model can be composed of sub-models.

Availability
www.tinkercell.com (Windows, Mac and Linux; released under BSD)
Contact the author for details (dchandran1@gmail.com).

Challenges in building SBOL
• Gaining consensus in a growing community
  – Identifying and engaging stakeholders
• Fast pace of the field
  – Terminology evolution: "BioBricks", "Parts", "DNA components"
  – Stability of use cases: "Standard" and "research needs" seem contradictory
  – Software for synthetic biology is new
• Scarcity of data sources
  – Quality "knowledge" about elements
  – Heterogeneity of existing annotations
• Funding

Who is the we?
University of Washington: Deepak Chandran, John Gennari, Michal Galdzicki, Herbert Sauro
University of California, Berkeley: J. Christopher Anderson
Boston University: Douglas Densmore
Virginia Bioinformatics Institute: Laura Adam, Matthew Lux, Mandy Wilson, Jean Peccoud
University of Toronto: Raik Gruenberg
BIOFAB: Cesar Rodriguez, Akshay Maheshwari (now UCSD), Drew Endy (Stanford)
Joint BioEnergy Institute: Timothy Ham
University of Utah: Barry Moore, Nicholas Roehner, Chris J. Myers (iBioSim)
Imperial College London: Guy-Bart Stan
Newcastle University (UK): Aniel
Recent commercial interest: BBN, DNA 2.0, Agilent, Life Technologies, AutoDesk
http://www.sbolstandard.org/

Acknowledgements: The People and the Support
Hamid Bolouri, Andrew Finney, Mike Hucka, Herbert Sauro, Frank Bergmann, Deepak Chandran, Vijay Chickarmane, Michal Galdzicki, Lucian Smith
Funding in chronological order (2000 to 2011): ……

Textbook: Enzyme Kinetics for Systems Biology
• Available as e-book or paperback on www.analogmachine.org & …
• 318 pages, 94 illustrations and 75 exercises
• E-book: $9.95
• Paperback: $39.95
• Author: H M Sauro