The ERATO Systems Biology Workbench
Michael Hucka, Hamid Bolouri, Andrew Finney, Herbert Sauro
ERATO Kitano Systems Biology Project
California Institute of Technology, Pasadena, CA, USA
Principal Investigators: John Doyle, Hiroaki Kitano
Collaborators: Adam Arkin (BioSpice), Dennis Bray (StochSim), Igor Goryanin (DBsolve), Les Loew (Virtual Cell), Pedro Mendes (Gepasi/Copasi), Masaru Tomita (E-Cell)

Background
• Modeling, simulation & analysis are critical
  – Huge volumes of data
  – Many disparate findings
• Rapid rate of software tool development
  – Roles: data filtering, model creation, model simulation
  – Many groups are creating many tools
• Different packages have different niche strengths, reflecting the expertise & preferences of the groups that build them
• Strengths are often complementary to those of other packages

Complementary Strengths of Tools
• Multistate reactions/stochastic
• Reaction/diffusion
• Optimization
• Bifurcation analysis
• Visualization of networks
• Handling large systems

Problems
• No single package meets all the needs of modelers
• No single tool is likely to do so in the near future
  – The range of capabilities is large
  – New techniques (and hence new tools) are evolving too rapidly
• Researchers are likely to continue using multiple packages for the foreseeable future
• Problems in using multiple tools:
  – Simulations & results often cannot be shared or re-used
  – Software development effort is duplicated

Goal & Approach
• Systems Biology Workbench goal: to provide software infrastructure that
  – Enables sharing of simulation/analysis software & models
  – Enables collaboration between software developers
• Two-pronged approach:
  – Develop a common model exchange language
    • SBML: Systems Biology Markup Language
  – Develop an environment that enables tools to interact
    • SBW: Systems Biology Workbench

Systems Biology Markup Language
• Problem:
  – Many software tools, few common exchange formats
    • Difficult to take advantage of multiple tools
    • Difficult to establish repositories of models
• A solution (in principle):
  – Define a common exchange language
    • Use a simple, well-supported, textual substrate (XML)
    • Add components that reflect the natural conceptual constructs used by modelers in the domain

Structure of Model Definitions in SBML
• Domain: biochemical network models
• A model is described using a list of components:
  – Beginning of model definition
    » List of unit definitions (optional)
    » List of compartments
    » List of species
    » List of parameters (optional)
    » List of rules (optional)
    » List of reactions
  – End of model definition
• Each component has a specific structure

Example
(figure: X0 → S1 at rate k1*X0; S1 → X1 at rate k2*S1; S1 → X2 at rate k3*S1)

Example (cont.)
<?xml version="1.0" encoding="UTF-8"?>
<sbml level="1" version="1">
  <model name="simple">
    <listOfCompartments>
      <compartment name="c1" />
    </listOfCompartments>
    <listOfSpecies>
      <specie name="X0" compartment="c1" boundaryCondition="true" initialAmount="1"/>
      <specie name="S1" compartment="c1" boundaryCondition="false" initialAmount="0"/>
      <specie name="X1" compartment="c1" boundaryCondition="true" initialAmount="0"/>
      <specie name="X2" compartment="c1" boundaryCondition="true" initialAmount="0.23"/>
    </listOfSpecies>

Example (cont.)
    <listOfReactions>
      <reaction name="reaction_1" reversible="false">
        <listOfReactants>
          <specieReference specie="X0" stoichiometry="1"/>
        </listOfReactants>
        <listOfProducts>
          <specieReference specie="S1" stoichiometry="1"/>
        </listOfProducts>
        <kineticLaw formula="k1 * X0">
          <listOfParameters>
            <parameter name="k1" value="0"/>
          </listOfParameters>
        </kineticLaw>
      </reaction>
      <reaction name="reaction_2" reversible="false">
        <listOfReactants>
          <specieReference specie="S1" stoichiometry="1"/>
        </listOfReactants>
        . . .

Some Points About SBML
• Users do not write in XML; software tools do!
• SBML is being defined incrementally
  – SBML Level 1 covers non-spatial biochemical models
    • Intentionally kept simple for maximal compatibility
  – SBML Level 2 will extend Level 1 with more facilities
(figure: nested levels, Level 1 inside Level 2 inside a possible Level 3, with candidate future features such as composition, geometry, arrays, and others)
• Defined in abstract form (UML) + textual descriptions
  – Used to define the XML encoding + XML Schema

Systems Biology Workbench
• Open-source, integrated software environment that enables sharing of computational resources
  – Allows software developers to build interprocess communication facilities into their applications
• From the user’s perspective:
  – One SBW-enabled application can interact with another
  – Each application or module offers services to others
    • E.g.: ODE solution, time-based simulation, visualization, etc.
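Because SBML is meant to be written and read by software rather than by hand, a short program can recover a model's rate equations from a document like the "simple" example. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes SBML Level 1 Version 1 element names (`specie`, `specieReference`, `kineticLaw`). The slide elides everything after `reaction_2`, so reactions 2 and 3 are filled in from the diagram (S1 → X1 at k2*S1, S1 → X2 at k3*S1) as an assumption, and parameter lists are omitted. The helper name `derive_odes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The "simple" example model. ASSUMPTION: reactions 2 and 3 are not shown in
# full on the slides; they are reconstructed here from the network diagram.
SIMPLE_MODEL = """<?xml version="1.0" encoding="UTF-8"?>
<sbml level="1" version="1">
  <model name="simple">
    <listOfCompartments>
      <compartment name="c1"/>
    </listOfCompartments>
    <listOfSpecies>
      <specie name="X0" compartment="c1" boundaryCondition="true" initialAmount="1"/>
      <specie name="S1" compartment="c1" boundaryCondition="false" initialAmount="0"/>
      <specie name="X1" compartment="c1" boundaryCondition="true" initialAmount="0"/>
      <specie name="X2" compartment="c1" boundaryCondition="true" initialAmount="0.23"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction name="reaction_1" reversible="false">
        <listOfReactants><specieReference specie="X0" stoichiometry="1"/></listOfReactants>
        <listOfProducts><specieReference specie="S1" stoichiometry="1"/></listOfProducts>
        <kineticLaw formula="k1 * X0"/>
      </reaction>
      <reaction name="reaction_2" reversible="false">
        <listOfReactants><specieReference specie="S1" stoichiometry="1"/></listOfReactants>
        <listOfProducts><specieReference specie="X1" stoichiometry="1"/></listOfProducts>
        <kineticLaw formula="k2 * S1"/>
      </reaction>
      <reaction name="reaction_3" reversible="false">
        <listOfReactants><specieReference specie="S1" stoichiometry="1"/></listOfReactants>
        <listOfProducts><specieReference specie="X2" stoichiometry="1"/></listOfProducts>
        <kineticLaw formula="k3 * S1"/>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def derive_odes(sbml_text):
    """Hypothetical helper: map each non-boundary species to its d/dt terms."""
    model = ET.fromstring(sbml_text).find("model")
    # Boundary species are held fixed, so only "floating" species get an ODE.
    floating = [s.get("name") for s in model.iter("specie")
                if s.get("boundaryCondition") != "true"]
    terms = {name: [] for name in floating}
    for reaction in model.iter("reaction"):
        rate = reaction.find("kineticLaw").get("formula")
        # Reactants are consumed (minus sign); products are produced (plus sign).
        for tag, sign in (("listOfReactants", "-"), ("listOfProducts", "+")):
            for ref in reaction.findall(f"{tag}/specieReference"):
                name = ref.get("specie")
                if name in terms:
                    stoich = ref.get("stoichiometry", "1")
                    coeff = "" if stoich == "1" else f"{stoich}*"
                    terms[name].append(f"{sign} {coeff}({rate})")
    return {name: " ".join(parts) for name, parts in terms.items()}


for species, rhs in derive_odes(SIMPLE_MODEL).items():
    print(f"d{species}/dt = {rhs}")
# prints: dS1/dt = + (k1 * X0) - (k2 * S1) - (k3 * S1)
```

The sketch keeps the rate laws as strings. A simulator module would go one step further and evaluate them numerically.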
From the User’s Perspective
(screenshots over three slides; not reproduced)

Behind the Scenes
(diagram; not reproduced)

From the Programmer’s Perspective
• Desirable features
  – Small application programming interface (API)
  – Libraries that implement inter-program communications
  – A registry of services for applications to query
  – XML-based model representation (SBML)
• Uses well-known, proven technologies
  – Communications via message-passing over plain sockets
  – Modular, distributed, broker-based architecture
• API provides two styles:
  – “Low-level”: call/send operations + directory services
  – “High-level”: object-oriented interface layered on top

Driving Principles
• Keep it language-neutral
  – We’ll provide C++, Java and Delphi libraries for Windows and Linux
    • But SBW can be implemented in any language
• Keep it simple
  – Simple message-passing scheme
    • Avoids the complexity & size of CORBA
    • Easy to make cross-platform compatible
    • Easy to make distributed
  – Simple low-level API, convenient higher-level API
• Make sure contributors benefit
  – Open-source development
  – Symmetric infrastructure: no application dominates

Modules Planned
• Data filtering & preparation
• Database support
  – E.g.: web searching, storage management
• Model definition & manipulation
  – E.g.: scripting languages, visual editors
• Equation solvers
  – E.g.: ODEs, DAEs, stochastic
• Analysis & visualization tools
  – E.g.: bifurcation, 2-D/3-D/4-D plotting
• Optimization & parameter searching

Summary & Availability
• SBML
  – Level 1 specification is publicly available
    • http://www.cds.caltech.edu/erato
  – Support is being added by other groups to their applications
• SBW
  – Preliminary test implementation completed
  – Production version is now in development
    • Draft API definition & other info available
  – http://www.cds.caltech.edu/erato/sbw/docs
  – Expect first public release in November at ICSB 2001 (http://www.icsb2001.org)

Outline
1) Background and motivations
   I.e., why are we doing this?
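The programmer's view above combines three ideas: message-passing over plain sockets, a broker that keeps a registry of services, and a low-level "call" that sends a request and gets a reply. The following minimal sketch illustrates how these pieces fit together. It is not SBW's actual protocol or API. The wire format (a 4-byte big-endian length prefix followed by UTF-8 JSON) and the names `Broker`, `send_message`, `recv_message`, and `hypothetical.square` are all assumptions made for illustration.

```python
import json
import socket
import struct

# HYPOTHETICAL wire format: 4-byte big-endian length prefix + UTF-8 JSON body.


def send_message(sock, message):
    payload = json.dumps(message).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, n):
    # A stream socket may return fewer bytes than requested; loop until done.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def recv_message(sock):
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


class Broker:
    """Hypothetical broker: a registry mapping service names to handlers."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        self._services[name] = handler

    def find_services(self):
        # Directory service: applications can query what is available.
        return sorted(self._services)

    def serve_one(self, sock):
        # Read one call request, dispatch it, and reply with the result.
        request = recv_message(sock)
        handler = self._services[request["service"]]
        send_message(sock, {"result": handler(*request["args"])})


broker = Broker()
broker.register("hypothetical.square", lambda x: x * x)

# socketpair() gives two connected sockets in one process, standing in for
# a client application and the broker.
client, server = socket.socketpair()
send_message(client, {"service": "hypothetical.square", "args": [7]})
broker.serve_one(server)
print(recv_message(client))  # {'result': 49}
client.close()
server.close()
```

Length-prefixed framing is one common way to carry discrete messages over a byte stream. It keeps the scheme simple and language-neutral, in the spirit of the driving principles, because any language with sockets can read a 4-byte length and then a payload.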
2) Systems Biology Markup Language (SBML)
   Model description language & exchange format
3) Systems Biology Workbench
   Infrastructure for connecting software tools

Scope of SBML
• Domain: biochemical network models
(figure: example network with species X0, S1, S2, S3, X1, X2 and rate laws such as K1*X0 - S1*K2, Vmax*S1/(Km+S1), S3/Ki, k4*S2 - k5*S3, k3*S2, k6*S3 and k7*S3*S12)
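The rate laws in this figure are plain infix strings, like the `formula` attribute of `kineticLaw` in the SBML Level 1 example. A tool that imports them needs to evaluate them without executing arbitrary code. The following minimal sketch uses Python's standard `ast` module and deliberately accepts only numbers, names, parentheses, and `+ - * /`. Real SBML formulas may use functions and other operators, which this sketch rejects. The function name `evaluate_rate` and the parameter values are hypothetical.

```python
import ast
import operator

# Only these operators are accepted; any other syntax raises ValueError.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate_rate(formula, values):
    """Evaluate an infix rate-law string such as 'Vmax*S1/(Km+S1)'."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return values[node.id]  # KeyError if a symbol has no value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(formula, mode="eval"))


# Hypothetical values, chosen only for illustration.
values = {"Vmax": 10.0, "Km": 2.0, "S1": 2.0, "K1": 0.5, "X0": 4.0, "K2": 0.25}
print(evaluate_rate("Vmax*S1/(Km+S1)", values))  # 5.0
print(evaluate_rate("K1*X0 - S1*K2", values))    # 1.5
```

Walking a whitelisted syntax tree, rather than calling `eval`, means a formula such as `__import__('os')` is rejected instead of executed.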