XML for Model Specification: An Introduction and Workshop

An Introduction to XML in the Neurosciences
Sharon Crook, Arizona State University

An Introduction to NeuroML
Fred Howell, University of Edinburgh

NeuroML for Model Specification in ChannelDB and GENESIS
Dave Beeman, University of Colorado, Boulder

MorphML: An XML Application for Neuronal Morphology Data
Sharon Crook, Arizona State University

Building 3-D Network Models with neuroConstruct
Padraig Gleeson, University College London

Discussion: Current Issues and Future Development

Introduction to XML in the Neurosciences

What is an eXtensible Markup Language (XML) application?
● Portable format for computer documents.
● Data are surrounded by text descriptions called tags.
● Due to the self-describing representation, programs can parse the data easily.
● Tags are ordinary text and should be clear, concise, and make sense to humans.
● Language elements that provide the structure make up an XML schema.
● Each language element may also be equivalent to an object class.

<!--Segment: mainDend2, ID: 2-->
<segment>
  <id>2</id>
  <proximal>4</proximal>
  <distal>5</distal>
  <parent>0</parent>
</segment>

Additional Potential Benefits of XML Schema:
● Validate documents/data.
● Generate instructions for creating database tables for data element storage and access.
● Easily generate data structures and code for reading and writing valid XML documents.
● Facilitates communication and collaboration!

Potential Liabilities of XML:
● May not be clear, concise, easy to read.
● Most of the advantages of XML can be accomplished with good discourse, good design, and good documentation.
● Extremely verbose so performance will suffer.
● Private data accessed more easily.
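
As an illustration of the claim that self-describing tags let programs parse the data easily, the <segment> fragment shown earlier in this introduction can be read with the DOM parser that ships with the JDK. This is only a minimal sketch: the SegmentReader and Segment names are hypothetical and are not part of MorphML or the NeuroML development kit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class SegmentReader {

    /** Illustrative holder for one segment (hypothetical, not a MorphML class). */
    static final class Segment {
        final int id, proximal, distal, parent;

        Segment(int id, int proximal, int distal, int parent) {
            this.id = id;
            this.proximal = proximal;
            this.distal = distal;
            this.parent = parent;
        }
    }

    /** Returns the integer text of the first descendant element with the given tag name. */
    static int intChild(Element e, String tag) {
        return Integer.parseInt(e.getElementsByTagName(tag).item(0).getTextContent().trim());
    }

    /** Parses a single <segment> element into a Segment object. */
    static Segment parse(String xml) {
        try {
            Element seg = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new Segment(intChild(seg, "id"), intChild(seg, "proximal"),
                    intChild(seg, "distal"), intChild(seg, "parent"));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse segment XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<segment><id>2</id><proximal>4</proximal>"
                + "<distal>5</distal><parent>0</parent></segment>";
        Segment s = parse(xml);
        System.out.println("segment " + s.id + ": points " + s.proximal + " -> " + s.distal
                + ", parent segment " + s.parent);
    }
}
```

Each tag maps onto one field of the object, which is the sense in which a language element may be equivalent to an object class.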

Additional Benefits of XML:
● Neuroinformatics infrastructure (NeuroML Schema and NeuroML Development Kit, MorphML, BrainML, CellML, SBML)
● Commercial/free development software available
  – Schema development, validation, documentation and more: Altova XMLSpy, http://www.altova.com
  – XML schema to Java object classes and more: Java Architecture for XML Binding (JAXB), http://java.sun.com/xml/jaxb

Relevant XML Applications:
● BrainML (http://www.brainml.org)
  – Laboratory of Neuroinformatics, Weill Medical College of Cornell University
  – Examples: time series data, spike trains, experimental protocols, recording sites, bibliographic citations, taxonomy, vital statistics of subject, training statistics, some attributes of neurons for inheritance
● SBML (http://www.sbml.org)
  – Systems Biology Markup Language for modeling biochemical reaction networks
  – Examples: metabolic networks, cell-signaling pathways, regulatory networks
● CellML (http://www.cellml.org)
  – Bioengineering Institute, University of Auckland
  – Examples: models of cellular and subcellular processes such as calcium dynamics, metabolic pathways, signal transduction
● MathML (http://www.w3.org/Math)
  – W3C World Wide Web Consortium
  – Examples: mathematical notation with structure and content; serve, receive, and process mathematics on the web

An Introduction to NeuroML
Fred Howell
Adaptive and Neural Computation, Informatics, University of Edinburgh
fwh@inf.ed.ac.uk

Overview
● What are the problems?
● Object models, data binding and NeuroML
● The next steps?

Models we’d like to build…
[Figure: Collins et al, J Biol Chem 280:7 2005; original version by Ding Fan]

Aim
Move model specifications from programs to a declarative XML format.

Why XML?
● Language independent way to store complex structured information.
● Huge industry momentum.
● Not a programming language, so encourages declarative specifications.
● Possible to transform from one format to another, whereas programs have to be recoded by hand.

Why not XML?
● Cumbersome to edit by hand
● Large text files, need to be compressed
● Harder to parse than ad hoc text formats
● Not suitable for binary data

[Diagram, first panel: Scripts; Parameter search; Custom extensions of simulator; Simulation engine; Results + visualisations. Second panel: Declarative model spec (in XML); Simulation engine; Results + visualisations.]

How would one publish a model?
● Put XML model spec on your website, plus links to code to run it.
● Plus links back to any experimental data used to derive parameters / validate results.
● See Robert Gentleman's campaign for “reproducible research” (and also ModelDB).

Why is this hard?
● Lots of levels of scale and detail of models (from protein interactions to large scale networks of neurons)
● Different simulators have different and changing capabilities, which creates a moving target for attempts to build any standards

Union or intersection?
Should a model exchange format restrict itself to a standard subset of possible models, or cope with any possible model?

What is “NeuroML”?
● A way to map from object trees to XML
● ... with a Java development kit
● ... and some suggestions for sample schemas: channel, cell and network levels

Emphasis on making it easy to define any object model and serialise it
... create generic tools which work with any object model ...
... and encourage developers to agree on common object models where it makes sense

Other XML Languages
● SBML: a standard for intracellular pathway models
● CellML
● MathML

Practicalities
“I'm writing a simulator and I'd like to get the models into NeuroML. What do I do?”
(1) Separate out the declarative aspects of the model spec
(2) Serialise the model into XML, using the NeuroML development kit (in Java) or your own code
(3) If any other developers are creating similar models, see if you can agree on a common set of classes to describe the models by hand

The NeuroML development kit
● A “data binding” kit
● Start with class definitions
● Utilities to read / write model definition as readable XML

Data binding
A technique for serialising data in objects as XML.

Your program can read in an XML document using:

  Object o = XMLIn("file.xml");

And write one using:

  Object o = new MyComplexStructure();
  XMLOut(o, "file.xml");

The XML tags can correspond to fields in the class:

  <neuroml class="MyComplexStructure">
    <position>1000</position>
    <sequence>ACGGTTCAG</sequence>
    <pubmedID>4321652</pubmedID>
  </neuroml>

  class MyComplexStructure {
    int position;
    String sequence;
    int pubmedID;
  }

Example network definition:

  package neuroml.model.network;

  import neuroml.core.*;

  public class Network extends Element {
    /** A network has a set of elements - can be populations or individual cells */
    public Set elements = new Set("ElementRef");

    /** A network also defines a set of projections between elements */
    public Set projections = new Set("Projection");
  }

Example class definition:

  public class Grid3DStructure extends PopulationStructure {
    public int xsize=1;
    public int ysize=1;
    public int zsize=1;
  }

... and so on for all the classes / parameters of your model.

Uses a restricted subset of Java as schema definition language:
● int, double, String
● Set, Ref, List
● classes and inheritance
● namespaces

Do code modules / embedded scripts have a place in NeuroML?
Useful for quickly coding loops for running simulations, ad hoc connectivity...
but perhaps having any code in the model spec defeats the object?

State of play
● simulators adopting own XML formats for serialising model descriptions
● common standards working where the domain is stable (SBML, MorphML)

The next steps?
● How much standardisation is useful?
  – Just XML in any format?
  – XML with uniform mapping from classes to <tags>?
  – A set of rigid standards for compartmental neurons, channels, receptors, networks, ...?
● What features are needed from a development kit?
● C++, Python, Java?

NeuroML for Model Specification in ChannelDB and GENESIS
Dave Beeman
University of Colorado, Boulder
WAM-BAMM*05

The Problem: One neuronal model --> Many implementations
EXAMPLE: Hodgkin-Huxley K channel
● Equations with parameter values describe the model.
● Simulator scripts tell the simulator how to implement it.
● Differences in simulator design --> NEURON and GENESIS scripts look very different --> Very difficult to convert a script to one for a different simulator

The Solution:
Establish a standard format for a declarative representation, NOT a simulator-dependent procedural representation.

Hodgkin-Huxley K Channel Model

Possible Representations
● Represent the equations in a form that can be parsed into Java
● Store tabulated values of rate variables
● Use parameterized form (A + BV) / (C + D exp((E + V)/F))

The ChannelDB Solution (http://www.modelersworkspace.org/channeldb/ChannelDB.html)
● XML representation of a Java Hodgkin-Huxley object with attributes for Gmax, and a set of gates and their exponents
● Gate objects have an attribute telling whether the gate depends on voltage or concentration, and objects for the forward and backward rate parameters
● NeuroML development parser (http://www.neuroml.org) converts between XML representation and Java objects
● Use simple Java string manipulation commands to produce a simulation script from information in the fields of the DBChannel object
● Prototype database and interface creates commented GENESIS scripts from stored XML channel descriptions

NeuroML representation of the Hodgkin-Huxley K channel

<neuroml class="DBChannel"
         description="Hodgkin-Huxley squid K channel"
         author="Dave Beeman"
         keywords="Hodgkin-Huxley potassium squid delayed rectifier"
         uniqueID="10262778758662F22@dogstar.colorado.edu"
         notes="An implementation of the GENESIS K_squid_hh channel"
         Erest="-0.07V">
  <channels>
    <channel name="K_squid_hh" class="HHChannel" permeantSpecie="K"
             Erev="0.09V" Gmax="360.0S/m^2" ivlaw="ohmic">
      <gates>
        <gate name="X" class="HHVGate" timeUnit="sec" voltageUnit="V"
              vmin="-0.1" vmax="0.05" instantCalculation="false"
              useState="false" power="4">
          <forwardRate class="ParameterizedHHRate" A="-600.0" B="-10000.0" C="-1.0" D="1.0" E="0.060" F="-0.01"/>
          <backwardRate class="ParameterizedHHRate" A="125.0" B="0.0" C="0.0" D="1.0" E="0.07" F="-0.08"/>
        </gate>
      </gates>
      <log author="Dave Beeman" date="Jul 9, 2002 11:11:15 PM"
           literatureReference="A.L. Hodgkin and A.F. Huxley, J. Physiol.
(Lond) 117, pp 500-544 (1952)">
        <logEntries> </logEntries>
      </log>
    </channel>
  </channels>
</neuroml>

Some classes defined for ChannelDB
● DBChannel: Wrapper class that is used to contain any channel model that is stored in ChannelDB, along with some descriptive information.
● HHChannel: Class used for all the Hodgkin-Huxley type channels in the database.
● HHVGate: Used as a member of the gates set of a HHChannel. It contains forward and backward rate objects that depend on voltage, as well as some additional fields to describe the gate.
● HHCGate: An ionic concentration-dependent gate, analogous to the voltage-dependent HHVGate. It provides an additional field for a reference to the object that provides the source of the ionic concentration.
● HHRate: The superclass for the specialized forms for the rate variables.
● ParameterizedHHRate: A subclass of HHRate that expresses rate variables in a parameterized form typical of many Hodgkin-Huxley type rate equations, "rate = (A + BV) / (C + D exp((E + V)/F))"
● EquationHHRate: A subclass of HHRate that expresses the rate variables as equations.
● TabulatedHHRate: A subclass of HHRate that allows a gate's forwardRate or backwardRate to be specified by a table at equally spaced voltage (or concentration) points.
● ConcenPool: Describes a single shell model for a concentration pool, with a buildup of concentration proportional to an incoming current and a time constant for decay. The object providing the source of concentration to a HHCGate is typically formed from this class. The source of currents is provided by a set of objects of class CurrentSource.
● CurrentSource: Used by ionic concentration pools to provide information about the object that provides an ionic current.
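
To make the parameterized form concrete, the following sketch evaluates rate = (A + BV) / (C + D exp((E + V)/F)) using the forwardRate attributes from the K channel description above. The ParameterizedRate class here is a hypothetical stand-in written for this example; it is not the ParameterizedHHRate class from the NeuroML development kit, whose actual fields and methods are not shown in these slides.

```java
public class ParameterizedRate {
    // Coefficients of rate = (A + B*V) / (C + D*exp((E + V)/F)).
    final double a, b, c, d, e, f;

    ParameterizedRate(double a, double b, double c, double d, double e, double f) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
        this.e = e;
        this.f = f;
    }

    /** Evaluates the rate at membrane potential v (volts here, matching voltageUnit="V"). */
    double at(double v) {
        return (a + b * v) / (c + d * Math.exp((e + v) / f));
    }

    public static void main(String[] args) {
        // Values copied from the forwardRate element of the K_squid_hh gate "X".
        ParameterizedRate forward =
                new ParameterizedRate(-600.0, -10000.0, -1.0, 1.0, 0.060, -0.01);

        // Evaluate at the resting potential given by Erest="-0.07V".
        System.out.printf("forward rate at -0.07 V: %.3f per second%n", forward.at(-0.07));
    }
}
```

At V = -0.07 V the numerator is 100 and the exponent is 1, so the expression reduces to 100 / (e - 1), roughly 58.2 per second. At V = -0.06 V both numerator and denominator of this form vanish, so a real implementation would presumably need to treat that point specially.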

Unfinished Business and Open Questions
● Extend NeuroML to provide representations for more detailed multi-shell models of calcium diffusion
● Implement a more sophisticated representation of literature references than the simple string that is currently used in the NeuroML software. (We have proposed a schema for the Modeler's Workspace based on BibTeX.)
● Software to convert ChannelDB descriptions to NEURON and other simulators
● Implement the HHCVGate, a two-dimensional gate depending on both voltage and concentration. (Note that the Traub Ca-dependent K channel model uses a form that can be expressed as a product of a HHVGate and a HHCGate.)
● Implement Borg-Graham or Lytton-Sejnowski temperature-dependent channel models with the NeuroML ThermodynamicHHVGate.
● Is there a better way for a concentration-dependent channel model to reference the models that provide the source of ionic currents and concentrations?
● How much standardization should there be for the format and the names of the independent variables and parameters in equation representations?

Unbundling GENESIS

GENESIS 3 Core – Based on MOOSE
The Messaging Object Oriented Simulation Environment, a reimplementation of GENESIS base code in C++ by U. S.
Bhalla, NCBS, Bangalore

Provides:
● Improved Messaging between GENESIS objects
● Faster, smaller, cleaner implementation
● Portable to MS Windows and non-UNIX platforms
● Improved equation solvers
● Allows multiple parsers and interfaces

GENESIS 3 will add:
● Graphical interface
● XML representation of models
● Backwards compatibility with GENESIS 2
● Tutorials and educational applications

Planned GENESIS 3 Interfaces

An XML Application for Neuronal Morphology Data
http://www.morphml.org
Sharon Crook
Arizona State University
Department of Mathematics and Statistics
School of Life Sciences

MorphML XMLSpy Documentation

MorphML: A Simple Example from neuroConstruct

<?xml version="1.0" encoding="UTF-8"?>
<n:morphml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:n="http://morphml.org/morphml/schema/1.0.0"
           xsi:schemaLocation="http://morphml.org/morphml/schema/1.0.0 http://math.la.asu.edu/~crook/morphml/MorphML.xsd">
  <n:name>SimpleCell</n:name>
  <n:notes>A Simple cell for testing purposes</n:notes>
  <n:lengthUnits>Micrometers</n:lengthUnits>
  <!--Converting cell: SimpleCell-->
  <n:points>
    <!--Start point of segment: Soma, ID: 0-->
    <n:point>
      <n:id>0</n:id>
      <n:x>0.0</n:x>
      <n:y>0.0</n:y>
      <n:z>0.0</n:z>
      <n:diameter>16</n:diameter>
    </n:point>
    <!--End point of segment: Soma, ID: 0-->
    <n:point>
      <n:id>1</n:id>
      <n:x>0.0</n:x>
      <n:y>0.0</n:y>
      <n:z>0</n:z>
      <n:diameter>16</n:diameter>
    </n:point>
    <!--Start point of segment: mainDend1, ID: 1-->
    <n:point>
      <n:id>2</n:id>
      <n:x>0.0</n:x>
      <n:y>0.0</n:y>
      <n:z>0.0</n:z>
      <n:diameter>2</n:diameter>
    </n:point>
    <!--End point of segment: mainDend1, ID: 1-->
    <n:point>
      <n:id>3</n:id>
      <n:x>-10.0</n:x>
      <n:y>-30.0</n:y>
      <n:z>0</n:z>
      <n:diameter>2</n:diameter>
    </n:point>
    <!--Start point of segment: mainDend2, ID: 2-->
    <n:point>
      <n:id>4</n:id>
      <n:x>0.0</n:x>
      <n:y>0.0</n:y>
      <n:z>0.0</n:z>
      <n:diameter>2</n:diameter>
    </n:point>
    <!--End point of segment: mainDend2, ID: 2-->
    <n:point>
      <n:id>5</n:id>
      <n:x>10.0</n:x>
      <n:y>-30.0</n:y>
      <n:z>0</n:z>
      <n:diameter>2</n:diameter>
    </n:point>
    <!--Start point of segment: mainAxon, ID: 3-->
    <n:point>
      <n:id>6</n:id>
      <n:x>0.0</n:x>
      <n:y>0.0</n:y>
      <n:z>0.0</n:z>
      <n:diameter>2</n:diameter>
    </n:point>

MorphML: A Simple Example from neuroConstruct

  <n:cells>
    <n:cell>
      <n:name>SimpleCell</n:name>
      <!--Segments of the cell -->
      <n:segments>
        <!--Segment: Soma, ID: 0-->
        <n:segment>
          <n:id>0</n:id>
          <n:proximal>0</n:proximal>
          <n:distal>0</n:distal>
        </n:segment>
        <!--Segment: mainDend1, ID: 1-->
        <n:segment>
          <n:id>1</n:id>
          <n:proximal>2</n:proximal>
          <n:distal>3</n:distal>
          <n:parent>0</n:parent>
        </n:segment>
        <!--Segment: mainDend2, ID: 2-->
        <n:segment>
          <n:id>2</n:id>
          <n:proximal>4</n:proximal>
          <n:distal>5</n:distal>
          <n:parent>0</n:parent>
        </n:segment>
        <!--Segment: mainAxon, ID: 3-->
        <n:segment>
          <n:id>3</n:id>
          <n:proximal>6</n:proximal>
          <n:distal>7</n:distal>
          <n:parent>0</n:parent>
        </n:segment>
        <!--Segment: subAxon1, ID: 4-->
        <n:segment>
          <n:id>4</n:id>
          <n:proximal>8</n:proximal>
          <n:distal>9</n:distal>
          <n:parent>3</n:parent>
        </n:segment>
        <!--Segment: subAxon2, ID: 5-->
        <n:segment>
          <n:id>5</n:id>
          <n:proximal>10</n:proximal>
          <n:distal>11</n:distal>
          <n:parent>3</n:parent>
        </n:segment>
      </n:segments>
    </n:cell>
  </n:cells>
</n:morphml>

Virtual Ratbrain (http://www.ratbrain.org)
Laszlo Zaborszky, Peter Varsanyi
Center for Molecular and Behavioral Neuroscience, Rutgers
Fred Howell, Nicola McDonnell
Institute of Adaptive and Neural Computation, University of Edinburgh
● Database for peer reviewed 3-D cellular anatomical data of the rat brain
● Visualization and analysis tools including analysis of dendritic and axonal morphometry
● Data stored in MorphML format

Virtual Ratbrain: MorphML Viewer

Building 3D Network Models with neuroConstruct (Summary of main
presentation)
Padraig Gleeson
University College London
p.gleeson@ucl.ac.uk
WAM-BAMM*05, 31 March 2005

Scope of Application
● Reuses existing base of models/modellers
● Adds functionality
  – Graphical interface
  – Checks on morphologies
  – Network building capabilities
  – Storage/replay/analysis of simulations
● Built with Java: runs on any platform
● Code produced is native GENESIS/NEURON

Visualization
● Single Cells can be viewed in 3D
● Information on morphology
● Checks on consistency of cell structure
● Segments can be edited
● Info on basic electrophysiology

Screenshot: Cell Visualization

Packing in 3D
● Cell Groups are packed in 3D Regions
  – Rectangular Box
  – Spherical
● Various Packing Patterns
  – Random
  – Cubic close packed
  – Hexagonal
  – Single positioned
  – Evenly spaced in 1D

Screenshot: Packing in 3D

Simulator Interaction (1)
● Morphology files can be imported from
  – GENESIS (*.p readcell format files)
  – NEURON (most ntscable like files, with create, pt3dadd, connect)
  – Cvapp (SWC format files)
  – MorphML
● Imported cells are checked for validity, i.e. for errors which may cause problems on some platforms
  – zero length segments
  – all except root segment have parents
  – unique names, etc.

Simulator Interaction (2)
● Files can currently be exported to:
  – NEURON, for simulation
  – GENESIS, for simulation
  – MorphML, for publishing/use by other simulators
● Cell info held in simulator independent format
  – Can be mapped to other/future simulators

Cell Processes
● Generic models of Cell Process (channels/synapses) can be used in neuroConstruct
● Model separated from experimentally measured parameters
● Reuse of tried and tested template files
● Can be mapped on to any simulator
● Automatic handling of units

Modularity of Cell Processes (1)
● Pre-existing & well tested XML template of model of Cell Process, e.g. Double Exp Synapse, HH Channel
● Mapping of templates to existing simulators
● Experimentally determined parameter set: gmax, Tau Rise/Decay, etc.
● Published model of Cell Process (XML file)

Modularity of Cell Processes (2)
● neuroConstruct
● Parameters coupled with GENESIS/NEURON template placed on cells
● GENESIS/NEURON source code
● Published model of Cell Process
● Parameters with plotting module
● Plots of Cell Process internals

Screenshot: Cell Processes

Morphology mapping (1)
● neuroConstruct Concepts
  – Section: unbranched part of axon/dendrite with the same biophysical properties
  – Segment: Specifies one 3D point along Section, shaped like conical frustum
  – Section specifies start point, Segments specify 3D points along the Section

Morphology mapping (2)
● Going from GENESIS -> NEURON
  – Reasonably straightforward
  – Compartments in GENESIS mapped to segments in neuroConstruct
● Going from NEURON -> GENESIS
  – Non-trivial: mapping conical sections to cylinders
  – Simple mapping: each segment to compartment with equivalent surface area

NeuroML/MorphML interaction
● neuroConstruct currently allows:
  – Import & export of MorphML morphologies
● Future support
  – Greater support for specification of groups/Cell Process locations in MorphML format
  – Importation of Cell Processes in NeuroML format
  – Export of network structure in NeuroML format
  – Generation of simulation code in NeuroML/NeoSim format

XML for Model Specification: Discussion

Wider/Easier Access to NeuroML:
1. Would a CVS server for materials on the website that allows for multiple contributors/editors be of use?
2. Would a separation of the NeuroML schema into several, more focused schemas (e.g., morphology, channels, channel distributions, connectivity) be useful for making it more transparent and easier to use? If so, where do we define relationships among the language elements of each separate schema?
3. An XMLSpy HTML version of the specs might help with documentation.
4. Tight coupling of NeuroML with Java? What about other languages?

Channel Issues:
1.
Can we use one XML channel specification for both NEURON and GENESIS specs?
2. How can we make XML channel documents easy to use for someone who is not a GENESIS or NEURON user?
3. Details of channel distributions in current NeuroML Schema, GENESIS, and NEURON? Can we define a schema that includes them all?

Morphology Issues:
Mapping morphology across formats (MorphML, GENESIS, NEURON, BBT, SWC).
1. One issue is that GENESIS uses cylindrical compartments while others can be frustums. You can map between compartments of equal surface area, which accounts for the membrane parameters but not the axial resistance.
2. In NEURON, the 3-D points describing a cable/section and the actual numerical integration points (nseg points evenly spaced) are different.
3. Segment of length zero for cell body?
4. MorphML->NEURON->MorphML connectivity?

Future Development:
1. Relation to other XML applications? BrainML, CellML and SBML?
2. Some of the built-in XML features in Java can be useful, and further schema development should take advantage of these (e.g., XML support in Java J2SE 5.0).
3. What concrete plans do different groups have over the next year for XML-related developments?

The End