Building and running large scale computational neuroscience models
Fred Howell and Jean-Alain Grunchec
Institute for Adaptive and Neural Computation, Informatics, University of Edinburgh
Abstract
The ambitious aim of computational neuroscience is to gain a better understanding of the operation
of real neurons and brain circuits. But real neurons are extremely complicated electronic and
biochemical devices, whose dynamics we still have a poor understanding of, and they are
connected in extremely large, complex and specific networks, whose connectivity and dynamics we
have even less understanding of.
We are working on three of the software problems associated with modelling brain networks:
obtaining access to sufficient experimental data to build models; developing GUIs and standards for
declarative XML based model specifications, containing all parameters of multi-level models of
neurons, ion channels and networks; and automated techniques for scaling large models across
clusters without the modeller having to become a parallel programming expert.
1. Introduction
The first major success of computational
neuroscience was the phenomenal work of
Hodgkin and Huxley, in 1952, who succeeded
in producing an elegant mathematical model of
the propagation of the electrical signal in a
nerve axon - the action potential.
More recent work has extended the
electrical study of the membrane properties to
include large numbers of different ion channels
(the ``transistors'' of the neuron), receptors and
synapses. Software tools such as NEURON [5]
and GENESIS [6] include scripting languages
for building arbitrarily complex models of
electrical activity over the neural membrane
(similar to SPICE models in analogue
electronics) incorporating the branched shape of
dendrites, ion channel switching characteristics,
and intracellular biochemical pathways.
Modellers would like to be able to scale up
models to include a fraction of the connectivity
of a real circuit. It is the number of synapses
which usually limits the scale of model one can run; a small area of brain tissue may have 10,000 neurons, each connected to 10,000 others, i.e. 100,000,000 synapses. One may wish to model the dynamics of each synapse with a number of state variables, which makes modelling even a small area of brain tissue intractable.
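A back-of-the-envelope calculation makes the storage problem concrete; the sketch below assumes, purely for illustration, four 4-byte state variables per synapse:

    // Back-of-the-envelope synapse memory estimate (illustrative figures only).
    public class SynapseMemoryEstimate {
        public static void main(String[] args) {
            long neurons = 10_000;            // cells in a small patch of tissue
            long fanout = 10_000;             // synapses made by each neuron
            long synapses = neurons * fanout; // 100,000,000 synapses
            long bytesPerSynapse = 4 * 4;     // assume four 4-byte state variables
            System.out.printf("%,d synapses -> %.1f GB of state%n",
                    synapses, synapses * bytesPerSynapse / 1e9);
        }
    }

Even this modest assumption gives 1.6 GB of synaptic state for a single small patch of tissue, before any connectivity tables or neuron state are counted.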
In this paper we present our work addressing
three of the e-science challenges related to
modelling brain circuits: obtaining access to
experimental data; building multi level model
descriptions; and scaling models to run across
clusters.
2. Data issues in neural modelling
The single biggest challenge limiting the
development of working models of neural
circuits is the lack of availability of suitable
experimental data. Ideally one would like to
know the precise shapes of all neurons in a
given circuit (since morphology is one
determinant of electrical behaviour), along with
positions and shapes of all synapses, electrical
recordings taken concurrently from all cells,
and localisation of the electrically active
receptors and channels across the membrane.
However, it is not yet feasible to gather
detailed connectivity information, or detailed
electrical recordings from more than a few cells
at once – although exciting technical
developments in high throughput electron
microscope level tomography [10] and
simultaneous optical recording techniques from
hundreds of cells using calcium imaging and
confocal microscopes promise orders of
magnitude improvements in this area. Such
techniques will require novel high throughput
image analysis algorithms to generate useful
data. Truly large scale and systematic data
collection to allow us to model small brain
circuits will require an industrial activity in the
manner of the human genome project [11].
Fig 1. The network editor GUI is used for specifying large networks of neurons. The GUI for adjusting parameters for different components of the model is generated on the fly from the schema, and the model is stored in XML.
We have developed a number of tools for publishing experimental data, including the ratbrain project [9] and Axiope Catalyzer [12].

Particular computer science challenges are caused by the heterogeneity of data types. In the extreme, each experiment could require its own object oriented data model. Ambitious attempts to construct data models for biological experiments (e.g. MAGE-OM and MAGE-ML) illustrate the complexity, with hundreds of related classes for describing a subset of the restricted domain of a single experimental technique (microarrays). Neuroscience research requires new usable software which bridges the gap between unstructured text and structured databases, and which is not so complex that its use requires ontology specialists.

We emphasise this data issue in a paper on software for modelling because it is currently more of a limiting factor to progress than other significant technical issues such as model scaling or parameter searching.

Databases are not yet in widespread use in neuroscience, and there has not been enough encouragement for researchers to publish their raw datasets alongside journal articles. Our suggestion is to learn from the successes of data publication in bioinformatics, which established useful community databases by the simple mechanism of journals requiring an accession number in a public database of raw data before an article would be accepted for publication. Funding council moves towards requiring data publication will also help.

3. XML for model description

We are collaborating with the developers of the major software tools for neuroscience models (including NEURON and GENESIS) to move the declarative aspects of model specification to a simulator-independent XML format.

Models in neuroscience can be complex and multi-level. They can combine elements of intracellular pathways (the focus of SBML [4], the systems biology standard for pathways); ion channels and receptors; and compartmental neurons and networks. At each level of scale there are different levels of detail suitable for modelling: e.g. a single neuron may be modelled as a single isopotential compartment or as a complex tree structure, and a synapse may be modelled as a simple weight, a dynamical system, or a complex pathway incorporating calcium buffers and states of receptors.

Because of this complexity, it is important for the description of the model to be as clean as possible, and separated from the implementation concerns of a particular simulator. The current situation is that many models are coded using a script language, which provides convenient automation for running models but often means that the model can only run on a single simulator.

Fig 2. The dynamic state of a running simulation can be viewed from the workstation. This allows graphs, animations and summary statistics for the model running on a cluster to be monitored.
Simulator developers are keen to move
towards language independent XML based
standards for model descriptions, as XML is
sufficiently flexible and clear for people and
programs to read, and copes with arbitrarily
complex structures. We established the
NeuroML project [1] as a focal point for tools and standards efforts in neuroscience modelling.
For areas of modelling where the methods
and data structures are agreed and stable it
makes sense to move towards standard
languages. One example is the MorphML [13] standard for describing the complex morphology of reconstructed neurons and populations. This is useful for neuroanatomists as well as modellers. Another example is the ChannelML project for standardising models of ion channel dynamics. Developer meetings and open source resources provided by SourceForge [1] have proved useful for coordinating
activity. The NeuroConstruct tool [3] developed
by our collaborators at UCL provides support
for creating declarative model descriptions
focused on cerebellar network models.
But there remains the question of how
simulators represent aspects of novel models for
which there is no agreed standard. It would be
advantageous for these parts of model
descriptions to be stored in XML as well, with a
simulator-specific schema.
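As an illustration of how simple consuming such a description can be, the sketch below parses a small channel fragment with the standard Java XML APIs; the element and attribute names are invented for this example and are not the actual ChannelML schema:

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.xml.sax.InputSource;

    // Reads a (hypothetical) simulator-independent ion channel description.
    public class ChannelReader {
        public static void main(String[] args) throws Exception {
            String xml = "<channel name='Na' density='120.0'>"
                       + "<gate name='m' power='3'/>"
                       + "</channel>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element channel = doc.getDocumentElement();
            // Any simulator with a reader like this can instantiate the model.
            System.out.println("channel " + channel.getAttribute("name")
                    + ", density " + channel.getAttribute("density"));
        }
    }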
3.1 Developing schema-independent GUIs
One consequence of the complexity and
dynamic nature of neuroscience models is the
software development overhead of recoding
user interfaces with every change in schema.
Because of this, we made a model editor which
builds itself on the fly from the schema
definition of the object model, so no coding is
required to extend the interface or add
additional parameters. This technique has
proved extremely powerful and time saving,
and allows one to edit arbitrary XML formats
which provide an XML version of their
underlying object model. Figure 1 illustrates the
interface. Similar techniques have been used in
a small number of tools [7,8,12].
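A minimal sketch of the technique, using Java bean introspection to stand in for the schema reader (the class and method names here are illustrative):

    import java.beans.Introspector;
    import java.beans.PropertyDescriptor;
    import javax.swing.BoxLayout;
    import javax.swing.JLabel;
    import javax.swing.JPanel;
    import javax.swing.JTextField;

    // Builds an editor panel on the fly from the object model itself,
    // so adding a parameter to a model class needs no GUI code.
    public class AutoEditor {
        public static JPanel editorFor(Object model) throws Exception {
            JPanel panel = new JPanel();
            panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));
            for (PropertyDescriptor pd : Introspector
                    .getBeanInfo(model.getClass(), Object.class)
                    .getPropertyDescriptors()) {
                if (pd.getReadMethod() == null || pd.getWriteMethod() == null)
                    continue; // skip read-only and write-only properties
                JPanel row = new JPanel();
                row.add(new JLabel(pd.getName()));
                row.add(new JTextField(
                        String.valueOf(pd.getReadMethod().invoke(model)), 12));
                panel.add(row);
                // A full editor would also attach listeners that push edited
                // values back through pd.getWriteMethod().
            }
            return panel;
        }
    }

The point of the design is that the editor never mentions a concrete model class, so extending the schema extends the interface for free.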
4. Scaling models to run across clusters
One consequence of moving model
specifications from a script language to a
declarative XML notation is that it becomes
possible to run models across a number of
machines, to allow scaling to more realistic
sizes, without having to recode the model as an
explicitly parallel program.
This is useful for network models, as the
number of synapses can be large.
4.1 Dynamic modules
One of the challenges of neuroscience
models is that the types of model people want to
build are constantly changing. In order to avoid
having to rewrite our software with every new
demand, we adopted a system of dynamic
modules, with new simulation and GUI
components loaded on the fly (using JavaBeans).
The models can be run locally or
automatically distributed across a cluster with
visualisation streamed to the workstation. One
of the 3D visualisation modules we have written
allows the user to display the voltage activity of
any neuron in the simulation, and also
population activity statistics.
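A minimal sketch of this kind of dynamic loading, assuming each module jar contains a class implementing a known interface (the interface and class names here are hypothetical, not our actual API):

    import java.net.URL;
    import java.net.URLClassLoader;

    // Loads a simulation or GUI component from a jar at runtime.
    public class ModuleLoader {
        // Hypothetical contract that all dynamic modules implement.
        public interface Module { void start(); }

        public static Module load(URL jarUrl, String className) throws Exception {
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { jarUrl }, ModuleLoader.class.getClassLoader());
            Class<?> cls = loader.loadClass(className);
            return (Module) cls.getDeclaredConstructor().newInstance();
        }
    }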
4.2 Parallel algorithms
We use the Java remote method invocation
(RMI) layer for communications between
nodes. This high level interface has the
flexibility of automatically serialising objects,
so once the basic communications layer is in
place one can send arbitrarily complex
structures between nodes (e.g. the shape and
voltage distribution of a neuron) without having
to write custom messaging code. The flexibility
comes at a performance cost, however. The
built-in Java object serialisation is fairly slow.
For the communications during a simulation run
(voltage spikes between neurons) we developed
our own optimised communications based
directly on low level sockets. This allows us to
reduce memory copying and approach the
maximum performance of the underlying
hardware.
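The sketch below shows the flavour of such a layer: a spike reduces to two integers written to a buffered socket stream, avoiding the per-object overhead of RMI serialisation (the framing is illustrative, not the actual wire format):

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    // Minimal spike messaging over a raw socket: (neuron id, time step) pairs.
    public class SpikeChannel {
        private final DataOutputStream out;
        private final DataInputStream in;

        public SpikeChannel(Socket socket) throws IOException {
            out = new DataOutputStream(
                    new BufferedOutputStream(socket.getOutputStream()));
            in = new DataInputStream(
                    new BufferedInputStream(socket.getInputStream()));
        }

        public void sendSpike(int neuronId, int timeStep) throws IOException {
            out.writeInt(neuronId); // 8 bytes per spike, no object headers
            out.writeInt(timeStep);
        }

        public void endOfStep() throws IOException {
            out.flush(); // batch spikes and flush once per simulation step
        }

        public int[] readSpike() throws IOException {
            return new int[] { in.readInt(), in.readInt() };
        }
    }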
4.3 Java optimisation techniques
We chose to develop our software using
Java, because of its convenience and the
availability of useful libraries for GUIs. In order
to obtain reasonable performance using Java,
we found it necessary to be extremely careful in
selecting which Java features to use.
Optimisations we found to work particularly
well were:
• using arrays of primitive types in preference to objects. For example, rather than having a “Connection” class with a separate object for each connection, use a “Connections” table which holds arrays of integers or bytes. This allows the just-in-time compiler to reach C++ levels of performance and memory efficiency at a cost in programming convenience. When C# support on Unix reaches maturity it may be possible to combine convenience and efficiency.
• using a variable length integer encoding scheme to reduce memory usage per integer from four bytes to one. The encoding scheme we use places values in the bottom 7 bits of a byte, so small numbers (0-127) take a single byte of space. The high bit is set if additional bytes are needed to store larger numbers. The technique is similar to that used by the Apache Lucene text index software for its high performance; a sketch of the encoding follows this list.
• using on-the-fly in-memory compression of connectivity tables. Standard techniques including run length encoding and difference encoding are used, and these are particularly effective in conjunction with the variable length integers.
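A sketch of the variable length encoding from the list above, essentially the same scheme as Lucene's VInt (names illustrative):

    import java.io.ByteArrayOutputStream;

    // Variable length integers: 7 payload bits per byte, high bit set
    // when more bytes follow, so values 0-127 cost a single byte.
    public class VInt {
        public static void write(ByteArrayOutputStream out, int value) {
            while ((value & ~0x7F) != 0) {        // more than 7 bits left
                out.write((value & 0x7F) | 0x80); // low 7 bits + continuation
                value >>>= 7;
            }
            out.write(value);                     // final byte, high bit clear
        }

        public static int read(byte[] buf, int[] pos) {
            int result = 0, shift = 0;
            byte b;
            do {
                b = buf[pos[0]++];
                result |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return result;
        }
    }

Sorting each neuron's target list and difference encoding it first keeps most stored values below 128, which is why the scheme combines so well with the compression of the connectivity tables.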
With these optimisations one can obtain
performance comparable to native code with
fewer cross platform issues. Memory use per
connection is 10 bytes, and the communications
overhead is proportional to the logarithm of the
number of nodes. We have been running
simulations with 100 million synapses per node
on our 24 processor cluster, and are extending
the package to run across clusters in Edinburgh
and UCL. The software will be released on our
NeuroGEMS site [2].
References
[1] F. Howell et al., NeuroML home page: www.neuroml.org
[2] NeuroGEMS neuroinformatics modules: www.neurogems.org
[3] P. Gleeson and A. Silver, NeuroConstruct: http://www.physiol.ucl.ac.uk/research/silver_a/neuroConstruct/index.html
[4] SBML: www.sbml.org
[5] M. Hines, NEURON: www.neuron.yale.edu
[6] GENESIS: www.genesis-sim.org
[7] R. Cannon, Catacomb: http://www.enorg.org
[8] PEDRO: pedro.man.ac.uk
[9] L. Zaborsky, F. Howell, N. McDonnell, P. Varsanyi (2005), The Ratbrain Project: www.ratbrain.org
[10] W. Denk, H. Horstmann (2004), Serial Block-Face Scanning Electron Microscopy to Reconstruct Three-Dimensional Tissue Nanostructure. PLoS Biol 2(11): e329
[11] R. Merkle (1989), Large scale analysis of neural structures: www.merkle.com/merkleDir/
[12] F. Howell, R. Cannon et al. (2004), Catalyzer: www.axiope.com
[13] S. Crook (2004), MorphML: www.morphml.org