Jim Austin, Colin Ingram, Leslie Smith, Paul Watson, Stuart
Baker, Roman Borisyuk, Stephen Eglen, Jianfeng Feng, Kevin
Gurney, Tom Jackson, Marcus Kaiser, Phillip Lord, Stefano
Panzeri, Rodrigo Quian Quiroga, Simon Schultz, Evelyne Sernagor,
V. Anne Smith, Tom Smulders, Miles Whittington.
eScience Meeting, Edinburgh, November 2006.
Slide 1
The CARMEN Project
CARMEN is a new e-Science Pilot Project in Neuroinformatics (UK research council funded).
Objectives:
• To create a grid-enabled, real time ‘virtual laboratory’ environment for neurophysiological data
• To develop an extensible ‘toolkit’ for data extraction, analysis and modelling
• To provide a repository for archiving, sharing, integration and discovery of data
• To achieve wide community and commercial engagement in developing and using CARMEN
CARMEN is a 4-year project: if it is to last longer, it must become financially self-sufficient.
See http://www.carmen.org.uk
[Figure labels: neurone 1, neurone 2, neurone 3]
Slide 2
Leadership & Infrastructure
Colin Ingram
Paul Watson
Leslie Smith
Jim Austin
Slide 3
Work Packages
[Figure: institution logos, including The University of Sheffield]
Slide 4
Background: What is Neuroinformatics?
Informatics applied to Neuroscience (of all sorts)
Experimental Neuroscience:
Data recording and data analysis have used computers for a long time.
But a great deal more can be achieved by pooling data and analysis services
Cognitive and Computational Neuroscience
Modelling
Matching models to more experimental data
Matching models to known appropriate behaviour
Defining and running more sophisticated models
Running models in real time
Clinical Neuroscience
Data-based understanding of neuropathology
Neuropharmaceutical assays and assessment
Slide 5
What is Neuroinformatics bringing to Experimental Neuroscience?
Getting leverage from e-Science capabilities to allow better use of data.
Example: Dataset re-use:
Experimenter does experiment, records data, analyses data, writes the paper, perhaps makes the data available to a small number of colleagues.
…and then?
The dataset languishes, first on a spinning disk, then later on some DVDs, and later still is lost to view as the experimenter changes lab…
Yet the data could be of use to other researchers…
Slide 6
What are the basic problems holding back dataset re-use? (1)
Two major technical problems: Data format, and Metadata
Data Format
There are different systems for Neuroscience data collection.
The data format is a particular structure
The structure may be
Proprietary: defined by a particular piece of software, and not made public
Locally generated: defined by a locally written piece of software, but not necessarily well documented
Public, but no suitable converter exists for the intending user
Slide 7
What are the basic problems holding back dataset re-use? (2)
Metadata problems
The data itself is useless unless the re-user knows exactly what the data represents.
(Presumably the experimenter knew)
But did they record this information in an accessible way?
Metadata is data about the dataset
How was it generated?
What were the experimental conditions?
What was the culture, or what preparation, or what animal,…?
What was the temperature of the recording? Etc. etc.
If the data is to be readily re-used these metadata problems need to be solved in a directly usable way
Simply describing the protocol in English is not enough
Can’t automate reading J. Neurosci yet!
There needs to be an automatically processable way of describing the experimental protocol.
This is particularly true if datasets are used for a large-scale survey of data, e.g. for data-mining.
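A minimal sketch of what an automatically processable description of a recording might look like. The field names (preparation, species, temperature_celsius and so on) are hypothetical, chosen only to mirror the questions listed above; they are not taken from any existing metadata standard or from a CARMEN schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RecordingMetadata:
    """Hypothetical machine-readable description of one recording."""
    dataset_id: str
    preparation: str            # e.g. culture, slice, whole animal
    species: str
    temperature_celsius: float  # temperature of the recording
    sampling_rate_hz: float
    generated_by: str           # how the data was produced


def to_json(meta: RecordingMetadata) -> str:
    """Serialise the metadata so that software, not a human, can read it."""
    return json.dumps(asdict(meta), sort_keys=True)


def from_json(text: str) -> RecordingMetadata:
    """Rebuild the metadata record from its JSON form."""
    return RecordingMetadata(**json.loads(text))


# Illustrative values only; they do not describe a real dataset.
example = RecordingMetadata(
    dataset_id="example-001",
    preparation="slice",
    species="rat",
    temperature_celsius=32.0,
    sampling_rate_hz=20000.0,
    generated_by="multi-electrode array recording",
)
round_tripped = from_json(to_json(example))
```

Because every field is typed and named, a survey or data-mining tool could filter thousands of such records without anyone reading a methods section.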
Slide 8
Enabling Neuroinformatics based collaboration
Solving data format problems
Force users to adopt a common format?
Alienates users: they won’t do it unless they can see real benefits
Support documented formats
Adopt a common internal format, providing translators to and from this format
Rely on proprietary format owners to come aboard because of customer pressure
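The common internal format idea can be sketched as follows. Both file formats here ("format_a" and "format_b") and the internal dictionary layout are invented for illustration; real neuroscience formats are far richer. The point is only that each format needs one reader and one writer, rather than a converter for every pair of formats.

```python
from typing import Callable, Dict

# Hypothetical common internal format: a dict holding a rate and the samples.
Internal = dict


def read_format_a(text: str) -> Internal:
    """Toy format A: line 1 is the rate in Hz, line 2 is comma-separated samples."""
    rate_line, sample_line = text.strip().splitlines()
    return {"rate_hz": float(rate_line),
            "samples": [float(v) for v in sample_line.split(",")]}


def write_format_a(data: Internal) -> str:
    return repr(data["rate_hz"]) + "\n" + ",".join(repr(v) for v in data["samples"])


def read_format_b(text: str) -> Internal:
    """Toy format B: 'rate=<Hz>|<space-separated samples>'."""
    header, body = text.strip().split("|")
    return {"rate_hz": float(header.split("=")[1]),
            "samples": [float(v) for v in body.split()]}


def write_format_b(data: Internal) -> str:
    return "rate=" + repr(data["rate_hz"]) + "|" + " ".join(repr(v) for v in data["samples"])


# One reader and one writer per supported format.
readers: Dict[str, Callable[[str], Internal]] = {
    "format_a": read_format_a, "format_b": read_format_b}
writers: Dict[str, Callable[[Internal], str]] = {
    "format_a": write_format_a, "format_b": write_format_b}


def translate(text: str, source: str, target: str) -> str:
    """Convert between any two registered formats via the internal format."""
    return writers[target](readers[source](text))


converted = translate("1000.0\n0.5,1.5", "format_a", "format_b")
```

With N supported formats this needs 2N translators, where direct pairwise conversion would need N(N-1).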
Slide 9
Enabling Neuroinformatics based collaboration: solving metadata problems
Difficult problem: there are a number of attempts at solving it:
BrainML: (Cornell) brainml.org
“BrainML is a developing initiative to provide a standard XML metaformat for exchanging neuroscience data. It focuses on layered definitions built over a common core in order to support community-driven extension.”
NeuroML: http://www.neuroml.org/
“NeuroML is an XML-based description language for defining and exchanging neuronal cell, network and modeling data including reconstructions of cell anatomy, membrane physiology, electrophysiological data, network connectivity, and model specification”
Relevant not only for Neuroinformatics and experimental neuroscience:
Part of a cross-cutting problem for all aspects of neuroscience.
Slide 10
Solving Metadata problems continued:
As well as metadata systems for neuronal systems, there are related metadata systems which can be used by BrainML and NeuroML:
ChannelML: for defining ion channel models
MorphML: for defining the morphology of a neuron
SBML: Systems Biology markup language: models of biochemical reaction networks
CellML: to store and exchange computer-based mathematical models
SBML is particularly well advanced: see http://sbml.org/index.psp.
MathML: for describing mathematical notation and capturing both its structure and content. See http://www.w3.org/Math/
Metadata is a big but soluble problem.
It is a multi-level problem, but the systems above provide a multi-level solution.
Slide 11
Enabling Neuroinformatics based collaboration: Sociological problems
There is a reluctance to permit re-use amongst some experimental neuroscientists.
What do experimental neuroscientists get from allowing others to reuse their data?
If the answer is only better science, then some experimental neuroscientists will not come on board.
They need to be convinced that sharing their hard-earned datasets will be of benefit to them
Names on papers?
The ability to be involved in the further research?
At the very least, some credit!
Some neuroscientists fear that their data will be used without their knowledge
There is therefore some reticence amongst the experimental neuroscience community.
Slide 12
Solving sociological problems
There are technical aspects to solutions:
Security aspects on the holding of data:
Ensure that datasets can be secured: for example that they can only be re-used with the experimenter’s permission.
Security is critically important for holding of data which is still being analysed prior to publication.
…and non-technical aspects too
Bringing experimental neuroscientists on board
Ensuring that the Neuroinformatics community is properly cross-disciplinary, with good representation from the experimentalists.
Getting journals on-side
Many journals are demanding that raw/processed data be made available in order to check results.
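The requirement that a dataset can only be re-used with the experimenter's permission can be sketched as a simple access check. This is an illustration only: the Dataset class, the public_from embargo date, and the function names are all hypothetical and do not describe CARMEN's actual security design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Dataset:
    name: str
    owner: str                                       # the experimenter
    granted: Set[str] = field(default_factory=set)   # users approved by the owner
    public_from: Optional[date] = None               # hypothetical embargo end date


def grant_access(dataset: Dataset, acting_user: str, grantee: str) -> None:
    """Only the experimenter who owns the dataset may let others in."""
    if acting_user != dataset.owner:
        raise PermissionError("only the experimenter can grant access")
    dataset.granted.add(grantee)


def can_read(dataset: Dataset, user: str, today: date) -> bool:
    """Owner and approved users can always read; others only after the embargo."""
    if user == dataset.owner or user in dataset.granted:
        return True
    return dataset.public_from is not None and today >= dataset.public_from


retina = Dataset("retina-waves", owner="alice", public_from=date(2007, 1, 1))
```

An embargo date of this kind is one way to protect data that is still being analysed prior to publication while still committing to later release.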
Slide 13
Clinical Neuroscience is about treatment of
Mental illness
Brain diseases
Trauma
Neuroinformatics has major application here, ranging from 3D imaging technologies to EEG recordings: much broader than the focus of CARMEN.
CARMEN is primarily concerned with neural recordings.
These can provide data on neurochemical effects on neural function.
Overall brain states (mental illness, disease) are believed to originate in the neurochemistry (research in depression and schizophrenia suggests this).
Slide 14
Neural cell cultures
Different types:
Slice preparations
Cultures grown from neural cell lines
Cultures from neonate neurons
…have recordings made from them with and without added neuropharmaceuticals.
Interest is in changes in behaviour in these preparations.
Requires instrumentation and analysis techniques
Sharing these results can lead to major advances
Pharmaceutical companies are interested
Security implications
Slide 15
WP0 Data Storage & Analysis
WP1 Spike Detection & Sorting
WP2 Information Theoretic Analysis of Derived Signals
WP3 Data-Driven Parameter Determination in Conductance-Based Models
WP4 Intelligent Database Querying
WP4 Measurement and Visualisation of Spike Synchronisation
WP5 Multilevel Analysis and Modelling in Networks
Slide 16
• Create ‘virtual laboratory’ for neurophysiological data
• Provide repository for:
• archiving, sharing, integration and discovery of data
• services that operate on the data
• Develop extensible ‘toolkit’ for data extraction, analysis and modelling
• Achieve wide community and commercial engagement in CARMEN
• CARMEN must become financially self-sufficient after 4 years
Slide 17
• Bowker’s “Standard Scientific Model” [1]
1. Collect data
2. Publish papers
3. Gradually lose the original data
[1] The New Knowledge Economy and Science and Technology Policy, G.C. Bowker, E1-30-03-05
• Problems:
• papers often draw conclusions from unpublished data
• inability to replicate experiments
• data cannot be re-used
Slide 18
• Bowker’s Model
• Collect data
• Publish papers
• Gradually lose the original data
• Problems:
• papers often draw conclusions from unpublished data
• inability to replicate experiments
• data cannot be re-used
• Solution
• Data Repositories
• Computational Science
• Write codes
• Publish papers
• Gradually lose the codes
• Problems:
• papers often draw conclusions from the results of unpublished codes
• inability to replicate experiments
• codes cannot be re-used
• Solution
• Service Repositories
Slide 19
CARMEN Active Information Repository
[Diagram components:]
External Client
Workflow Enactment Engine
Compute Cluster on which Services are Dynamically Deployed
Data
Metadata
Registry
External Client
Service Repository
Slide 20
Dynamic Service Deployment - Dynasoar
[Figure: Dynasoar architecture. Labels: C; Web Service Provider (WSP); Service Repository (SR); Host Provider with node 1 (s2, s5), node 2 and node n (s2); req and res messages; steps 1, 2 (service fetch & deploy) and 3.]
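A rough sketch of the dynamic deployment idea suggested by the diagram labels: a request arrives at a Web Service Provider, which routes it to a host node, fetching the service from the repository and deploying it first if no node has it yet. This is not Dynasoar's API. The class names, the "least-loaded node" placement rule, and the toy services s2 and s5 are assumptions made for illustration.

```python
from typing import Callable, Dict, List

Service = Callable[[str], str]


class HostNode:
    """A compute node that keeps the services deployed on it so far."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.deployed: Dict[str, Service] = {}


class WebServiceProvider:
    """Routes a request to a node, deploying the service first if needed."""

    def __init__(self, repository: Dict[str, Service], nodes: List[HostNode]) -> None:
        self.repository = repository
        self.nodes = nodes
        self.fetch_count = 0  # number of fetch-and-deploy steps, for illustration

    def handle(self, service_name: str, request: str) -> str:
        # Prefer a node on which the service is already deployed.
        for node in self.nodes:
            if service_name in node.deployed:
                return node.deployed[service_name](request)
        # Otherwise fetch from the repository and deploy on the node that
        # currently hosts the fewest services (an assumed placement rule).
        node = min(self.nodes, key=lambda n: len(n.deployed))
        node.deployed[service_name] = self.repository[service_name]
        self.fetch_count += 1
        return node.deployed[service_name](request)


# Toy services standing in for real analysis code.
repository: Dict[str, Service] = {
    "s2": lambda req: "s2:" + req.upper(),
    "s5": lambda req: "s5:" + req[::-1],
}
nodes = [HostNode("node 1"), HostNode("node 2")]
wsp = WebServiceProvider(repository, nodes)

first = wsp.handle("s2", "abc")    # triggers fetch and deploy
second = wsp.handle("s2", "xyz")   # reuses the deployed copy
third = wsp.handle("s5", "abc")    # deployed on the emptier node
```

The caller never needs to know which node ran the service, which is what lets services be moved to wherever the compute capacity is.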
Slide 21
DAME developed a tool to analyse large volumes of distributed signal data
CARMEN will extend this to:
allow search and management of labelled data
link the search results to data descriptions to allow better ranking and data analysis
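To give a flavour of pattern search in signal data, here is a minimal sliding-window search using z-normalised Euclidean distance. This is a stand-in technique chosen for brevity; it is not the method used by the DAME tool, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def z_normalise(values: List[float]) -> List[float]:
    """Shift to zero mean and scale to unit variance (flat windows map to zeros)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def best_match(signal: List[float], pattern: List[float]) -> Tuple[int, float]:
    """Return (start index, distance) of the window most similar to the pattern."""
    target = z_normalise(pattern)
    best_index, best_distance = -1, math.inf
    for start in range(len(signal) - len(pattern) + 1):
        window = z_normalise(signal[start:start + len(pattern)])
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, target)))
        if distance < best_distance:
            best_index, best_distance = start, distance
    return best_index, best_distance


# The query has the same shape as the bump in the signal but twice the amplitude.
signal = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
index, distance = best_match(signal, [2.0, 6.0, 2.0])
```

Normalising each window means the search matches on shape rather than absolute amplitude, which matters when recordings come from different rigs.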
Slide 22
• “is there any data captured under conditions x, y & z?”
• “what services are available to process this spike train data?”
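The two example queries above could be answered from a registry of structured records. A minimal in-memory sketch, in which the record fields, dataset identifiers and service names are all hypothetical:

```python
from typing import Dict, List

# Hypothetical registry entries; values are illustrative only.
dataset_registry: List[Dict[str, object]] = [
    {"id": "d1", "preparation": "slice", "species": "rat", "temperature_celsius": 32.0},
    {"id": "d2", "preparation": "culture", "species": "rat", "temperature_celsius": 37.0},
    {"id": "d3", "preparation": "slice", "species": "mouse", "temperature_celsius": 32.0},
]

service_registry: List[Dict[str, str]] = [
    {"name": "threshold-detector", "input": "raw recording", "output": "spike train"},
    {"name": "isi-histogram", "input": "spike train", "output": "histogram"},
    {"name": "synchrony-index", "input": "spike train", "output": "scalar"},
]


def find_datasets(registry: List[Dict[str, object]], **conditions: object) -> List[str]:
    """'Is there any data captured under conditions x, y & z?'"""
    return [entry["id"] for entry in registry
            if all(entry.get(key) == value for key, value in conditions.items())]


def services_for(registry: List[Dict[str, str]], input_type: str) -> List[str]:
    """'What services are available to process this kind of data?'"""
    return [service["name"] for service in registry if service["input"] == input_type]


rat_slices = find_datasets(dataset_registry, preparation="slice", species="rat")
spike_train_services = services_for(service_registry, "spike train")
```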
Slide 23
• Tool for locating patterns in time-series data across multiple levels of abstraction
• Dynamic service provisioning over a grid
• Extensible, standardised metadata for neuroscience
• Fine-grained access control
• Integrating data from multiple repositories
Slide 24
WP0 Data Storage & Analysis
WP1 Spike Detection & Sorting
WP2 Information Theoretic Analysis of Derived Signals
WP3 Data-Driven Parameter Determination in Conductance-Based Models
WP4 Intelligent Database Querying
WP4 Measurement and Visualisation of Spike Synchronisation
WP5 Multilevel Analysis and Modelling in Networks
Slide 25
Spike Detection & Sorting (WP1: Stirling & Leicester)
[Figure: analogue recording (digitised) with spikes assigned to Cluster 1 and Cluster 2, and one marked “Doesn’t fit!” (clustering using wave_clus)]
Slide 26
CARMEN and spike detection and sorting
Idea is to provide many services
Several different types of spike detection algorithms
Several different types of spike sorting techniques
(including different types of data reduction, as well as different types of clustering)
Allow the user to test with a variety of techniques, and then choose the techniques they prefer
High speed links should allow immediate transfer of some datasets to Grid based systems
Allow experimentalist to choose near-real-time detection and sorting for immediate feedback
To assist during the experiment
Slower (and more effective) techniques for later analysis off-line.
Allow comparison of different techniques on a wide variety of data
Which is best, and for what?
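As an example of the kind of fast detection service that could give near-real-time feedback, here is a minimal amplitude-threshold spike detector. The threshold is a multiple of a median-based noise estimate, a choice common in the spike sorting literature; the factor of 4, the refractory period and the function name are assumptions of this sketch, and it is not presented as any CARMEN service.

```python
import statistics
from typing import List


def detect_spikes(signal: List[float], sampling_rate_hz: float,
                  threshold_factor: float = 4.0,
                  refractory_ms: float = 1.0) -> List[int]:
    """Return sample indices where the signal crosses the threshold upwards."""
    # Robust noise estimate: median absolute value, scaled for Gaussian noise.
    sigma = statistics.median(abs(v) for v in signal) / 0.6745
    threshold = threshold_factor * sigma
    # Ignore further crossings for a short period after each detection.
    refractory_samples = int(refractory_ms * sampling_rate_hz / 1000.0)
    spikes: List[int] = []
    last = -refractory_samples - 1
    for i in range(1, len(signal)):
        crossed = signal[i - 1] < threshold <= signal[i]
        if crossed and i - last > refractory_samples:
            spikes.append(i)
            last = i
    return spikes


# Synthetic test trace: small alternating noise with two large deflections.
trace = [0.1 if i % 2 == 0 else -0.1 for i in range(200)]
trace[50] = 2.0
trace[120] = 2.0
spike_indices = detect_spikes(trace, sampling_rate_hz=20000.0)
```

A detector this simple is cheap enough to run during the experiment; slower and more careful methods can then be applied to the same data off-line and the results compared.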
Slide 27
Information Theoretic Analysis of Electrically- and Optically-Derived Signals (WP2: Imperial College, Manchester, and UCL)
Action potentials will be detected both electrically and optically.
Action potentials (spikes) are the primary electrical communication mechanism between neurons.
How can one interpret neuronal action potentials?
Information Theory can be used to establish the ‘neuronal code’, quantifying how much information is carried by different potential neuronal coding mechanisms.
Issues:
Sampling problems, Spike correlation, Multimodal recording
By using Grid technology, we can assemble large quantities of optical and multi-electrode recordings and apply existing and novel techniques to their analysis.
We can make these techniques available as services.
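To make "how much information is carried" concrete, the following sketch computes a plug-in (direct) estimate of the mutual information between a stimulus label and a response, here taken to be a spike count. The function name and the toy data are illustrative; which estimators WP2 will actually offer is not stated here.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple


def mutual_information_bits(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """Plug-in estimate of I(S;R) in bits from (stimulus, response) trials."""
    n = len(pairs)
    joint = Counter(pairs)
    stimulus_counts = Counter(s for s, _ in pairs)
    response_counts = Counter(r for _, r in pairs)
    information = 0.0
    for (s, r), count in joint.items():
        p_joint = count / n
        # Ratio p(s, r) / (p(s) p(r)), written in terms of raw counts.
        ratio = p_joint * n * n / (stimulus_counts[s] * response_counts[r])
        information += p_joint * math.log2(ratio)
    return information


# Spike count identifies the stimulus perfectly: 1 bit for two equiprobable stimuli.
perfect_code = [("A", 0)] * 50 + [("B", 3)] * 50
# Spike count is unrelated to the stimulus: 0 bits.
no_code = [("A", 0), ("A", 1), ("B", 0), ("B", 1)] * 25

perfect_bits = mutual_information_bits(perfect_code)
no_bits = mutual_information_bits(no_code)
```

With few trials the plug-in estimate is biased upwards, which is one form of the sampling problem listed above and a reason to pool large quantities of recordings.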
Slide 28
Data-Driven Parameter Determination in Conductance-Based Models (WP3: Sheffield: Gurney et al)
Neuron modelling uses conductance-based models.
[Many ionic species cross the neural membrane.
Ion channels embedded in the membrane accomplish this transport
There are many different ion channel types
Setting the parameters for each type would enable better understanding of neuron operation]
Determining parameters for neural models is difficult and requires a great deal of data.
The parameters are not constant, and vary with (e.g.)
Cell type
Presence and concentration of neuromodulators
Temperature
CARMEN aims to provide this volume of data, and hence to enable many of these parameters to be determined.
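A deliberately tiny illustration of data-driven parameter determination: a single passive compartment obeying C dV/dt = -g_leak (V - E_leak) + I, with the leak conductance recovered from a voltage trace by least-squares search over candidate values. Real conductance-based models have many channel types and parameters; the single parameter, the grid search, and all numerical values here are assumptions of the sketch (units taken to be mutually consistent, e.g. mV, ms, µF/cm², mS/cm², µA/cm²).

```python
from typing import List, Sequence


def simulate_passive_membrane(g_leak: float, e_leak: float = -65.0,
                              capacitance: float = 1.0, current: float = 1.0,
                              v0: float = -65.0, dt: float = 0.1,
                              steps: int = 500) -> List[float]:
    """Forward-Euler solution of C dV/dt = -g_leak (V - e_leak) + I."""
    v = v0
    voltages: List[float] = []
    for _ in range(steps):
        v += dt * (-g_leak * (v - e_leak) + current) / capacitance
        voltages.append(v)
    return voltages


def fit_leak_conductance(recorded: Sequence[float],
                         candidates: Sequence[float]) -> float:
    """Pick the candidate conductance whose simulated trace best matches the data."""
    def squared_error(g_leak: float) -> float:
        model = simulate_passive_membrane(g_leak, steps=len(recorded))
        return sum((m - d) ** 2 for m, d in zip(model, recorded))
    return min(candidates, key=squared_error)


# Stand-in for experimental data: a trace generated with a known conductance.
recorded_trace = simulate_passive_membrane(0.1, steps=200)
fitted = fit_leak_conductance(recorded_trace, [0.05, 0.1, 0.2])
```

Since the true parameters vary with cell type, neuromodulators and temperature, a fit like this would need to be repeated per condition, which is why a large volume of well-described data matters.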
Slide 29
‘Compartmental’ modelling of morphology
[Figure: real neural morphology, its approximation, and the (passive) electrical equivalent]
Slide 30
[Figure: model output compared with data (Wilson); scale bars 50 mV and 100 ms. Wood et al., Neurocomputing, 2003]
Slide 31
Measurement and Visualisation of Spike Synchronisation (WP5: Newcastle, Plymouth)
More advanced analytical techniques are required to handle the large-scale, simultaneous recordings arising from multi-electrode arrays (MEAs).
Visualisation is critical to understanding what is happening
WP5 aims to:
• develop reliable and robust analysis techniques to address these issues, particularly sweeping statistical methods to test if measures show significant changes
• develop novel visualisation methods for displaying the results from these techniques, particularly those working in high-dimensional space
• conduct real-time analysis of spike coding through a distributed Grid-enabled virtual laboratory.
Advanced (but fast) visualisation techniques are important to the whole community using CARMEN
Slide 32
Particle aggregation in gravitational clustering (Gerstein and Lindsey (2006)). Each particle represents a cell; the charge on each cell is incremented with each spike, and a force occurs between particles dependent on the charges.
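The description above can be turned into a much-simplified sketch. Each cell is a particle in a space with one dimension per cell; a spike adds one unit of charge; pairs of particles attract in proportion to the product of their charges, so cells that fire together drift together. The exponential decay of charge, the cap on step size, and all parameter values are assumptions of this sketch rather than details taken from the slide or the cited work.

```python
import math
from typing import List


def gravitational_clustering(spike_times: List[List[float]], duration: float,
                             dt: float = 1.0, decay: float = 5.0,
                             gain: float = 0.01) -> List[List[float]]:
    """Return final particle positions after attraction driven by spiking."""
    n = len(spike_times)
    # One particle per cell, each starting on its own axis (all equidistant).
    positions = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    charges = [0.0] * n
    for step in range(int(duration / dt)):
        t0, t1 = step * dt, (step + 1) * dt
        for i, times in enumerate(spike_times):
            charges[i] *= math.exp(-dt / decay)                   # assumed decay
            charges[i] += sum(1 for t in times if t0 <= t < t1)   # +1 per spike
        moves = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dist = math.dist(positions[i], positions[j])
                if dist < 1e-9:
                    continue
                # Step towards j, proportional to the product of the charges,
                # capped so that a pair cannot jump past each other.
                step_size = min(gain * charges[i] * charges[j] * dt, dist / 2.0)
                for k in range(n):
                    moves[i][k] += step_size / dist * (positions[j][k] - positions[i][k])
        for i in range(n):
            for k in range(n):
                positions[i][k] += moves[i][k]
    return positions


# Cells 0 and 1 fire together; cell 2 fires at different times.
together = [5.0, 25.0, 45.0, 65.0, 85.0]
apart = [15.0, 35.0, 55.0, 75.0, 95.0]
final_positions = gravitational_clustering([together, together, apart], duration=100.0)
```

Distances between the final positions then serve as a measure of how synchronised each pair of cells was, and the trajectories themselves can be projected and visualised.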
Slide 33
Multilevel Analysis and Modelling in Networks (WP6: Newcastle, St. Andrews, Cambridge)
This WP aims to integrate the work of WP1-4, using the technology of WP0.
Understanding activity dynamics within neuronal networks is a major challenge in neuroscience, and requires simultaneous recording from large numbers of neurons.
This WP will provide
• integration of existing and novel network analysis techniques into CARMEN in order to build comprehensive models of network dynamics
• data of exceptional quality and detailed provenance for the CARMEN repository for analysis of network properties
• development of new dynamic Bayesian network algorithms to trace paths of neural information flow in networks.
For example: waves of activity in early turtle retina, recorded using Ca++ sensitive dye. (Thanks to Evelyne Sernagor, ION, Newcastle University)
Slide 34
Slide 35
Slide 36
Concluding remarks
CARMEN is a recent project (funding started October 2006).
The baseline support technology is still being assembled.
It’s not the first attempt at making neurophysiological recordings re-usable
But:
CARMEN will contain more than recordings
Services, workflows, capability of using multiple data formats
CARMEN builds on earlier e-Science projects
Re-use not re-invention
We have experimental neuroscientists, informaticians, and computational neuroscientists all on board
Tackling the broad range of issues from multiple perspectives.
Slide 37