CARMEN: Code Analysis, Repository and Modelling for e-Neuroscience

Research Challenge
Understanding the brain may be the greatest informatics challenge of the 21st century
• Worldwide, more than 100,000 neuroscientists (about 5,000 in the UK) are generating vast amounts of data
• Principal experimental data formats:
    • molecular (genomic/proteomic)
    • neurophysiological (time-series electrical measures of activity)
    • anatomical (spatial)
    • behavioural
• Neuroinformatics concerns how these data are handled and integrated, including the application of computational modelling

Neuroinformatics
• In recent years new technological opportunities for data sharing have emerged with faster networks, improved database technologies, and affordable massive data storage
• Neuroinformatics is increasingly exploiting these opportunities to enable data sharing, re-use of data, and novel analyses of new combinations of data performed via database systems

Need for Cooperation
• The OECD identified a need to work cooperatively in order to achieve major advances, and has established the International Neuroinformatics Coordinating Facility
• Cooperation will permit:
    • development of common processes
    • best value from data through long-term curation
    • ‘mega-analysis’ of large data sets
    • integration of data sets across different scales and different approaches
    • interdisciplinary research

Potential Barriers to Cooperation
Technical
• Multiple proprietary data formats
• Need for detailed, standardised and evolvable metadata
• Volume of the data to be analysed
Cultural
• Multiple communities each acting independently
• Concerns about the consequences of sharing data
• Difficulty in appreciating how the science could be moved forwards by e-Science

CARMEN: Focus on Neural Activity
• Raw voltage signal data is collected using single or multi-electrode array recording
• Novel optical recording, particularly of the activity dynamics of large networks
• Resolving the ‘neural code’ from the timing of action potential activity
[Figure: action potential timing in neurone 1, neurone 2 and neurone 3]

Electrophysiological Data
• Much current knowledge about brain function is based on analysis of the firing patterns of individual neurones
• New computer-based data acquisition systems and techniques for recording simultaneously from many neurones mean that data are amassing rapidly
• Neural modelling generates massive simulated data sets that need to be processed, analysed and compared with experimental data
• Neuronal recordings can be intra- or extra-cellular recordings of single spikes, ensembles of neurones, or field potentials
• All of these are time-series data, which require a specialised information handling system

CARMEN Objectives
• To demonstrate and sustain advances in neuroscience enabled by e-Science technology
• To create a grid-enabled, real-time ‘virtual laboratory’ environment for neurophysiological data
• To develop an extensible, client-defined ‘toolkit’ for data extraction, analysis and modelling
• To provide a repository for archiving, sharing, integration and discovery of data
• To achieve wide community and commercial engagement in developing and using CARMEN

Project Exemplar
• Recording from brain tissue removed from epileptic patients (scarce tissue, and data rates up to 20 GB/h)
• Online analysis by distributed collaborators will enable the experiment to be defined during data collection
• The repository will enable integration of rare case types from different laboratories
• New knowledge will lead to advances in treatment

CARMEN Consortium
• Newcastle: Colin Ingram, Paul Watson, Stuart Baker, Marcus Kaiser, Phil Lord, Evelyne Sernagor, Tom Smulders, Miles Whittington
• Cambridge: Stephen Eglen
• York: Jim Austin, Tom Jackson
• Leicester: Rodrigo Quian Quiroga
• Imperial: Simon Schultz
• Stirling: Leslie Smith
• Warwick: Jianfeng Feng
• Sheffield: Kevin Gurney, Paul Overton
• Manchester: Stefano Panzeri
• St. Andrews: Anne Smith
• Plymouth: Roman Borisyuk

CARMEN Consortium: Commercial Partners
• applications in the pharmaceutical sector
• interfacing of data acquisition software
• application of database infrastructure
• commercialisation of analysis tools

Work Packages
• Core: Data Storage & Analysis
• WP1: Spike Detection & Sorting
• WP2: Information Theoretic Analysis of Derived Signals
• WP3: Data-Driven Parameter Determination in Conductance-Based Models
• WP4: Intelligent Database Querying
• WP5: Measurement and Visualisation of Spike Synchronisation
• WP6: Multilevel Analysis and Modelling in Networks

CARMEN Structure: Hub and Spoke Project
• Hub: a “CAIRN” repository for the storage and analysis of neuroscience data
• Spokes: a set of neuroscience projects that will produce data and analysis services for the hub, and use it to address key neuroscience questions

e-Science Challenges
• Managing vast amounts of data
    • > 50 TB primary data
• Extracting value from the data
    • discovery & interpretation
    • analysis: harnessing compute resources
    • curation of services as well as data
• Controlling access to the data & services

CARMEN Active Information Repository Node
[Figure: architecture of a CARMEN Active Information Repository Node. Labels recoverable from the diagram are listed below.]
• Components: Web Portal, Rich Clients, Security, Workflow Enactment Engine, Compute Cluster on which Services are Dynamically Deployed, Data, Metadata, Registry, Service Repository
• Technologies named: OMII/myGrid (Taverna/BPEL), OGSA-DAI & SRB, DAME (Signal Data Explorer), Gold (Role & Task based Security), OMII (Grimoire), myGrid & Gold (Feta, Provenance), Dynasoar
• Grid resources: White Rose Grid, Newcastle Grid

A Typical Scenario We Want to Support
• Data Collection from Electrode Array
• Spike Detection
    • with User Defined Threshold
• Spike Sorting
• Analysis
• Visualisation
Currently, this is a semi-manual process
We have an initial prototype for automating this…
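The spike detection step of the scenario above, with its user-defined threshold, can be illustrated with a minimal sketch. This is not CARMEN's prototype or any of its services: the function names `detect_spikes` and `synthetic_trace`, the `refractory` parameter and the synthetic data are all assumptions made for illustration, and real pipelines typically filter the raw voltage signal before thresholding.

```python
import random


def detect_spikes(signal, threshold, refractory=30):
    """Return sample indices where `signal` crosses upward through `threshold`.

    `refractory` (hypothetical parameter) is the minimum number of samples
    between two detections, so one spike is not counted twice.
    """
    spikes = []
    last = -refractory  # allows a detection at the very start of the trace
    for i in range(1, len(signal)):
        crossed = signal[i - 1] < threshold <= signal[i]
        if crossed and i - last >= refractory:
            spikes.append(i)
            last = i
    return spikes


def synthetic_trace(n_samples, spike_times, amplitude=5.0, noise=0.5, seed=0):
    """Build an illustrative trace: bounded uniform noise plus one-sample spikes."""
    rng = random.Random(seed)
    trace = [rng.uniform(-noise, noise) for _ in range(n_samples)]
    for t in spike_times:
        trace[t] += amplitude
    return trace


# Stand-in for data collected from one electrode channel.
trace = synthetic_trace(1000, spike_times=[100, 400, 750])

# The threshold is the user-defined value from the scenario.
spike_indices = detect_spikes(trace, threshold=3.0)
print(spike_indices)  # → [100, 400, 750]
```

The detected indices would then be the input to spike sorting, analysis and visualisation. Keeping the threshold as an explicit argument mirrors the scenario, where the user chooses it for each recording.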
Signal Data Explorer
[Figure: Signal Data Explorer]

Example Workflow
[Figure: example workflow]

Example Workflow Enactment
[Figure: workflow enactment diagram. Labels: External Client; Workflow Engine (BPEL / TAVERNA); Repository; INPUT; Data; Spike Sorting Service; Security; Available Services; SRB; FileSystem; Registry; Reporting; Query; RDBMS; Dynamically Deployed Services in Dynasoar; OUTPUT; Metadata]

Example Graph Output
[Figure: example graph output]

Example Movie Output
[Figure: example movie output]

Some Remaining Challenges
• Extensible, standardised metadata for neuroscience
    • data formats (timing, data channels, etc.)
    • experimental design (e.g. stimuli or drug treatments)
    • concurrent data (e.g. behaviour, physiological measures)
    • experimental idiosyncrasies (e.g. artifacts)
    • experimental conditions (animals, temperature, treatments, etc.)

Some Remaining Challenges (cont.)
• Locating patterns in time-series data across multiple levels of abstraction
• Reproducible e-Science
    • curating services as well as data
    • public repositories of deployable services
    • dynamic service deployment
• Real-time expert collaboration

CARMEN
• CARMEN is delivering an e-Science infrastructure that can be applied across a range of diverse and challenging applications (not only neuroscience)
• CARMEN enables cooperation and interdisciplinary working in ways currently not possible
• CARMEN will deliver new results in neuroscience, computer science and medicine