EPSRC e-Science Pilot Project in Integrative Biology
David Gavaghan, Damian Mac Randal, and Sharon Lloyd

Project Overview
• Focus of the first round of UK e-Science Projects
– Data storage, aggregation, and synthesis
– Life Sciences projects focused on supporting the data-generation work of laboratory-based scientists
• The key goal now is to turn this wealth of data into information that can be used to determine biological function
• This requires an iterative interplay between experiment, mathematical modelling, and HPC-enabled simulation
• The primary goal of this project is to build the necessary Grid infrastructure to support this goal

The Science and e-Science Challenge
• To build an Integrative Biology Grid to support applications scientists addressing the key post-genomic aim of determining biological function
• To use this Grid to begin to tackle the two chosen Grand Challenge problems: the in-silico modelling of heart failure and of cancer

Two Grand Challenge Research Questions
• What causes heart disease?
• How does a cancer form and grow?
• These two diseases together cause 61% of all deaths in the UK
[Figure: normal beating vs. fibrillation; courtesy of Peter Kohl (Physiology, Oxford)]

Multiscale modelling of the heart
[Figures: MRI image of a beating heart; fibre orientation ensures correct spread of excitation; contraction of individual cells; current flow through ion channels]
[Figure: simulation of sudden cardiac death due to a mechanically induced impact applied during repolarisation; courtesy of W. Li, P. Kohl, and N. Trayanova, J. Mol. Hist. 2004 (in press); required 27 hours of CPU time on an SGI IRIX64 system]
[Figure: mathematical model of a beating heart by the Auckland Group]

Multiscale modelling of cancer
[Figure slide]

An integrative approach to disease modelling?
• The potential impact of this approach has been demonstrated by the work on modelling the heart
• The time is ripe to extend it to cancer: the UK has extensive expertise, but little has yet been done
• Together the two application areas provide a sufficiently hard e-Science problem to require a generic solution
• The methodology and infrastructure will be utilised across biology and in other scientific domains

The scientific challenge
Modelling and coupling phenomena which occur on many different length and time scales
• Length scales (range = 10⁹):
– 1 m: person
– 1 mm: tissue morphology
– 1 µm: cell function
– 1 nm: pore diameter of a membrane protein
• Time scales (range = 10¹⁵):
– 10⁹ s (years): human lifetime
– 10⁷ s (months): cancer development
– 10⁶ s (days): protein turnover
– 10³ s (hours): digesting food
– 1 s: heart beat
– 1 ms: ion channel gating
– 1 µs: Brownian motion

Details of a test run of the heart simulation code on HPCx
• Modelled 2 ms of electrophysiological excitation of a 5,700 mm³ volume of tissue from the left ventricular free wall
• Noble 98 cell model used
• Mesh contained 20,886 bilinear elements (spatial resolution 0.6 mm)
• 0.05 ms timestep (40 timesteps in total)
• Required 978 s of CPU time on 8 processors and 2.5 GB of memory
• A complete simulation of the ventricular myocardium would require up to 30 times the volume and at least 100 times the duration
• Estimated maximum compute time to investigate arrhythmia ~10⁷ s (~100 days), requiring ~100 GB of memory (compute time scales with problem size to the power ~5/3); see the back-of-envelope sketch below
• At high efficiency this scales to approximately 1 day on HPCx
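Back-of-envelope reconstruction of the estimate above, for illustration only: the slide does not spell out its exact assumptions, so the sketch below simply assumes that per-timestep cost grows with tissue volume to the quoted power of ~5/3 and that total cost grows linearly with the simulated duration.

```python
# Illustrative extrapolation from the HPCx test run described above.
# Assumptions (not stated explicitly on the slide): per-timestep cost grows
# with tissue volume to the power ~5/3, and total cost grows linearly with
# the simulated duration.

test_cpu_seconds = 978.0      # test run on 8 processors, 2 ms of activity
volume_factor = 30.0          # full ventricular myocardium: up to 30x the volume
duration_factor = 100.0       # at least 100x the simulated duration
scaling_exponent = 5.0 / 3.0  # quoted scaling of compute time with problem size

full_run_seconds = (test_cpu_seconds
                    * volume_factor ** scaling_exponent
                    * duration_factor)
print(f"~{full_run_seconds:.1e} s, i.e. ~{full_run_seconds / 86400:.0f} days "
      f"at the test run's processor count")
# Gives a few times 10^7 s, in the same rough range as the ~10^7 s (~100 days)
# quoted above; spread over on the order of a hundred times more HPCx
# processors at high efficiency, this drops to the one-to-few-day range.
```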
Key Deliverables
• A robust and fault-tolerant infrastructure to support post-genomic research in integrative biology that is user- and application-driven
• A 2nd-generation Grid bringing together components from across the range of current EPSRC pilot projects

The e-Science Challenge
• To leverage the global Grid infrastructure to build an international "collaboratory" which places the applications scientist "within" the Grid, allowing fully integrated and collaborative use of:
– HPC resources (capacity and capability)
– Computational steering, performance control and visualisation
– Storage and data-mining of very large data sets
– Easy incorporation of experimental data
– User- and science-friendly access
=> Predictive in-silico models to guide experiment and, ultimately, the design of novel drugs and treatment regimes

e-Science/Grid Research Issues
• Ability to carry out large-scale distributed coupled HPC simulations reliably and resiliently
• Ability to co-schedule Grid resources based on a GGF-agreed standard
• Use of Grid Services based on OGSA-DAI for data virtualisation
• Secure data management and access control in a Grid environment
• Grid services for computational steering conforming to an agreed GGF standard

e-Science/Grid Research (contd.)
• Grid Services for supporting distributed collaborative working, including steering and visualisation
• An interface to using Grid resources which understands and effectively supports the science context of the project
• The project also stretches the cross-disciplinary aspects of the Grid by linking medical, biological, engineering and computing activities
• The project intends to produce a long-term (~10-year) production environment based on the Grid to support what we expect to become a major scientific growth area

Architecture and Software Engineering
• Initially use Web Services to provide a platform- and language-independent interface to the main functional components
• Adopt Grid Services as stable open-source OGSA-compliant implementations become available
• Deploy an object-oriented, component-based toolkit allowing a "plug-and-play" style programming paradigm (an illustrative sketch follows the architecture overview below)
• Use Portal Technologies to provide collaborative access to services

Architecture
In silico whole organ modelling collaboratory
• Multiple collaborating users work through plug-ins to the user's desktop environment: Job Management Client, Computational Steering Client, Visualisation Control Client, Data Finder Client, and other clients
• The clients call, through a common API and a collaboration support service, the front-end services: Job Management Service, Computational Steering Service, Data Finder Service, Visualisation Control Service, and other services
• The front-end services in turn drive the back-end services: Modelling & Simulation Services, Data Visualisation Services, and Data Management Services

Architecture (contributing projects)
• The same service diagram annotated with the existing e-Science projects and partners whose components will be reused: myGrid, RealityGrid, gViz, ICENI, gHWLM, e-DiaMoND, GEODISE, OGSA-DAI, BioSimGrid, and CCLRC
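To make the "plug-and-play" component idea above concrete, here is a minimal illustrative sketch. All names in it (FrontEndService, ServiceRegistry, JobManagementService) are hypothetical, and the project's real components are exposed as Web/Grid Services rather than in-process Python objects; the sketch only mirrors the intended structure, in which desktop clients talk to a common API, services register behind it, and new services can be added without changing client code.

```python
# Hypothetical sketch of the plug-and-play component structure; not the
# project's actual API.  Real front-end services sit behind Web/Grid Service
# interfaces rather than an in-process registry.

from abc import ABC, abstractmethod


class FrontEndService(ABC):
    """Common interface every front-end service exposes to desktop clients."""

    name: str

    @abstractmethod
    def handle(self, request: dict) -> dict:
        """Process one client request and return a result document."""


class JobManagementService(FrontEndService):
    name = "job-management"

    def handle(self, request: dict) -> dict:
        # In the real system this would hand work to the back-end
        # modelling & simulation services running on HPC resources.
        return {"status": "submitted", "job": request.get("job")}


class ServiceRegistry:
    """Lets clients discover services by name, so new components can be
    plugged in without changing the desktop-side code."""

    def __init__(self) -> None:
        self._services: dict[str, FrontEndService] = {}

    def register(self, service: FrontEndService) -> None:
        self._services[service.name] = service

    def call(self, service_name: str, request: dict) -> dict:
        return self._services[service_name].handle(request)


registry = ServiceRegistry()
registry.register(JobManagementService())
print(registry.call("job-management", {"job": "heart-excitation-test"}))
```

The same pattern of registering components behind a uniform interface is what would allow Grid Service implementations to replace the initial Web Services later without disturbing the desktop plug-ins.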
Technology Gaps that will be addressed
Much of this work will be done in conjunction with other EPSRC Pilot projects
• A resilient, robust, reliable Grid framework for large-scale distributed coupled simulations
• A standardised Grid framework for computational steering and visualisation
• Metadata schemas for describing the information and data resources involved
• A standardised means to schedule multiple resources on the Grid concurrently
• Tools for collaborative working in a Grid Services environment
• A transparent Grid

Project management
• Building on extensive experience in other e-Science projects (particularly e-DiaMoND)
• Focus on team building and common goals (key for large, inter-institutional development projects)
• Establishing good communication mechanisms
• Iterative prototype development

The Team
• World-leading expertise in the two application areas
• IBM
• CCLRC
• Seven UK and NZ universities (Oxford, Nottingham, Leeds, UCL, Birmingham, Sheffield and Auckland)
• Expertise from across the UK e-Science Programme
• Extensive existing connectivity between all members of the consortium and with the wider research communities in e-Science and within the application areas
• Research training in an area crucial to the UK

The Resources
• £2.44M from EPSRC e-Science to fund 10 PDRAs and 6 PhD students
• A further 4 PhD students plus system administration and secretarial support funded internally
• The equivalent of 3 FTEs from IBM plus substantial hardware discounts to provide a Power4 server and high-performance workstations for all project staff
• Use of the Atlas Data Store at RAL and a substantial commitment of staff time by CCLRC
• A large pool of expertise through the co-investigators in the seven partner universities, IBM and CCLRC
• Extensive access to national HPC resources (HPCx and CSAR)

Current Status
• Award letter issued 26/9/03, agreed by the University in late October; grant announced 26/10/03
• Project manager, project architect, six PDRAs, and one D.Phil student already appointed
• Project structure defined and agreed; requirements-gathering and security-policy exercises commenced
• Recruitment of other staff in progress
• Kick-off meeting of project participants held in Oxford on January 19th