EPSRC e-Science Pilot Project in
Integrative Biology
David Gavaghan, Damian Mac Randal,
and Sharon Lloyd
Project Overview
• Focus of first round of UK e-Science Projects
– Data storage, aggregation, and synthesis
– Life Sciences projects focused on supporting the data-generation work of laboratory-based scientists
• The key goal now is to turn this wealth of data into information that can be used to determine biological function
• This requires an iterative interplay between experiment, mathematical modelling, and HPC-enabled simulation
• The primary goal of this project is to build the Grid infrastructure necessary to support that interplay
The Science and e-Science Challenge
• To build an Integrative Biology Grid to support applications scientists addressing the key post-genomic aim of determining biological function
• To use this Grid to begin to tackle the two chosen Grand Challenge problems: the in-silico modelling of heart failure and of cancer.
Two Grand Challenge Research Questions
• What causes heart disease?
• How does a cancer form and grow?
• These two diseases together cause 61% of all deaths in the UK
[Figure: normal beating vs. fibrillation. Courtesy of Peter Kohl (Physiology, Oxford)]
Multiscale modelling of the heart
[Figure: MRI image of a beating heart; fibre orientation ensures correct spread of excitation; contraction of individual cells; current flow through ion channels]
[Figure: simulation of sudden cardiac death due to a mechanically induced impact applied during repolarisation; required 27 hours of CPU time on an SGI IRIX 64. Courtesy of W. Li, P. Kohl, and N. Trayanova, J. Mol. Hist. 2004 (in press)]
[Figure: mathematical model of a beating heart by the Auckland Group]
Multiscale modelling of cancer
An integrative approach to disease
modelling?
• The potential impact of this approach has been
demonstrated by the work on modelling the heart
• Time is ripe to extend to cancer: UK has extensive
expertise but little has yet been done
• Together the two application areas provide a sufficiently
hard e-Science problem to require a generic solution
• Methodology and infrastructure will be utilised across
biology and in other scientific domains
The scientific challenge
Modelling and coupling phenomena which occur on many different length and time scales

Length scales (range = 10⁹):
• 1 m: person
• 1 mm: tissue morphology
• 1 µm: cell function
• 1 nm: pore diameter of a membrane protein

Time scales (range = 10¹⁵):
• 10⁹ s (years): human lifetime
• 10⁷ s (months): cancer development
• 10⁶ s (days): protein turnover
• 10³ s (hours): digesting food
• 1 s: heart beat
• 1 ms: ion channel gating
• 1 µs: Brownian motion
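The quoted ranges pin down the two entries that extraction garbled: the stated 10¹⁵ time range requires the smallest time to be 1 µs rather than a duplicated 1 ms, and cell function sits at the micrometre, not millimetre, scale. A minimal Python sketch, restating the slide's lists under that reading (our assumption), checks the arithmetic:

```python
import math

# The slide's length and time scales, with the garbled entries read as
# 1 um (cell function) and 1 us (Brownian motion) -- our assumption.
spatial_m = {"person": 1.0, "tissue morphology": 1e-3,
             "cell function": 1e-6, "membrane protein pore": 1e-9}
temporal_s = {"human lifetime": 1e9, "cancer development": 1e7,
              "protein turnover": 1e6, "digesting food": 1e3,
              "heart beat": 1.0, "ion channel gating": 1e-3,
              "Brownian motion": 1e-6}

# Range = ratio of the largest scale to the smallest.
for label, scales in (("spatial", spatial_m), ("temporal", temporal_s)):
    ratio = max(scales.values()) / min(scales.values())
    print(f"{label} range = 10^{round(math.log10(ratio))}")
# Prints 10^9 and 10^15, matching the ranges quoted on the slide.
```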
Details of test-run of heart simulation code on HPCx
• Modelled 2 ms of electrophysiological excitation of a 5700 mm³ volume of tissue from the left ventricular free wall
• Noble 98 cell model used
• Mesh contained 20,886 bilinear elements (spatial resolution 0.6 mm)
• 0.05 ms timestep (40 timesteps in total)
• Required 978 s of CPU time on 8 processors and 2.5 GB of memory
• A complete simulation of the ventricular myocardium would require up to 30 times the volume and at least 100 times the duration
• Estimated maximum compute time to investigate arrhythmia ~10⁷ s (~100 days), requiring ~100 GB of memory (compute time scales with problem size to the power ~5/3); see the sketch after this list
• At high efficiency this scales to approximately 1 day on HPCx
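As a rough cross-check of this estimate, here is a minimal Python sketch under one reading of the stated scaling, namely that CPU time grows with tissue volume to the ~5/3 power and linearly with simulated duration, while memory grows linearly with volume. The factors are the slide's own numbers; the scaling interpretation is an assumption:

```python
# Back-of-envelope extrapolation from the HPCx test run to a full
# ventricular simulation. Scaling law is our assumption:
#   CPU time ~ volume**(5/3) * duration,  memory ~ volume.

test_cpu_s  = 978    # CPU time of the 8-processor test run (s)
test_mem_gb = 2.5    # memory used by the test run (GB)
vol_factor  = 30     # full myocardium: up to 30x the tissue volume
dur_factor  = 100    # at least 100x the simulated duration

full_cpu_s  = test_cpu_s * vol_factor ** (5 / 3) * dur_factor
full_mem_gb = test_mem_gb * vol_factor

print(f"CPU time ~ {full_cpu_s:.1e} s ({full_cpu_s / 86400:.0f} days)")
print(f"Memory   ~ {full_mem_gb:.0f} GB")
# Gives ~2.8e7 s and 75 GB -- the same ballpark as the slide's
# ~1e7 s (~100 days) and ~100 GB figures.
```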
Key Deliverables
• A robust and fault-tolerant infrastructure to support post-genomic research in integrative biology that is user- and application-driven
• A 2nd-generation Grid bringing together components from across the range of current EPSRC pilot projects
The e-Science Challenge
• To leverage the global Grid infrastructure to build an international “collaboratory” which places the applications scientist “within” the Grid, allowing fully integrated and collaborative use of:
– HPC resources (capacity and capability)
– Computational steering, performance control and visualisation
– Storage and data-mining of very large data sets
– Easy incorporation of experimental data
– User- and science-friendly access
=> Predictive in-silico models to guide experiment and, ultimately, the design of novel drugs and treatment regimes
e-Science/Grid Research Issues
• Ability to carry out large-scale distributed coupled HPC simulations reliably and resiliently
• Ability to co-schedule Grid resources based on a GGF-agreed standard
• Use of Grid Services based on OGSA-DAI for data virtualisation
• Secure data management and access control in a Grid environment
• Grid services for computational steering conforming to an agreed GGF standard (a toy illustration follows this list)
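To make the steering bullet concrete, here is a toy Python illustration of the core idea only: a running simulation polls a shared parameter store that a remote client can update mid-run. This is not the GGF standard or any project API; every name here is hypothetical:

```python
import threading
import time


class SteeringChannel:
    """Thread-safe store of steerable simulation parameters."""

    def __init__(self, **initial):
        self._params = dict(initial)
        self._lock = threading.Lock()

    def set(self, name, value):      # called by the steering client
        with self._lock:
            self._params[name] = value

    def get(self, name):             # polled by the simulation loop
        with self._lock:
            return self._params[name]


def simulate(channel, steps):
    """Stand-in solver loop that re-reads its timestep every step."""
    for step in range(steps):
        dt = channel.get("timestep")   # pick up any steered change
        time.sleep(0.01)               # placeholder for one solver step
        print(f"step {step}: dt = {dt}")


channel = SteeringChannel(timestep=0.05)
# A "steering client" halves the timestep while the simulation runs.
threading.Timer(0.03, channel.set, args=("timestep", 0.025)).start()
simulate(channel, steps=6)
```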
e-Science/Grid Research (contd.)
• Grid Services for supporting distributed collaborative working
including steering and visualisation
• An interface to Grid resources that understands and effectively supports the scientific context of the project
• The project also stretches the cross-disciplinary aspects of the Grid by linking medical, biological, engineering and computing activities.
• The project intends to produce a long-term (~10 year) production environment based on the Grid to support what we expect to become a major scientific growth area.
Architecture and Software Engineering
• Initially use Web Services to provide a platform- and language-independent interface to the main functional components
• Adopt Grid Services as stable open-source OGSA-compliant implementations become available
• Deploy an object-oriented, component-based toolkit allowing a “plug-and-play” style programming paradigm (a minimal sketch follows this list)
• Use Portal Technologies to provide collaborative access to services
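As one illustration of what such a plug-and-play toolkit might look like, here is a minimal, hypothetical Python sketch (all class and method names are ours, not the project's): every functional component implements a single interface, and a registry lets a Web Service-backed implementation be swapped for a Grid Service one without touching client code.

```python
from abc import ABC, abstractmethod


class ServiceComponent(ABC):
    """Common interface for every pluggable functional component."""

    @abstractmethod
    def invoke(self, operation: str, **params):
        """Dispatch a named operation with keyword parameters."""


class WebServiceJobManager(ServiceComponent):
    """Stand-in for a Web Service-backed job management component."""

    def invoke(self, operation: str, **params):
        return f"[web service] {operation}({params})"


class ComponentRegistry:
    """Name-to-implementation map: re-registering a name swaps the
    backing technology (Web Service -> Grid Service) transparently."""

    def __init__(self):
        self._components = {}

    def register(self, name, component):
        self._components[name] = component

    def lookup(self, name):
        return self._components[name]


registry = ComponentRegistry()
registry.register("job-management", WebServiceJobManager())
print(registry.lookup("job-management").invoke("submit", model="Noble98"))
```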
Architecture
In silico whole-organ modelling collaboratory, serving multiple collaborating users.
[Diagram: layered architecture]
• Plug-ins to the user’s desktop environment: Job Management, Computational Steering, Visualisation Control, Data Finder, and other clients
• API and collaboration support service linking clients to services
• Front-end services: Job Management, Computational Steering, Data Finder, Visualisation Control, and other services
• Back-end services: Modelling & Simulation, Data Visualisation, and Data Management services
Architecture (with contributing projects)
[Diagram: the same layered architecture, annotated with the e-Science projects contributing each component: myGrid, RealityGrid, gViz, ICENI, gHWLM, e-DiaMoND, GEODISE, OGSA-DAI, BioSimGrid, and CCLRC services]
Technology Gaps that will be addressed
Much of this work will be in conjunction with other EPSRC Pilot projects
• Resilient, robust, reliable Grid framework for large scale distributed
coupled simulations
• Standardised Grid framework for computational steering and
visualisation
• Metadata schemas for describing the information and data
resources involved
• Standardised means to schedule multiple resources on the Grid
concurrently
• Tools for collaborative working in a Grid Services environment
• A Grid that is transparent to the applications scientist
Project management
• Building on extensive experience in other e-Science projects (particularly e-DiaMoND)
• Focus on team building and common goals (key for large, inter-institutional development projects)
• Establishing good communication mechanisms
• Iterative prototype development
The Team
• World-leading expertise in the two application areas
• IBM
• CCLRC
• Seven UK and NZ universities (Oxford, Nottingham, Leeds, UCL, Birmingham, Sheffield and Auckland)
• Expertise from across the UK e-Science Programme
• Extensive existing connectivity between all members of the consortium and with the wider research communities in e-Science and within the application areas
• Research training in an area crucial to the UK
The Resources
• £2.44M from EPSRC e-Science to fund 10 PDRAs and 6 PhD students
• A further 4 PhD students plus sysadmin and secretarial support funded internally
• The equivalent of 3 FTEs from IBM, plus substantial hardware discounts to provide a Power 4 server and high-performance workstations for all project staff
• Use of the Atlas data store at RAL and a substantial commitment of staff time by CCLRC
• A large pool of expertise through the co-investigators in the seven partner universities, IBM and CCLRC
• Extensive access to national HPC resources (HPCx and CSAR)
Current Status
• Award letter issued 26/9/03, agreed by the University in late October, grant announced 26/10/03
• Project manager, project architect, six PDRAs, and one D.Phil student already appointed
• Project structure defined and agreed; requirements-gathering and security-policy exercises commenced
• Recruitment of other staff in progress
• Kick-off meeting of project participants held in Oxford on January 19th