The Integrative Biology Grid – Building on e-Science Components

Lakshmi Sastry*, Srikanth Nagella, Ronald Fowler, John Taylor, Richard Wong, Anjan Pakhira,
and Deniz Turan
Applications Group, e-Science Centre
Science and Technology Facilities Council
Rutherford Appleton Laboratory
Chilton, Didcot, OX11 0QX
Abstract
The primary aim of the Integrative Biology (IB) project is to develop a second generation
“Hybrid Grid” to support post-genomic research in integrative biology. The requirements for the
grid and applications middleware were determined by considering the needs of two vitally
important clinical areas, cardiovascular disease and cancer. Component services of the IB grid
are already in use by the scientific users, who provide the vital feedback necessary to complete
the final phase. This is a report on the work to date and a review of the experience of the
computer scientists, in particular those addressing the interactive visualization and steering
requirements.
1. Introduction
Full understanding of biological function is
feasible only when biologists are able to
integrate all the available information to
recreate the non-linear, dynamic interaction
at multiple levels of the system. For
instance, the beating of a heart involves a
chain of processes, starting with the
regulation of ion concentrations within the
cardiac cells and the correct transport of the
ions across the cell membrane, which leads
to the propagation of the action potential
that rhythmically contracts and expands the
heart muscle fibres. Advances in
biotechnology, underpinned by the massive
leap in computational resources, provide an
opportunity to recreate the biological
function through mathematical models. An
iterative approach between experiment and
modelling can lead to more accurate
determination of biological function with
predictive power, leading to novel drugs and
treatment.
The goal of the EPSRC funded Integrative
Biology (IB) project is to build a grid
framework to support physiologists,
clinicians, and computational biologists to
link experiment and modelling seamlessly to
direct biology research. The needs of those
undertaking research in cardiovascular
disease and cancer were at once sufficiently
similar and sufficiently diverse to give the
developers the requirements necessary to
build such a framework.
The science agenda addressed through
exploitation of the Integrative Biology grid
comprises:
• the development of integrated whole
organ models of biological systems,
primarily in clinical science;
• using these models to begin to study the
development cycle of cardiac disease and
cancer tumours;
• bringing together clinical and laboratory
data from many sources to evaluate and
improve the accuracy of the models;
• understanding the fundamental causes of
these life-threatening conditions and how
to reduce their likelihood of occurrence;
• identification of opportunities for
intervention at the molecular and cellular
level using customised drugs and novel
treatment regimes.
The e-Science challenges for Integrative
Biology include:
• the provision of transparent, co-scheduled
access to appropriate combinations of
distributed HPC and database resources
needed to run coupled multi-scale whole
organ simulations;
• the exploitation of these resources
efficiently through the application of
computational steering, workflow,
visualisation and other techniques
developed in earlier e-Science projects
wherever possible;
• enabling globally distributed biomedical
researchers to collaboratively control,
analyse and visualise simulation results in
order to progress the scientific agenda of
the project;
• maintenance of a secure environment for
the resources used and information
generated by the project without
inhibiting scientific collaboration.
The ambition is that the tools developed by
the project will improve the productivity of
clinical and physiological researchers in
academia and the pharmaceutical and
biotechnology sectors. The UK e-Science
community will benefit from access to new
tools developed by the project and from the
example of an integrated computational
framework that the project will develop.
Section 2 of this paper reports the user
requirements. Section 3 describes the
component parts of the IB grid that emerged
from the requirements and the basic
architecture of the IB grid framework and its
service layers. Section 4 describes a couple
of applications built around the IB grid. The
Conclusion gives the status of the IB
framework together with observations on
the developers’ experience.
2. User requirements
Gathering user requirements is a skill in
which the requirements gatherer deploys any
one of a number of strategies, as appropriate
for the user and for the purpose of gathering
the requirements. Within IB, the users fell
into two highly distinct categories. The first
were technology-aware users who had
existing, often deficient, systems and were
able to articulate what they wanted from a
system in a language understood by those
developing it.
At the opposite end, there were
mathematicians and experimental biologists
who had no existing systems on which to
base their needs for improvements and were
not as aware of what technology could do
for them. However, they were often able to
quickly grasp some of the concepts and,
more surprisingly, were able to articulate
rather more precisely what they did not want.
The requirements gathering was undertaken
through a variety of methods. One to one,
face to face meetings were the primary
method used. Existing applications were
shown as exemplars to describe
functionality during such meetings. In
addition, the developers shadowed
individual users to understand their
everyday scientific activity, be that
composing a set of parameter sweep input
configurations for simulations or preparing
in vitro experiments and collecting image
data.
A third method was to distribute a
questionnaire prior to face to face meetings
to elicit information on generic issues such
as where the data is stored, the backup and
security mechanisms used, the data formats,
the simulation details, who the data is
shared with, its source, authorization and
authentication, as well as whether any legal
and ethical issues are involved in the
handling and sharing of the data. The users
were also asked to review the current
limitations and to say what they would like
to have, without limiting their ideas to what
they thought feasible [1]. All the information
gathered in these ways was then analysed to
arrive at a set of generic as well as specific
requirements. These were then further
partitioned to identify application specific
functionality that needed to be implemented
to maintain the users’ enthusiasm for and
commitment to the project, and thereby to
provide continuous feedback to the evolving
framework. This also gave the software
architects and developers time to design the
detailed services based architecture, the
layers of services and interfaces, the
application utilities that need to be supported
within the framework, and the toolkits that
the users are familiar with. It also provided the
opportunity to evaluate output from first
generation e-Science projects to decide
which modules could be adapted within the
IB framework.
The user requirements gathered can be
broadly categorized as follows:
• Secure access to high performance
compute resources, together with
middleware infrastructure to submit
simulations. With some of the user
communities, especially the cancer
modellers, the techniques of high
performance computing and even
knowledge of state of the art computational
techniques were not widely known.
• In the case of empirical tumour modelling
simulations, computational steering was
recognised as a potential method to aid
exploration of the parameter space of
models.
• Structured, secure data management
beyond the individual user’s desktop was
uniformly and urgently required.
• Advanced higher and multi dimensional
visualisation techniques were deemed
desirable.
• Visualisation requirements were made
more complex by the need to have the
techniques interfaced through proprietary
desktop problem solving environments and
application specific visualisation toolkits.
• To avoid downloading vast amounts of
experimental and simulation data to the
users’ desktop machines, it is desirable to
perform analysis and visualisation tasks on
the grid, close to the data sources.
• Advanced interaction techniques, such as a
cutting plane in a user selected orientation,
were requested for application specific
visualisation toolkits.
The requirements exercise produced some
unexpected and interesting points. The first
of these was that, within a small and highly
specialized community of researchers and
institutions, competition is intense and
collaboration meant, especially in heart
modelling, that researchers knew each
other’s work but did not actively share
models or data. Another interesting finding
was that the end users were highly focused
on their science and publications, and
technologies that were not perceived to
provide immediate benefits were quickly
ignored, even if their potential longer term
benefits were obvious. For instance, the
effort required to standardize metadata and
to use it to query and access data was given
low priority. There was a strong emphasis
on getting hard coded, non generic software
modules built, especially for visualisation
and interaction, for favourite tools that were
not available to the wider community, in
order to help individual research groups.
Without exception, all the end user scientists
had to be supported through the process of
getting an account on the National Grid
Service (NGS), from the application process
to using the certificate to submit a job to the
NGS, despite a step by step guide having
been developed for the purpose. Biologists
sometimes faced formidable problems in
completing these simple steps, either
because of the system and network
configuration at their institution or because
their ability to understand the process was
limited by the new concepts and grid
terminology they had to negotiate. In
addition, scientists were, and continue to be,
reluctant to allocate the time this process
takes. However, the experience of providing
one to one support unearthed a wealth of
requirements, not only to simplify the NGS
Certification Authority portal but also to
alter the content and the language in which
the process was expressed.
3. IB Grid architecture
A typical simplified user scenario is for a
computational biologist to create one or
more detailed input scripts with parameter
ranges, indicate a data file that contains the
details of the computational mesh and an
executable of the simulation to run these
with, and store the resulting output. She
may then use one or more proprietary
toolkits to visualise and analyse the data.
This basic scenario can be made more
complex by the need to monitor the
simulation as it executes, or to assimilate
in vitro or clinical data to compare and
contrast with the simulation output. In
addition, a cardiac arrhythmia simulation
may have been enabled to use computational
steering, so that a researcher may introduce
a stimulant to observe its impact on the
development of the action potential and
visualise the re-entry pattern [2].
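The parameter sweep part of this scenario
can be sketched in a few lines of Python.
This is an illustration only, not the IB
implementation: the class SimulationJob, the
function expand_sweep, and the executable,
mesh file and parameter names are all
hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class SimulationJob:
    """One simulation run: an executable, a mesh file and one parameter set."""
    executable: str
    mesh_file: str
    parameters: Dict[str, float]
    output_dir: str


def expand_sweep(executable: str, mesh_file: str,
                 ranges: Dict[str, Sequence[float]],
                 output_root: str) -> List[SimulationJob]:
    """Expand parameter ranges into one job per combination of values."""
    names = sorted(ranges)  # fixed order so job numbering is reproducible
    jobs = []
    for index, values in enumerate(product(*(ranges[n] for n in names))):
        jobs.append(SimulationJob(
            executable=executable,
            mesh_file=mesh_file,
            parameters=dict(zip(names, values)),
            output_dir=f"{output_root}/run_{index:04d}",
        ))
    return jobs


jobs = expand_sweep(
    executable="cardiac_sim",      # hypothetical executable name
    mesh_file="ventricle.mesh",    # hypothetical mesh file
    ranges={"stimulus_current": [0.5, 1.0, 1.5], "conductivity": [0.1, 0.2]},
    output_root="results",
)
print(len(jobs), jobs[0].parameters, jobs[-1].output_dir)
```

Each job carries its own output directory, so
the results of every run in the sweep can be
stored and compared separately.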
The IB grid architecture is a multi layered,
service oriented architecture of the kind
prevalent in other application domains.
Figure 1 below gives an overview of the
services layer.
Figure 1: Overview of IB Services
The front end services are based on Secure
Web Services with standard WSDL
descriptions. These are run on IB servers
distributed across the project partner
institutions. The services are accessed
using SOAP messages from any IB client.
The Globus toolkit API is used for the grid
communication protocol between the front
and back end services. Open standards and
software stacks are used throughout, both to
build IB Services and for communication
protocols.
Detailed architecture
The IB framework is built on the UK
National Grid Service but is designed to be
adaptable to other Grid services such as
TeraGrid or even local clusters. It supports
the Grid Security Infrastructure model based
on X.509 certificates. Data storage and
management are handled through the
Storage Resource Broker (SRB) [3]. At the
user level, a variety of desktop interfaces
can be realised based on the requirements of
the user task for data analysis. Other
overarching services are accessed through
the IB Interface (IBI). Figure 2 below
provides an overview of the detailed
architecture of the IB framework that
supports the user requirements.
Figure 2: Overview of detailed architecture
The architecture is designed with extension
capabilities for collaborative working, for
when the community evolves to make use of
the full extent of the framework. The
architecture is best described using a use
case scenario. User A invokes the IBI on her
desktop and selects the simulation she
wishes to run and a visualisation service or
toolkit that can read her data, interpret and
process it, and generate the image (an
encoder). She also indicates the input, the
data files and the output directory in which
to store the data. The communication
backend of the IBI first initialises the
server-side visualisation toolkit and opens
the necessary ports to receive connections
and data from the simulation. Part of this
initialisation phase is to realise the control
panel of the chosen visualisation toolkit on
the client side, for user interaction with the
displayed results from the simulation. A
second task of the client-side backend is to
pass all the information to the IB front-end
Services Manager. The Services Manager
invokes the simulation with the necessary
input data and instructs it to stream data to
the visualisation toolkit. In this architecture,
the data from the simulation or the SRB is
processed on the server side and images are
sent to the IBI. The user interacts with the
displayed visualisation using the control
panel. Pick and selection actions are passed
to the visualisation toolkit, which interprets
them (decoder role). The interactive
visualisation toolkit is thus relegated to the
role of an encoder and decoder on the
server side.
The necessity to download large amounts of
data and to write to and read from local disk
is thus avoided. The original data and any
intermediate data specified by the user are
all stored in the SRB.
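The encoder and decoder roles described
above can be illustrated with a minimal
sketch. It assumes a small 2D scalar field
and a greyscale PGM image as the encoded
form; the class ServerSideVisualiser and its
methods are hypothetical and stand in for
whichever visualisation toolkit the user
selects.

```python
from typing import List, Tuple

Grid = List[List[float]]


class ServerSideVisualiser:
    """Holds the data on the server; only images and pick events cross the network."""

    def __init__(self, data: Grid) -> None:
        self.data = data
        self.height = len(data)
        self.width = len(data[0])

    def encode(self) -> bytes:
        """Encoder role: render the data as an 8-bit greyscale PGM image."""
        flat = [v for row in self.data for v in row]
        lo, hi = min(flat), max(flat)
        span = (hi - lo) or 1.0  # avoid division by zero for constant data
        pixels = bytes(round(255 * (v - lo) / span) for v in flat)
        header = f"P5 {self.width} {self.height} 255\n".encode("ascii")
        return header + pixels

    def decode_pick(self, x: int, y: int) -> Tuple[int, int, float]:
        """Decoder role: map a picked pixel back to the underlying data value."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise ValueError("pick outside the image")
        return x, y, self.data[y][x]


vis = ServerSideVisualiser([[0.0, 0.5], [1.0, 2.0]])
image = vis.encode()            # this is all the client needs to display
picked = vis.decode_pick(1, 1)  # the client sends back pixel coordinates only
print(len(image), picked)
```

The client only ever receives the image bytes
and sends back pixel coordinates; the data
itself stays on the server side.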
The IBI not only provides the user with an
interface to manipulate her SRB data store
but also allows her to back up her data on
the Atlas Data Store [4], using the SRB
interface, for long term archiving. The IBI
also provides an interactive interface to
manage the grid certificate using a MyProxy
server [5].
The job submission service allows the user
to submit jobs to NGS compute clusters as
well as the national high performance cluster
HPCx. Monitoring information for submitted
jobs and other house-keeping metadata are
automatically generated and stored in an
Oracle database service attached to the
NGS. Hibernate libraries [6] and the STFC
metadata schema [7] are used to create and
manage the house-keeping database.
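The kind of house-keeping record this
implies can be sketched as follows. The
sketch uses Python's built-in sqlite3 module
purely as a stand-in for the Oracle and
Hibernate stack; the table layout, column
names and state names are assumptions and
do not reproduce the STFC metadata schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical single-table layout for job house-keeping.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job (
    job_id     TEXT PRIMARY KEY,
    owner      TEXT NOT NULL,
    resource   TEXT NOT NULL,
    state      TEXT NOT NULL,
    updated_at TEXT NOT NULL
)
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def record_submission(db: sqlite3.Connection, job_id: str,
                      owner: str, resource: str) -> None:
    """Store a new job in the SUBMITTED state."""
    db.execute("INSERT INTO job VALUES (?, ?, ?, 'SUBMITTED', ?)",
               (job_id, owner, resource, _now()))
    db.commit()


def update_state(db: sqlite3.Connection, job_id: str, state: str) -> None:
    """Record a state change reported by the monitoring service."""
    cursor = db.execute(
        "UPDATE job SET state = ?, updated_at = ? WHERE job_id = ?",
        (state, _now(), job_id))
    if cursor.rowcount == 0:
        raise KeyError(job_id)
    db.commit()


def jobs_for(db: sqlite3.Connection, owner: str):
    """List (job_id, resource, state) for one user's jobs."""
    rows = db.execute(
        "SELECT job_id, resource, state FROM job WHERE owner = ? ORDER BY job_id",
        (owner,))
    return rows.fetchall()


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
record_submission(db, "job-001", "userA", "NGS")
record_submission(db, "job-002", "userA", "HPCx")
update_state(db, "job-001", "RUNNING")
print(jobs_for(db, "userA"))
```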
The gViz computational steering library [8]
is used in the IB framework to support
computational steering. If there is a user B,
then the embedded CoVisa module can be
used to provide collaboration. The steering
communications are handled using the built
in gViz communication calls.
It is possible to include another user C who
will have restricted, view only access to the
simulation session. A typical scenario is one
in which two or more participating
academic research partners wish to
demonstrate the progress and discuss details
of the research with a commercial partner.
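The three kinds of participant can be
modelled with a simple role check. This is a
sketch of the access rule only; it does not
use the gViz or CoVisa APIs, and the names
Role, SteeringSession and nutrient_level are
hypothetical.

```python
from enum import Enum
from typing import Dict


class Role(Enum):
    OWNER = "owner"                # user A, who started the session
    COLLABORATOR = "collaborator"  # user B, who may also steer
    VIEWER = "viewer"              # user C, restricted to view only access


class SteeringSession:
    """A shared session in which only some participants may steer."""

    def __init__(self, owner: str, parameters: Dict[str, float]) -> None:
        self._roles = {owner: Role.OWNER}
        self._parameters = dict(parameters)

    def join(self, user: str, role: Role) -> None:
        if role is Role.OWNER:
            raise ValueError("a session has exactly one owner")
        self._roles[user] = role

    def view(self, user: str) -> Dict[str, float]:
        """Any participant may read the current parameter values."""
        if user not in self._roles:
            raise PermissionError(f"{user} has not joined the session")
        return dict(self._parameters)

    def steer(self, user: str, name: str, value: float) -> None:
        """Only the owner and collaborators may change a parameter."""
        if user not in self._roles:
            raise PermissionError(f"{user} has not joined the session")
        if self._roles[user] is Role.VIEWER:
            raise PermissionError(f"{user} has view only access")
        if name not in self._parameters:
            raise KeyError(name)
        self._parameters[name] = value


session = SteeringSession("userA", {"nutrient_level": 1.0})
session.join("userB", Role.COLLABORATOR)
session.join("userC", Role.VIEWER)
session.steer("userB", "nutrient_level", 0.8)
try:
    session.steer("userC", "nutrient_level", 0.1)
except PermissionError as error:
    print("rejected:", error)
print(session.view("userC"))
```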
Image based Interactors: A novel feature
of the IB infrastructure is that it includes an
Interactor library. Image based steering is
used to enhance the quality of human
computer interaction in steering
environments [9]. The Interactor is an icon
representing the data type of a parameter
that the user may wish to steer. At set up
time, if the simulation is steering enabled,
the user can select the parameters to steer.
The initialisation process will create
instances of the appropriate interactors,
place them inside the graphics window and
bind each parameter to its interactor.
From then on the user is able to manipulate
the interactor directly to convey the
parameter values for steering. The IB
interactors are OpenGL based visual objects
with encoded behaviours, and are reusable
components within any OpenGL based
toolkit.
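The binding between an interactor and a
steered parameter can be sketched without
any graphics code. The example assumes a
slider-like interactor whose handle position
is normalised to the range 0 to 1; the class
SliderInteractor and the callback
send_to_simulation are hypothetical, and the
OpenGL drawing is omitted.

```python
from typing import Callable, Dict


class SliderInteractor:
    """A draggable widget bound to one steerable parameter."""

    def __init__(self, name: str, minimum: float, maximum: float,
                 on_change: Callable[[str, float], None]) -> None:
        if maximum <= minimum:
            raise ValueError("maximum must be greater than minimum")
        self.name = name
        self.minimum = minimum
        self.maximum = maximum
        self.on_change = on_change
        self.position = 0.0  # normalised handle position in [0, 1]

    def drag_to(self, position: float) -> float:
        """Move the handle, clamp it to the widget, and steer the parameter."""
        self.position = min(1.0, max(0.0, position))
        value = self.minimum + self.position * (self.maximum - self.minimum)
        self.on_change(self.name, value)
        return value


steered: Dict[str, float] = {}


def send_to_simulation(name: str, value: float) -> None:
    # Stand-in for the steering call that would reach the running simulation.
    steered[name] = value


slider = SliderInteractor("nutrient_level", 0.0, 2.0, send_to_simulation)
slider.drag_to(0.25)
print(steered)
```

Because the interactor only knows a parameter
name, a value range and a callback, the same
object could be reused for any steerable
parameter of that data type.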
4. Applications
The IB Grid is used to interact with
applications in heart and tumour modelling.
Figures 3a and 3b below show a three
dimensional carcinoma model with
embedded computational steering control to
study the effect of nutrient on cell growth
and concentration.
Figures 3a & 3b: In situ ductal carcinoma
simulation with steering
Figures 4a & 4b below show an image
processing application for vascular cancer
tumours, in which the image processing
steps are services within the grid that
identify blood vessels and tumour cells so
that enumeration and other statistical
analysis can be automated. The image in this
case can be displayed using one of two
techniques. The first, called ‘identify’, shows
the complete structure of the blood vessels,
the cancer cells or both, either in their
original colour or highlighted. The second
display technique, called ‘edge’, shows the
edges of the blood vessels, the cancer cells
or both. These are the edges that will be used
to join this 2D image to the next to form the
3D geometry.
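The difference between the two displays can
be illustrated on a tiny greyscale image. The
paper does not state which algorithms the
grid services use, so this sketch assumes a
plain intensity threshold for ‘identify’ and
a 4-neighbour boundary test for ‘edge’; the
function names and the threshold value are
hypothetical.

```python
from typing import List

Image = List[List[int]]   # greyscale intensities, 0-255
Mask = List[List[bool]]

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def identify(image: Image, threshold: int) -> Mask:
    """Mark every pixel that belongs to the structure of interest."""
    return [[pixel >= threshold for pixel in row] for row in image]


def edge(mask: Mask) -> Mask:
    """Keep only structure pixels that touch a non-structure pixel or the border."""
    height, width = len(mask), len(mask[0])

    def inside(y: int, x: int) -> bool:
        return 0 <= y < height and 0 <= x < width and mask[y][x]

    result = []
    for y in range(height):
        row = []
        for x in range(width):
            interior = all(inside(y + dy, x + dx) for dy, dx in NEIGHBOURS)
            row.append(mask[y][x] and not interior)
        result.append(row)
    return result


image = [
    [0,   0,   0,   0, 0],
    [0, 200, 200, 200, 0],
    [0, 200, 200, 200, 0],
    [0, 200, 200, 200, 0],
    [0,   0,   0,   0, 0],
]
structure = identify(image, threshold=128)
outline = edge(structure)
print(sum(cell for row in structure for cell in row),
      sum(cell for row in outline for cell in row))
```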
Figures 4a & 4b: Stage 3 visualization of
vascular tumour - the original and processed
images with edges.
5. Conclusion
The IBI and IB grid services are available to
users to integrate with their applications.
User trials are underway to make the system
robust. Building the Integrative Biology grid
on results and software from previous
generation e-Science projects has been both
educational and challenging. Access to the
developers of such modules has proven to be
the single most significant factor in
speeding up and easing the use of such tools.
Working with scientists has also been a
challenging experience. They tended to be
highly focussed on their immediate needs,
which were often to give their applications
high performance, parallelization, a friendly
user interface or advanced visualization and
interaction capability. Such needs had to be
catered for to keep the users engaged with
the project and providing feedback.
However, it proved useful to have done
these tasks, as the requirements formed the
basis for understanding how the IB Grid
needed to address interactivity, for gaining a
handle on data management issues and,
most significantly, for designing the server
side visualization utilities. The IB grid has
extensibility and flexibility built into it and
can form the basis for other projects.
Acknowledgement
The authors wish to acknowledge the
financial support of the EPSRC (ref no:
GR/S72023/01).
6. References
1. Lloyd, S., Gavaghan, D., Whiteley,
J., Pitt-Francis, J., Slaymaker, M.,
Boyd, D.R.S., Mac Randal, D.F.,
Kleese van Dam, K., Sastry, L.
Gathering Requirements for the
Integrative Biology project.
e-Science All Hands Meeting,
September 2004.
2. Handley, J., Clayton, R., Wood, J.,
Holden, A.V., Brodlie, K.
Interaction with Cardiac Virtual
Tissues on the Grid: The gViz
Library. Accepted for publication in
Proceedings of FIMH'05, 2005.
3. Storage Resource Broker.
http://www.sdsc.edu/srb/index.php/Main_Page
4. http://www.escience.clrc.ac.uk/curation/
5. MyProxy.
http://grid.ncsa.uiuc.edu/myproxy/license.html
6. http://www.hibernate.org/
7. http://epubs.cclrc.ac.uk/workdetails?w=30324
8. http://www.comp.leeds.ac.uk/vvr/gViz/research_gViz_library.html
9. Sastry, L., and Wright, H. Image
based computational steering for
Integrative Biology. CompuSteer
Workshop, Hull, 2006.
http://compusteer.dcs.hull.ac.uk/IB.pdf