Using the Grid in the Mouse Atlas Project

Mehran Sharghi and Richard Baldock
MRC Human Genetics Unit, Crewe Road, Edinburgh, EH4 2XU, UK
[Mehran.Sharghi, Richard.Baldock]@hgu.mrc.ac.uk
Abstract
This paper presents initial results of investigating and developing grid-enabled tools for
collaborative image processing, visualisation, and database access within the context of the mouse
atlas project at the MRC Human Genetics Unit. OGSI-compliant grid services for compute-intensive
tasks in the mouse atlas have been investigated, and a grid service has been developed to
implement a grid-enabled deconvolution tool. We have developed the EMAP portal to deliver Mouse
Atlas grid functionality while hiding technical details from end users. JSR-168
compliant portlets have been implemented for accessing our grid-enabled tools, for viewing and
comparing experimental results using a visualisation grid service, and for collaborating with
mouse atlas users.
1. Introduction
This paper presents initial results of investigating
and developing grid-enabled tools for
collaborative image processing, visualisation,
and database access within the context of the
mouse atlas project at the MRC Human
Genetics Unit. The Mouse Atlas Project
includes a spatio-temporal framework for the
developing mouse embryo [1], and a gene-expression
database (EMAGE) [2] for spatially
mapped in situ data. In addition, software for
individual use for data analysis, mapping,
reconstruction, 3D visualisation, and query has
been developed. The e-Science/Grid
infrastructure provides a technology to
coordinate distributed resources (compute, data,
etc.) that appear to the user as a single virtual
computing system. This facilitates dynamic
formation and management of virtual
organizations and the building of large scale
and secure collaborative problem solving
environments. These aspects make the
Grid a viable option for the mouse atlas project,
which typically involves compute-intensive
calculation and secure collaboration. Very often
the compute capability is required “at the lab-bench”,
at a point of delivery that does not have
the expertise or resources to maintain and
deliver the service. This has been discussed in
an earlier paper [3] and is the primary goal of
this project: to establish how the Grid can be usefully
deployed for biomedical research at the small
end. In this paper we present some initial
results of implementing and deploying
a special-purpose deconvolution grid service
and of delivering grid functionality using portal
technology.
2. Grid-enabled tools
Open Grid Services Architecture (OGSA) is a
standard platform for Grid Service oriented
architectures. It defines mechanisms to create,
name, manage, and discover services. Globus
Toolkit version 3 (GT3) is middleware for
building OGSA-compliant grid-enabled tools,
services, and applications using a grid-service
programming model. In this exemplar we use
GT3 to deploy the service. GT3 is of course
already being superseded by a
WSRF/GT4 deployment. This has caused
significant additional effort in building the
services but probably does not represent a
significant extra difficulty for the naive user. It
is this aspect of grid use that this project is
intended to explore. Grid services for compute-intensive
tasks in the mouse atlas have been
investigated, and a grid service has been
developed to implement a grid-enabled
deconvolution tool.
2.2 Grid-enabled Deconvolution
Deconvolution is a technique for removing out-of-focus
blur from a set of images. An example application is
Optical Sectioning in which 3D volumetric data
is obtained from the specimen using a
microscope. The microscope is focused at a
plane and a slice is recorded (see Figure 1).
Then it is refocused at a different plane and
another slice is recorded. This process is
continued until the entire specimen is covered.
Figure 1 - Optical sectioning
The problem with this method is that out-of-focus
structures appear blurred as a background
haze and obscure the in-focus structures. Figure
2 shows this phenomenon, which appears as
a bleeding artefact. Deconvolution is required
to remove from the images the blur resulting
from the out-of-focus structures.
There are commercial packages available
for deconvolution but in this case we need to
develop a novel solution for a new imaging
technique developed by Weninger and Mohun
[4]. This technique captures microscopy images
of the “block-face” of serially sectioned
tissue. At high magnification there is blurring
because tissue deeper in the block is imaged
at the same time as the in-focus top face. The
blurring is special because it is “one-way” and is
therefore not handled by other systems. This
is not technically difficult to solve but is a good
example of where a computational solution is
required at the laboratory bench, distant from
the computing expertise and resources
providing the solution.
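To make the idea of “one-way” blurring concrete, the following minimal sketch assumes a deliberately simplified model that is not taken from [4]: each recorded block-face image is the in-focus top face plus a fixed fraction of everything recorded beneath it, with no lateral spreading. Under that assumption the recorded slice k satisfies o[k] = t[k] + bleed * o[k+1], so the in-focus slice can be recovered by subtracting the scaled next-deeper recorded slice. The function names and the bleed parameter are hypothetical.

```python
def blur_one_way(true_slices, bleed):
    """Simulate one-way blur: each slice also shows attenuated deeper slices.

    true_slices: slices ordered from the top of the block downwards; each
                 slice is a flat list of pixel intensities.
    bleed:       assumed fraction (0 <= bleed < 1) of the deeper recorded
                 image that shows through. Hypothetical parameter.
    """
    observed = [None] * len(true_slices)
    deeper = [0.0] * len(true_slices[0])  # nothing lies below the last slice
    # Work upwards from the deepest slice so the deeper image is always known.
    for k in range(len(true_slices) - 1, -1, -1):
        observed[k] = [t + bleed * d for t, d in zip(true_slices[k], deeper)]
        deeper = observed[k]
    return observed


def deblur_one_way(observed, bleed):
    """Invert the assumed model: t[k] = o[k] - bleed * o[k+1]."""
    restored = []
    for k, current in enumerate(observed):
        if k + 1 < len(observed):
            below = observed[k + 1]
        else:
            below = [0.0] * len(current)
        # Clamp at zero because intensities cannot be negative.
        restored.append([max(0.0, o - bleed * b) for o, b in zip(current, below)])
    return restored


if __name__ == "__main__":
    stack = [[0.0, 10.0], [5.0, 0.0], [0.0, 20.0]]
    blurred = blur_one_way(stack, bleed=0.5)
    print(blurred)
    print(deblur_one_way(blurred, bleed=0.5))
```

Only the slice below contributes in this model, which is what distinguishes it from the two-sided blur of conventional optical sectioning. A realistic treatment would also need to account for lateral spreading of the out-of-focus light.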
There are several deconvolution methods
including the following approaches:
• Nearest neighbour: data from out-of-focus
structures located in neighbouring planes are
used to deblur each slice (Figure 3).
• Inverse filtering: simple inverse convolution
using the Fourier transform of the point-spread function.
• Iterative methods: iterative estimation of the
specimen using the point-spread function to
model the imaging process as a forward
convolution.
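The nearest-neighbour approach in the list above can be sketched as follows. This is an illustrative sketch only, not the deconvolution program used in the service: slices are reduced to 1D rows to keep it short, a moving average stands in for the real out-of-focus blur, and the weight and radius parameters are assumptions.

```python
def box_blur(row, radius=1):
    """Blur a 1D row with a moving average; edges use the available pixels."""
    blurred = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        blurred.append(sum(window) / len(window))
    return blurred


def nearest_neighbour_deblur(slices, weight=0.45, radius=1):
    """Subtract blurred copies of the adjacent slices from each slice.

    slices: list of 1D rows of pixel intensities, one row per focal plane.
    weight: assumed fraction of each blurred neighbour to remove.
    """
    result = []
    for j, current in enumerate(slices):
        haze = [0.0] * len(current)
        for n in (j - 1, j + 1):            # the two nearest planes
            if 0 <= n < len(slices):
                for i, value in enumerate(box_blur(slices[n], radius)):
                    haze[i] += weight * value   # sum the blurred neighbours
        # Clamp at zero because intensities cannot be negative.
        result.append([max(0.0, c - h) for c, h in zip(current, haze)])
    return result


if __name__ == "__main__":
    stack = [[0.0, 0.0, 0.0], [2.0, 4.0, 2.0], [0.0, 12.0, 0.0]]
    for row in nearest_neighbour_deblur(stack, weight=0.5):
        print(row)
```

The method is cheap because each output slice depends only on three input slices, but it removes only the haze attributable to the immediate neighbours.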
Almost all deconvolution methods require
an imaging model. In the case of microscopy,
the Point Spread Function (PSF) or Optical
Transfer Function (OTF) describes the
microscope's behaviour. The PSF is the image of a
point source of light and the OTF is the frequency
response of the microscope. The PSF and OTF are
a Fourier transform pair. The PSF can be calculated
analytically, measured by experiment, or estimated
automatically from the images. The last of these is
the basis for a deconvolution approach called blind
deconvolution.
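The relationship between the PSF and the OTF, and its use in inverse filtering, can be illustrated with a small 1D example. The sketch below is an assumption-laden toy: it uses a naive discrete Fourier transform, periodic boundaries, and a noise-free signal, and the eps threshold is a hypothetical guard against dividing by near-zero OTF values. With real, noisy images plain inverse filtering amplifies noise at frequencies where the OTF is small.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        out.append(total / n if inverse else total)
    return out


def circular_convolve(signal, psf):
    """Forward imaging model: blur the signal with the PSF (periodic edges)."""
    n = len(signal)
    return [sum(signal[(i - j) % n] * psf[j] for j in range(n))
            for i in range(n)]


def inverse_filter(blurred, psf, eps=1e-6):
    """Divide the blurred spectrum by the OTF, skipping near-zero OTF terms."""
    otf = dft(psf)                 # the OTF is the Fourier transform of the PSF
    spectrum = dft(blurred)
    restored = [s / h if abs(h) > eps else 0.0 for s, h in zip(spectrum, otf)]
    return [value.real for value in dft(restored, inverse=True)]


if __name__ == "__main__":
    point = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    psf = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]   # centred on index 0
    blurred = circular_convolve(point, psf)
    print([round(v, 6) for v in blurred])
    print([round(v, 6) for v in inverse_filter(blurred, psf)])
```

Iterative methods use the same forward model (here circular_convolve) but refine an estimate of the specimen step by step instead of dividing spectra directly.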
A grid service has been developed to
implement a grid-enabled deconvolution tool.
This tool includes four grid service types as
illustrated in Figure 4. The deconvolution service
factory is a persistent grid service responsible
for creating instances of the deconvolution service.
The deconvolution service is a transient service that
is created by the deconvolution service factory to
interact with a client to perform a deconvolution
experiment. Users can repeat the experiment
with different parameters and data by
interacting with the same deconvolution grid
service. The deconvolution service is destroyed
when the user finishes the experiment.
Figure 2 - Bleeding artefact
Figure 3 - Nearest neighbour deconvolution (diagram labels: Original Images, Blur, Integrate, Deblurred Images)
The deconvolution grid service uses Reliable File
Transfer (RFT) to securely transfer
data files from the client side to the compute
service. RFT is part of the data management
facilities of the Globus Toolkit. There is a persistent RFT factory
service, and for each transfer a transient instance
of the RFT service is created by this factory. Here
we use one instance of the RFT service to transfer
user data and another instance to transfer the
results back to the user.
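The persistent-factory and transient-instance lifecycle described above can be sketched in plain Python. This is an in-process analogy under stated assumptions, not the GT3 or OGSI API: the class and method names are hypothetical, and the run method only scales the data as a stand-in for the real deconvolution.

```python
import itertools


class DeconvolutionService:
    """Transient service: holds one user's data across repeated experiments."""

    def __init__(self, service_id):
        self.service_id = service_id
        self.data = None
        self.destroyed = False

    def load(self, data):
        self._check_alive()
        self.data = list(data)

    def run(self, weight):
        """Stand-in for the real computation: scale the data by (1 - weight)."""
        self._check_alive()
        if self.data is None:
            raise RuntimeError("no data has been transferred yet")
        return [value * (1 - weight) for value in self.data]

    def destroy(self):
        self.data = None
        self.destroyed = True

    def _check_alive(self):
        if self.destroyed:
            raise RuntimeError("service instance has been destroyed")


class DeconvolutionServiceFactory:
    """Persistent service: its only job is to create transient instances."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create_service(self):
        service = DeconvolutionService(next(self._ids))
        self.instances[service.service_id] = service
        return service

    def destroy_service(self, service_id):
        self.instances.pop(service_id).destroy()


if __name__ == "__main__":
    factory = DeconvolutionServiceFactory()
    service = factory.create_service()
    service.load([2.0, 4.0])
    print(service.run(weight=0.5))    # first experiment
    print(service.run(weight=0.25))   # repeated with a new parameter
    factory.destroy_service(service.service_id)
```

Keeping the data inside the transient instance is what lets a user repeat an experiment with new parameters without transferring the images again.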
The deconvolution service uses a native C
program to process the transferred data and
create a deblurred data set. The deblurred data is
then transferred back to the client using
another instance of the RFT grid service.
The deconvolution grid service also subscribes to
notifications from the RFT service and the
deconvolution program. These notifications,
which indicate data transfer and deconvolution
status, are in turn delivered to the client side. We
have also developed a Java GUI program for
the client side to facilitate data file transfer,
repetition of the experiment with modified
parameters, and monitoring of progress at
different stages.
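The notification relay described above, in which the deconvolution service subscribes to its back-end sources and passes their status on to the client, follows a publish/subscribe pattern. The sketch below illustrates that pattern only; it does not use the GT3 notification interfaces, and all class and method names are hypothetical.

```python
class NotificationSource:
    """Anything that emits status messages, e.g. a transfer or a computation."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def notify(self, status):
        for callback in self._subscribers:
            callback(self.name, status)


class RelayingService:
    """Subscribes to its back-end sources and forwards events to the client."""

    def __init__(self, sources, client_callback):
        self._client_callback = client_callback
        for source in sources:
            source.subscribe(self._forward)

    def _forward(self, source_name, status):
        # Tag each message with its origin so the client can tell them apart.
        self._client_callback(f"{source_name}: {status}")


if __name__ == "__main__":
    transfer = NotificationSource("transfer")
    compute = NotificationSource("deconvolution")
    RelayingService([transfer, compute], print)
    transfer.notify("data files received")
    compute.notify("started")
```

The client registers a single callback and never needs to know how many back-end services are involved.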
3. Delivering grid functionality via portal
With current grid technology it is
difficult for end users to access the Grid
directly. Grid middleware and tools are
constantly evolving, which makes using and
maintaining the grid more complex for end
users. Our initial implementation required a
complex and variable installation process which
would not have been usable by the “real” user,
in this case a biomedical researcher.
Furthermore, end users should be shielded from
the technical details of installing and
implementing the Grid. Grid portals
provide a gateway to grid services and
resources by creating a familiar browser-based
user interface to the Grid. We have therefore
investigated the use of a grid portal, specifically
the Gridsphere portal framework [5].
Figure 4 – Grid services in the grid-enabled deconvolution tool (diagram labels: RFT (Reliable File Transfer) Service Factory, Deconvolution Service Factory, Create Service Instance, RFT Service (data transfer), RFT Service (result transfer), Deconvolution Service)
Portal pages are created from the markup
content generated by portlets and are returned to
the client, where they are usually presented in an
HTML browser. Portlets are pluggable user
interface components that provide a
presentation layer to information systems.
Portal standards facilitate sharing of portlet
applications among portal vendors. This ensures
interoperability across different portal
frameworks. As a result of portal standards,
portlet repositories are now emerging where
people and communities can contribute their
portlets and share their experiences. There are
currently two portal standards: JSR-168 and
WSRP. The Java Specification Request
JSR-168 (the Portlet Specification) is a widely
accepted standard which
provides a portlet abstraction together with a
portlet API. WSRP, Web Services for
Remote Portlets, defines a standard for
interactive Web services that plug and play with
portals.
3.1 EMAP Portal
In order to deliver Mouse Atlas grid
functionality we have developed the EMAP portal,
based on the Gridsphere portal framework. We
have implemented JSR-168 compliant portlets
to facilitate access to our grid-enabled tools
(e.g. deconvolution, reconstruction) and
collaboration with Mouse Atlas users. Figure 5
is a snapshot of the EMAP portal being used to access
the grid-enabled deconvolution tool. There are two portlets
involved in this experiment. The first portlet
transfers image data files across the grid; it
uses an instance of the RFT grid service to
transfer each group of files. The
second portlet provides access to a
deconvolution grid service. Users can
adjust the parameters involved in a
deconvolution and repeat the experiment on the
same data set. We have also developed a
visualisation portlet which is used to view and
compare original data and experiment results.
There are two instances of this portlet on the
EMAP portal page, where users can view and
compare the two data sets. After the experiment and parameter
adjustment, the user can transfer the results back
from the grid. The architecture of the whole
system is illustrated in Figure 6. The EMAP portal
uses MyProxy [6] to handle users' credentials.
MyProxy is a credential repository and
management system in which clients can store
credentials on a secure server for later use.
Figure 5 – Snapshot of EMAP portal to run a grid-enabled deconvolution experiment
Figure 6 – System architecture to access grid-enabled tools via EMAP portal (diagram labels: Web Browser; EMAP Portal; MyProxy Server: store credentials, retrieve credentials; Visualisation Grid Service: request & parameters, rendered view; Deconvolution Grid Service: request & parameters, notification; RFT (Reliable File Transfer) Grid Service: blurred and deblurred images, transfer data (gridftp))
4. Collaboration
The mouse atlas and gene-expression databases
include both in-house generated data and data
from external contributors. Contributing data to
the mouse atlas project requires a secure
collaborative environment which can be
provided and delivered by Grid and Portal
technologies. These collaborative requirements
are exemplified by the following use-cases:
• Ontology development - One of the
components of the mouse atlas framework is
an ontology of anatomical development and
a mapping between the text ontology and
the geometric space of the model embryos.
For these community resources to survive
we need high-quality tools that the
community can use to work with and contribute to them,
particularly to the ontology mappings.
Grid issues: secure shared data, tools for
version management and group discussion.
• Collaborative mapping - A secure multi-user
collaborative environment to discuss
mapping and analysis of data. Grid issues:
collaborative visualisation, display, complex
mapping mode, HPC, and deformation
modelling.
• Associate editors - Currently all editing of
the mouse atlas is in-house. A secure and
data-tracked environment is required to
allow remote editorial functions and so bring in
other expertise. Grid issues: security, remote
editing tools, and data tracking.
• Submission review - The mouse atlas
editorial group will check each submission
for completeness and quality. The editorial
procedure may include an exchange
between the submitter and editor. Grid issue:
a secure discussion environment including
image manipulation and mapping.
We are in the initial stages of developing the
collaboration functionality described above.
Our approach is to use the grid to provide
secure access to data resources and to use the
concepts of groups, communication, and shared
spaces in portal frameworks to facilitate
interactive collaboration.
5. Conclusions
The Mouse Atlas Project typically involves
compute-intensive calculation and secure
collaboration. These requirements can be
addressed by the emerging e-Science/Grid
infrastructure, which provides a technology to
securely coordinate distributed resources
(compute, data, etc.) as a single virtual computing
system. We have developed OGSI-compliant
grid services to provide grid-enabled tools for
compute-intensive image processing and
visualisation tasks in the context of the mouse
atlas project. A service-oriented approach might
not be the best approach where a legacy
application can be submitted to the Grid as jobs;
however, it provides a more flexible and
interactive environment in terms of service
discovery, invocation, and notifications. The
outcome of this research, including the
developed tools and collaborative environment,
has a much wider applicability, particularly
for biomedical applications.
Despite the benefits of the grid, grid
middleware is still in its infancy. The Globus Toolkit,
the most popular grid middleware, has been
changing constantly. During the two years
in which we have been using it there have been
three major versions of the Globus Toolkit based on
different standards. Moreover, installation, poor
documentation, and reliability have been issues,
especially with GT3. One approach to overcoming
these issues is to isolate users from the grid
middleware as much as possible. Portal
technology has been used for this purpose.
Portals provide a framework for designing the
presentation layer of web and grid based
applications by means of reusable GUI-based
components (i.e. portlets). Portal standards
facilitate sharing of portlets between different
frameworks. The personalisation and collaboration
tools provided by portal frameworks are an
advantage when portals are used in collaborative
research projects and the e-Science community.
However, portal-based applications are bound by
browser limitations. One problem in using portals
to access the grid is monitoring the grid status
and handling notifications from grid services,
which requires dynamic refreshing of the portal
page. One solution is periodic refreshing of
the browser page, which might not be desirable.
In addition, accessing grid functionality requires
more than a browser: other resources (e.g. valid credentials,
a gridftp server, the MyProxy client side) must
be present or installed.
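The periodic-refresh workaround mentioned above amounts to polling. The following minimal sketch shows the idea in isolation, assuming a hypothetical get_status callable and the status names "done" and "failed"; it is not portal or GT3 code.

```python
import time


def poll_until_done(get_status, interval=2.0, max_polls=30, sleep=time.sleep):
    """Ask for the job status at a fixed interval until it finishes.

    get_status: callable returning a status string; "done" and "failed" are
                treated as final (assumed status names for this sketch).
    sleep:      injectable so that tests can run without waiting.
    Returns the list of statuses seen, ending with the final one.
    """
    seen = []
    for _ in range(max_polls):
        status = get_status()
        seen.append(status)
        if status in ("done", "failed"):
            return seen
        sleep(interval)   # updates can lag behind by up to one interval
    raise TimeoutError("job did not finish within the polling budget")


if __name__ == "__main__":
    statuses = iter(["transferring", "running", "done"])
    print(poll_until_done(lambda: next(statuses), interval=0.01))
```

The trade-off is visible in the interval parameter: a short interval gives fresher status at the cost of more requests, while a long one leaves the page stale, which is why polling "might not be desirable" compared with server-pushed notifications.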
The collaborative functionality, which will be
delivered via the EMAP portal, is still under
development. One challenge is to adopt a portal
framework that directly supports grid
functionality, is JSR-168 compliant, and
provides collaboration facilities. Gridsphere
provides a JSR-168 compliant portal framework
with portlets to facilitate Grid access, but the
collaboration tools are very basic and poorly
documented. OGCE [7] and Sakai [8] meet
some of these requirements and
promise to deliver all of this functionality in the
near future.
Acknowledgements
This work has been funded by the MRC e-Science programme.
References
[1] Richard Baldock, Christophe Dubreuil, Bill Hill and Duncan Davidson, “The Edinburgh Mouse Atlas: Basic Structure and Informatics”, Bioinformatics Databases and Systems, Ed. S Letovsky, Academic Press, 102-115, 1999.
[2] Richard Baldock, Duncan Davidson, “Gene Expression Databases”, Genetics Databases, Ed. M Bishop, Academic Press, 247-268, 1999.
[3] Richard Baldock and Mehran Sharghi, “Collaborative Mapping and Analysis Tools for Biological Spatio-Temporal Databases”, In the proceedings of UK e-Science All Hands Meeting, 196-199, 2003.
[4] Weninger W.J. and Mohun T., “Phenotyping transgenic embryos: a rapid 3-D screening method based on episcopic fluorescence image capturing”, Nature Genetics 30, 59-65, 2002.
[5] Gridsphere, http://www.gridsphere.org.
[6] J Novotny, V Welch, MyProxy, http://grid.ncsa.uiuc.edu/myproxy/.
[7] OGCE, Open Grid Computing Environment, http://www.collab-ogce.org/nmi/index.jsp.
[8] Sakai, http://www.sakaiproject.org.