Using the Grid in the Mouse Atlas Project

Mehran Sharghi and Richard Baldock
MRC Human Genetics Unit, Crewe Road, Edinburgh, EH4 2XU, UK
[Mehran.Sharghi, Richard.Baldock]@hgu.mrc.ac.uk

Abstract

This paper presents initial results of investigating and developing grid-enabled tools for collaborative image processing, visualisation, and database access within the context of the Mouse Atlas Project at the MRC Human Genetics Unit. OGSI-compliant grid services for compute-intensive tasks in the mouse atlas have been investigated, and a grid service has been developed to implement a grid-enabled deconvolution tool. We have developed the EMAP portal to deliver Mouse Atlas grid functionality while shielding end users from technical details. JSR-168 compliant portlets have been implemented to access our grid-enabled tools, to view and compare experimental results using a visualisation grid service, and to support collaboration among mouse atlas users.

1. Introduction

This paper presents initial results of investigating and developing grid-enabled tools for collaborative image processing, visualisation, and database access within the context of the Mouse Atlas Project at the MRC Human Genetics Unit. The Mouse Atlas Project includes a spatio-temporal framework for the developing mouse embryo [1] and a gene-expression database (EMAGE) [2] for spatially mapped in situ data. In addition, software has been developed for individual use in data analysis, mapping, reconstruction, 3D visualisation, and query.

The e-Science/Grid infrastructure provides a technology to coordinate distributed resources (compute, data, etc.) so that they appear to the user as a single virtual computing system. This facilitates the dynamic formation and management of virtual organizations and the building of large-scale, secure collaborative problem-solving environments.
These aspects make the Grid a viable option for the Mouse Atlas Project, which typically involves compute-intensive calculation and secure collaboration. Very often the compute capability is required “at the lab-bench”, at a point of delivery that does not have the expertise or resources to maintain and deliver the service. This has been discussed in an earlier paper [3] and is the primary goal of this project: how can the Grid be usefully deployed for biomedical research at the small end? In this paper we present some initial results of implementing and deploying a special-purpose deconvolution grid service and of delivering grid functionality using portal technology.

2. Grid-enabled tools

The Open Grid Services Architecture (OGSA) is a standard platform for grid-service-oriented architectures. It defines mechanisms to create, name, manage, and discover services. Globus Toolkit version 3 (GT3) is middleware for building OGSA-compliant grid-enabled tools, services, and applications using a grid-service programming model. In this exemplar we use GT3 to deploy the service. GT3 is, of course, already being superseded by a WSRF/GT4 deployment. This evolution has caused significant additional effort in building the services, but it probably does not represent a significant extra difficulty for the naive user. It is this aspect of grid use that the project intends to explore. Grid services for compute-intensive tasks in the mouse atlas have been investigated, and a grid service has been developed to implement a grid-enabled deconvolution tool.

2.2 Grid-enabled Deconvolution

Deconvolution is a technique for removing out-of-focus blur from a set of images. An example is optical sectioning, in which 3D volumetric data is obtained from a specimen using a microscope. The microscope is focused at a plane and a slice is recorded (see Figure 1). Then it is refocused at a different plane and another slice is recorded.
This process is continued until the entire specimen is covered.

Figure 1 - Optical sectioning

The problem with this method is that out-of-focus structures appear blurred as a background haze and obscure the in-focus structures. Figure 2 shows this phenomenon, which appears as a bleeding artefact. Deconvolution is required to remove from the images the blur that results from the out-of-focus structures. Commercial packages are available for deconvolution, but in this case we need to develop a novel solution for a new imaging technique developed by Weninger and Mohun [4]. This technique captures microscopy images of the “block-face” of a serially sectioned tissue. At high magnification there is blurring because tissue deeper in the block is imaged at the same time as the in-focus top face. The case is special because the blurring is “one-way” and is therefore not handled by other systems. This is not technically difficult to solve, but it is a good example of a computational solution being required at the laboratory bench, distant from the computing expertise and resources that provide the solution.

There are several deconvolution methods, including the following approaches:
• Nearest neighbour: data from out-of-focus structures located in neighbouring planes are used to deblur each slice (Figure 3).
• Inverse filtering: simple inverse convolution using the Fourier transform of the point-spread function.
• Iterative methods: iterative estimation of the specimen, using the point-spread function to model the imaging process as a forward convolution.

Almost all deconvolution methods require an imaging model. In the case of microscopy, the Point Spread Function (PSF) or the Optical Transfer Function (OTF) describes the behaviour of the microscope. The PSF is the image of a point source of light, and the OTF is the frequency response of the microscope. The PSF and OTF form a Fourier transform pair. The PSF can be calculated analytically, measured by experiment, or estimated automatically from the images.
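
As a rough illustration of the nearest-neighbour approach listed above, the following Python sketch subtracts a weighted, blurred copy of each neighbouring slice from the current slice. It is not the tool described in this paper. The box blur is a stand-in for a measured PSF, and the function names, the `weight` and `radius` parameters, and the `one_way` switch (which uses only the next slice in the stack, assumed here to be the one lying deeper in the block) are all assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # one slice: rows of pixel intensities


def box_blur(image: Image, radius: int = 1) -> Image:
    """Mean filter over a square window, averaging only in-bounds pixels.

    Assumption: this crude blur stands in for convolution with a real PSF.
    """
    rows, cols = len(image), len(image[0])
    blurred = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += image[rr][cc]
                    count += 1
            blurred[r][c] = total / count
    return blurred


def nearest_neighbour_deblur(stack: List[Image], weight: float = 0.45,
                             radius: int = 1,
                             one_way: bool = False) -> List[Image]:
    """Subtract blurred neighbouring slices from each slice of the stack.

    With one_way=True only slice k+1 contributes haze to slice k
    (hypothetical handling of the block-face case).
    """
    deblurred = []
    for k, current in enumerate(stack):
        neighbours = []
        if k > 0 and not one_way:
            neighbours.append(stack[k - 1])
        if k + 1 < len(stack):
            neighbours.append(stack[k + 1])
        hazes = [box_blur(n, radius) for n in neighbours]
        # Remove the estimated haze and clip negative intensities to zero.
        cleaned = [
            [max(0.0, value - weight * sum(h[r][c] for h in hazes))
             for c, value in enumerate(row)]
            for r, row in enumerate(current)
        ]
        deblurred.append(cleaned)
    return deblurred
```

Calling `nearest_neighbour_deblur(stack, one_way=True)` on a list of equally sized 2D slices returns a new stack of the same shape. Suitable values for `weight` and `radius` would have to be tuned for a given microscope and specimen.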
Estimating the PSF automatically from the images is the basis for a deconvolution approach called blind deconvolution.

Figure 2 - Bleeding artefact

Figure 3 - Nearest neighbour deconvolution (diagram labels: Original Images, Blur, Integrate, Deblurred Images)

A grid service has been developed to implement a grid-enabled deconvolution tool. This tool includes four grid service types, as illustrated in Figure 4. The deconvolution service factory is a persistent grid service responsible for creating instances of the deconvolution service. The deconvolution service is a transient service that is created by the deconvolution service factory to interact with a client and perform a deconvolution experiment. Users can repeat the experiment with different parameters and data by interacting with the same deconvolution grid service. The deconvolution service is destroyed when the user finishes the experiment.

The deconvolution grid service uses Reliable File Transfer (RFT) to securely transfer data files from the client side to the compute service. RFT is part of the data management component of the Globus Toolkit. There is a persistent RFT factory service, and for each transfer a transient instance of the RFT service is created by this factory. Here we use one instance of the RFT service to transfer user data and another instance to transfer the results back to the user. The deconvolution service uses a native C program to process the transferred data and create a deblurred data set. The deblurred data is then transferred back to the client using another instance of the RFT grid service. The deconvolution grid service also subscribes to notifications from the RFT service and the deconvolution program. These notifications, which indicate data transfer and deconvolution status, are in turn delivered to the client side. We have also developed a Java GUI program for the client side to facilitate data file transfer, repetition of the experiment with modified parameters, and monitoring of progress at the different stages.

3.
Delivering grid functionality via portal

With current grid technology, end users face a number of difficulties in accessing the Grid directly. Grid middleware and tools are constantly evolving, which makes using and maintaining the Grid more complex for end users. Our initial implementation required a complex and variable installation process which would not have been useful for the “real” user, in this case a biomedical researcher. Furthermore, end users should be shielded from the technical details of installing and implementing the Grid. Grid portals provide a gateway to grid services and resources by offering a familiar browser-based user interface to the Grid. We have therefore investigated the use of a grid portal, specifically the Gridsphere portal framework [5].

Figure 4 - Grid services in the grid-enabled deconvolution tool (diagram labels: RFT (Reliable File Transfer) Service Factory, Deconvolution Service Factory, Create Service Instance, RFT Service (data transfer), RFT Service (result transfer), Deconvolution Service)

A portal page is assembled from the markup generated by portlets and returned to the client, where it is usually presented in an HTML browser. Portlets are pluggable user interface components that provide a presentation layer to information systems. Portal standards facilitate the sharing of portlet applications among portal vendors, which ensures interoperability across different portal frameworks. As a result of these standards, portlet repositories are now emerging where people and communities can contribute their portlets and share their experiences. There are currently two portal standards: JSR-168 and WSRP.
The Portlet Java Specification Request JSR-168 is a widely accepted standard which provides a portlet abstraction together with a portlet API. WSRP, the Web Services for Remote Portlets API, defines a standard for interactive web services that plug and play with portals.

3.1 EMAP Portal

In order to deliver Mouse Atlas grid functionality we have developed the EMAP portal, based on the Gridsphere portal framework. We have implemented JSR-168 compliant portlets to facilitate access to our grid-enabled tools (e.g. deconvolution, reconstruction) and to support collaboration with Mouse Atlas users. Figure 5 is a snapshot of the EMAP portal being used to access the grid-enabled deconvolution tool. Two portlets are involved in this experiment. The first portlet transfers image data files across the grid, using an instance of the RFT grid service for each group of files. The second portlet provides access to a deconvolution grid service. Users can manipulate the different parameters involved in a deconvolution and repeat the experiment on the same data set. We have also developed a visualisation portlet, which is used to view and compare the original data and the experiment results. There are two instances of this portlet on the EMAP portal page, where users can view and compare the original data and the experiment results. After running the experiment and adjusting the parameters, the user can transfer the results back from the grid.

The architecture of the whole system is illustrated in Figure 6. The EMAP portal uses MyProxy [6] to handle users' credentials. MyProxy is a credential repository and management system in which clients can store credentials on a secure server for later use.
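
The service interaction that the portlets drive (a persistent factory creating a transient deconvolution service, which relays status notifications and can be reused with new parameters until it is destroyed) can be sketched in plain Python as follows. This is only a toy model of the pattern and does not use the Globus Toolkit or any portal API. The class and method names are hypothetical, and the arithmetic inside `run` is a placeholder for the native deconvolution program and the RFT transfers.

```python
from typing import Callable, Dict, List

Listener = Callable[[str], None]


class DeconvolutionService:
    """Transient service instance that lives for one user's experiment."""

    def __init__(self, service_id: int) -> None:
        self.service_id = service_id
        self.destroyed = False
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register a client callback for status notifications."""
        self._listeners.append(listener)

    def _notify(self, status: str) -> None:
        for listener in self._listeners:
            listener(f"service {self.service_id}: {status}")

    def run(self, data: List[float], weight: float) -> List[float]:
        """Run one experiment; may be called again with other parameters."""
        if self.destroyed:
            raise RuntimeError("service instance has been destroyed")
        self._notify("data transferred")
        # Placeholder computation standing in for the native program.
        result = [max(0.0, value - weight) for value in data]
        self._notify("deconvolution finished")
        self._notify("results transferred")
        return result

    def destroy(self) -> None:
        """End the experiment; the instance cannot be used afterwards."""
        self.destroyed = True
        self._notify("destroyed")


class DeconvolutionServiceFactory:
    """Persistent service that creates transient service instances."""

    def __init__(self) -> None:
        self._next_id = 1
        self.instances: Dict[int, DeconvolutionService] = {}

    def create_service(self) -> DeconvolutionService:
        service = DeconvolutionService(self._next_id)
        self.instances[service.service_id] = service
        self._next_id += 1
        return service


if __name__ == "__main__":
    factory = DeconvolutionServiceFactory()
    service = factory.create_service()
    service.subscribe(print)                 # client-side status display
    service.run([5.0, 1.0], weight=2.0)      # first attempt
    service.run([5.0, 1.0], weight=0.5)      # repeat with a new parameter
    service.destroy()
```

Keeping the factory separate from the per-experiment instance mirrors the persistent/transient split described in the text: the factory is always available, while each instance holds the state of a single user's experiment.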
Figure 5 - Snapshot of the EMAP portal running a grid-enabled deconvolution experiment

Figure 6 - System architecture for accessing grid-enabled tools via the EMAP portal (diagram labels: Web Browser, EMAP Portal, MyProxy Server (Store Credentials, Retrieve Credentials), Visualisation Grid Service (Request & Parameters, Rendered View), Deconvolution Grid Service (Request & Parameters, Notification), RFT (Reliable File Transfer) Grid Service (Transfer Data (gridftp), Blurred Images, Deblurred Images))

4. Collaboration

The mouse atlas and gene-expression databases include both data generated in-house and data from external contributors. Contributing data to the Mouse Atlas Project requires a secure collaborative environment, which can be provided and delivered by Grid and portal technologies. These collaborative requirements are exemplified by the following use cases:
• Ontology development - One of the components of the mouse atlas framework is an ontology of anatomical development, together with a mapping between the text ontology and the geometric space of the model embryos. For these community resources to survive, we need high-quality tools that the community can use and contribute to, particularly for the ontology mappings. Grid issues: secure shared data, tools for version management and group discussion.
• Collaborative mapping - A secure multi-user collaborative environment in which to discuss the mapping and analysis of data. Grid issues: collaborative visualisation, display, complex mapping mode, HPC, and deformation modelling.
• Associate editors - Currently all editing of the mouse atlas is done in-house. A secure environment with data tracking is required to allow remote editorial functions and bring in other expertise. Grid issues: security, remote editing tools, and data tracking.
• Submission review - The mouse atlas editorial group will check each submission for completeness and quality.
The editorial procedure may include an interchange between the submitter and the editor. Grid issues: a secure discussion environment including image manipulation and mapping.

We are in the initial stages of developing the collaboration functionality described above. Our approach is to use the grid to provide secure access to data resources and to use the concepts of groups, communication, and shared spaces in portal frameworks to facilitate interactive collaboration.

5. Conclusions

The Mouse Atlas Project typically involves compute-intensive calculation and secure collaboration. These requirements can be addressed by the emerging e-Science/Grid infrastructure, which provides a technology to securely coordinate distributed resources (compute, data, etc.) as a single virtual computing system. We have developed OGSI-compliant grid services to build grid-enabled tools for compute-intensive image processing and visualisation tasks in the context of the Mouse Atlas Project. A service-oriented approach might not be the best choice where legacy applications can simply be submitted to the Grid as jobs; however, it provides a more flexible and interactive environment in terms of service discovery, invocation, and notifications. The outcome of this research, including the developed tools and the collaborative environment, has much wider applicability, particularly for biomedical applications.

Despite the benefits of the grid, grid middleware is still in its infancy. The Globus Toolkit, the most popular grid middleware, has been changing constantly. During the two years in which we have been using it, there have been three major versions of the Globus Toolkit based on different standards. Moreover, installation difficulties, poor documentation, and unreliability have been issues, especially with GT3. One approach to overcoming these issues is to isolate users from the grid middleware as much as possible. Portal technology has been used for this purpose.
Portals provide a framework for designing the presentation layer of web- and grid-based applications by means of reusable GUI-based components (i.e. portlets). Portal standards facilitate the sharing of portlets between different frameworks. The personalisation and collaboration tools provided by portal frameworks are an advantage for the use of portals in collaborative research projects and in the e-Science community. However, portal-based applications are bound by browser limitations. One problem in using portals to access the grid is monitoring grid status and handling notifications from grid services, which requires the portal page to be refreshed dynamically. One solution is to refresh the browser page periodically, which might not be desirable. To access grid functionality, resources other than a browser (e.g. valid credentials, a GridFTP server, a MyProxy client) must also be present or installed.

The collaborative functionality, which will be delivered via the EMAP portal, is still under development. One challenge is to adopt a portal framework that directly supports grid functionality, is JSR-168 compliant, and provides collaboration facilities. Gridsphere provides a JSR-168 compliant portal framework with portlets to facilitate Grid access, but its collaboration tools are very basic and poorly documented. OGCE [7] and Sakai [8] meet some of these requirements, and both promise to deliver all of this functionality in the near future.

Acknowledgements

This work has been funded by the MRC eScience programme.

References

[1] Richard Baldock, Christophe Dubreuil, Bill Hill and Duncan Davidson, “The Edinburgh Mouse Atlas: Basic Structure and Informatics”, Bioinformatics Databases and Systems, Ed. S Levotsky, Academic Press, 102-115, 1999.
[2] Richard Baldock, Duncan Davidson, “Gene Expression Databases”, Genetics Databases, Ed. M Bishop, Academic Press, 247-268, 1999.
[3] Richard Baldock and Mehran Sharghi, “Collaborative Mapping and Analysis Tools for Biological Spatio-Temporal Databases”, Proceedings of the UK e-Science All Hands Meeting, 196-199, 2003.
[4] Weninger W.J. and Mohun T., “Phenotyping transgenic embryos: a rapid 3-D screening method based on episcopic fluorescence image capturing”, Nature Genetics 30, 59-65, 2002.
[5] Gridsphere, http://www.gridsphere.org.
[6] J Novotny, V Welch, MyProxy, http://grid.ncsa.uiuc.edu/myproxy/.
[7] OGCE, Open Grid Computing Environment, http://www.collab-ogce.org/nmi/index.jsp.
[8] Sakai, http://www.sakaiproject.org.