Collaborative Mapping and Analysis Tools for Biological Spatio-Temporal Databases

Richard Baldock and Mehran Sharghi
MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
[richard,msharghi]@hgu.mrc.ac.uk

Abstract

The purpose of this project is to deliver fast and collaborative image processing to the biology laboratory bench. This will be achieved in the context of spatial data mapping, reconstruction, analysis and database query by developing GRID-enabled tools based on the existing tools developed as part of the Mouse Atlas Program at the MRC Human Genetics Unit. The Mouse Atlas Program at the MRC Human Genetics Unit in Edinburgh has developed a spatio-temporal framework for mapping spatially organised data in the developing mouse embryo. The Mouse Atlas (EMAP), which includes an anatomy ontology of the developing embryo, and a gene-expression database for spatially mapped data (EMAGE) are part of this program. The program has also delivered a large body of software and user interfaces for individual use in data mapping, analysis, database query and 3D visualisation. The primary goal of this project is to extend these software tools to allow secure and efficient collaborative work at all levels using the GRID. The newly developed tools will facilitate collaborative development of the underlying spatio-temporal framework. The spatial mapping of the data into the standard framework or onto other images involves non-linear warp transformations, which are generally compute-intensive. The GRID will be used to provide high-performance computing and secure communication for mapping of the data. The reconstruction tools developed in the Mouse Atlas Project include automatic alignment, removal of sectioning distortions, automatic matching of 2D sections onto a 3D template, and user interfaces to manipulate grey-scale transformations and image properties.
Keywords: GRID-enabled tools, Mouse Atlas, Spatio-temporal databases, Biological data analysis and visualisation.

1. Introduction

E-Science and the GRID are new terms that describe an extension of the use of the Internet, not in any specific attribute but in scale. The key benefits of the GRID are transparency, security and access to large-scale compute and database resources. This is achieved by using high-bandwidth networks to share resources and to access centrally provided large-scale systems, for example the high-performance computing facility at Daresbury (HPCX). Most e-Science funded projects are aimed at developing the infrastructure or at large-scale problems for which the advantages of the GRID are clear. In this project we focus on the potential small-scale benefits, in particular the benefits for biomedical scientists “at the lab bench”, and specifically in terms of the tools developed as part of the Mouse Atlas Program at the MRC Human Genetics Unit [1]. The tools range from simple image processing, for example 3D reconstruction of biological material, through data mapping to collaborative working on both reconstruction and analysis of data. A particular application we will consider is the use of a secure GRID for collaborative discussion of gene-expression data submitted to the Mouse Atlas Gene-Expression database [2]. Some examples of the use of the GRID within the Mouse Atlas Program are illustrated in the following sections.

2. Biomedical Image Analysis

Image analysis in the context of development typically aims to describe the observed patterns in terms of recognizable structural features or to make a number of pattern feature measurements. With the advent of standard atlases there is now more emphasis on spatial mapping of the data, with the possibility of much more sophisticated analysis, comparison and modeling. Figure 1 illustrates typical image analysis tasks involved in the Edinburgh Mouse Atlas Project.
Some aspects of this process have been studied as part of the Mouse Atlas Program, and where the GRID could be important is outlined below.

Figure 1 – Typical image analysis involved in the Edinburgh Mouse Atlas Project.

3. 3D Reconstruction and Visualisation

3D reconstruction and visualisation is relatively routine in some laboratories, but full reconstruction from high-resolution serial sections requires complex and compute-intensive calculation. With the development of Optical Projection Tomography (OPT) at the HGU [3] this requirement will become widespread as experiments demand the geometric fidelity of OPT together with the high resolution and histology of microtome sections. Figure 2 illustrates image capture, reconstruction and visualisation in the Mouse Atlas Project. GRID aspects include remote HPC service, transparent high-speed data transfer, and data security.

Figure 2 – Image capture, 3D reconstruction and visualisation in the Mouse Atlas Project.

4. Atlases and Ontologies

The Edinburgh Mouse Atlas Project (EMAP) is an early example of a new generation of spatio-temporal frameworks for model organisms. The key components of the framework are grey-level voxel models of each developmental stage, an ontology of anatomical development, and a mapping between the text ontology and the geometric space of the model embryos (see Figure 3). These demand a high level of expertise and significant development time. For these community resources to survive we need high-quality tools that the community can use to work with them and contribute to them, particularly for the ontology mappings but also for inter-, intra-, spatial, and temporal transformations. GRID aspects include collaborative image tools, complex transformation calculation, HPC and deformation modeling, and database interoperability.

Figure 3 – EMAP – anatomy ontology linked to the grey-level models via spatial domains.

5. Data Mapping

In situ gene-expression data comes in many forms: 2D wholemount or sections, 3D volumes, single- or multi-channel signal, etc. A common requirement is to be able to spatially map the data onto the standard framework or onto other images. Typical mapping processes in the Mouse Atlas Project are illustrated in Figure 4. In general this is a complex non-linear warp transformation established by manual interaction or automated processing. In either case the calculation is compute-intensive, and if interactive warping is required then access to HPC is necessary. GRID aspects include high-performance computation and high-speed, secure data transfer.

Figure 4 – Typical gene-expression data: a) wholemount preparation; b) section data; c) 3D OPT data; d) a screen shot of a 3D warping interface for voxel data.

6. Spatio-Temporal Databases

Once the data is reconstructed and mapped it can be submitted to the central database. The Mouse Atlas gene-expression database, EMAGE, is curated to the extent that an editorial group checks each submission for completeness and quality (see Figure 5). This process is made easier by validity checking within the submission interface, but the key checks on the data interpretation and mapping can only be done by “expert-eye”, and the editorial procedure may include an interchange between the submitter and editor. This requires a secure environment for discussion that includes image manipulation and mapping. Query of the database does not need the discussion aspect, but many users may desire data security and privacy. The key requirement for query is interoperability and query translation. For this type of data the translation may involve complex spatial and temporal transformations of both the query domain and the data returned. GRID issues include an interactive and secure discussion environment, interoperability, and data-mapping services.
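
The idea of translating a spatial query domain before searching a database can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EMAGE implementation: it uses a simple affine transform where the text describes a generally non-linear warp, it represents domains as plain sets of voxel coordinates, and every name in it (transform_domain, query_overlap, box, DATABASE, gene_A, gene_B) is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Voxel = Tuple[int, int, int]
# Hypothetical 3x4 affine matrix, one row per output axis: (a, b, c, offset).
# Assumption: a real mapping between frameworks is generally a non-linear warp.
Affine = Tuple[Tuple[float, float, float, float], ...]


def transform_domain(domain: FrozenSet[Voxel], affine: Affine) -> FrozenSet[Voxel]:
    """Map every voxel of a query domain into the target coordinate frame."""
    mapped = set()
    for x, y, z in domain:
        mapped.add(tuple(
            round(row[0] * x + row[1] * y + row[2] * z + row[3])
            for row in affine
        ))
    return frozenset(mapped)


def query_overlap(domain: FrozenSet[Voxel],
                  database: Dict[str, FrozenSet[Voxel]],
                  min_voxels: int = 1) -> List[str]:
    """Return sorted names of entries sharing at least min_voxels with the query."""
    return sorted(
        name for name, expression in database.items()
        if len(domain & expression) >= min_voxels
    )


def box(lo: Voxel, hi: Voxel) -> FrozenSet[Voxel]:
    """Axis-aligned block of voxels, inclusive of both corners."""
    return frozenset(
        (x, y, z)
        for x in range(lo[0], hi[0] + 1)
        for y in range(lo[1], hi[1] + 1)
        for z in range(lo[2], hi[2] + 1)
    )


# Toy data: two made-up expression domains stored in the target frame.
DATABASE = {
    "gene_A": box((10, 10, 10), (12, 12, 12)),
    "gene_B": box((30, 30, 30), (32, 32, 32)),
}

# Illustrative transform: the source frame is offset by 10 voxels on each axis.
SOURCE_TO_TARGET: Affine = (
    (1.0, 0.0, 0.0, 10.0),
    (0.0, 1.0, 0.0, 10.0),
    (0.0, 0.0, 1.0, 10.0),
)

# A query region drawn in the source frame, translated before the search.
query = box((0, 0, 0), (1, 1, 1))
hits = query_overlap(transform_domain(query, SOURCE_TO_TARGET), DATABASE)
print(hits)  # prints ['gene_A']
```

Searching with the untranslated query would return no entries here, which is why the transformation has to be applied to the query domain (or, equivalently, to the returned data) before results from different frameworks can be compared.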

Figure 5 – Screen shots of the EMAP spatio-temporal framework and the EMAGE gene-expression database.

7. Discussion

This project is in its initial exploratory phase. GRID services are being implemented to provide HPC to the lab bench for 3D reconstruction and deconvolution. This will then lead to services for spatial mapping (query translation). Interactive discussion environments will be considered later in the project.

8. Acknowledgement

This work has been funded by the MRC e-Science programme.

9. References

[1] Richard A Baldock, Christophe Dubreuil, Bill Hill and Duncan Davidson, The Edinburgh Mouse Atlas: Basic Structure and Informatics, Bioinformatics Databases and Systems, Ed. S. Letovsky (Kluwer Academic Press, 1999), pp. 102-115.
[2] Richard Baldock and Duncan Davidson, Gene Expression Databases, Genetics Databases, Ed. M. Bishop (Academic Press, 1999), pp. 247-268.
[3] James Sharpe, Ulf Ahlgren, Paul Perry, Bill Hill, Allyson Ross, Jacob Hecksher-Sørensen, Richard Baldock, Duncan Davidson, Optical Projection Tomography as a Tool for 3D Microscopy and Gene Expression Studies, Science 296 (2002), pp. 541-545.