Collaborative Mapping and Analysis Tools for Biological Spatio-Temporal Databases

Richard Baldock and Mehran Sharghi
MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
[richard,msharghi]@hgu.mrc.ac.uk
Abstract
The purpose of this project is to deliver fast and collaborative image processing to the biology
laboratory bench. This will be achieved in the context of spatial data mapping, reconstruction,
analysis and database query by developing GRID-enabled tools based on the existing tools
developed as part of the Mouse Atlas Program at the MRC Human Genetics Unit.
The Mouse Atlas Program at the MRC Human Genetics Unit in Edinburgh has developed a
spatio-temporal framework for mapping spatially organised data in the developing mouse
embryo. The Mouse Atlas (EMAP), which includes an anatomy ontology of the developing embryo,
and a gene-expression database for spatially mapped data (EMAGE) are both part of this program.
The program has also delivered a large body of software and user interfaces for individual use
for data mapping, analysis, database query and 3D visualisation. The primary goal of this project
is to extend these software tools to allow secure and efficient collaborative work at all levels
using the GRID. The newly developed tools will facilitate collaborative development of the
underlying spatio-temporal framework. The spatial mapping of the data into the standard
framework or other images involves non-linear warp transformations, which are generally compute
intensive. The GRID will be used to provide high-performance computing and secure
communication for mapping of the data. The reconstruction tools developed in the Mouse Atlas
Project include automatic alignment, removal of sectioning distortions, automatic matching of 2D
sections onto a 3D template and user interfaces to manipulate grey-scale transformations and
image properties.
Keywords: GRID-enabled tools, Mouse Atlas, Spatio-temporal databases, Biological data
analysis and visualisation.
1. Introduction
E-Science and the GRID are new terms to describe an extension of the use of the Internet not in
specific attribute but in scale. The key benefits of the GRID are transparency, security and
access to large scale compute and database resources. This is achieved by using high-bandwidth
networks to share resources and to access central provision of large-scale systems, for example
the high-performance computing at Daresbury (HPCX). Most e-Science funded projects are
aimed at developing the infrastructure or at large scale problems for which the advantages of
the GRID are clear. In this project we focus on the potential small-scale benefits, in
particular the benefits for biomedical scientists “at the lab bench”, and specifically on
the tools developed as part of the Mouse Atlas Programme at the MRC Human Genetics Unit
[1]. The tools range from simple image processing, for example 3D reconstruction of biological
material, through data mapping to collaborative working on both reconstruction and analysis of
data. A particular application we will consider is the use of a secure GRID for collaborative
discussion of gene-expression data submitted to the Mouse Atlas Gene-Expression database
[2]. Some examples of the use of the GRID within the Mouse Atlas Program are illustrated in
the following sections.
2. Biomedical Image Analysis
Image analysis in the context of development typically aims to describe the observed patterns in
terms of recognizable structural features or to make a number of pattern feature
measurements. With the advent of standard atlases there is now more emphasis on spatial mapping
of the data, with the possibility of much more sophisticated analysis, comparison and modeling.
Figure 1 illustrates typical image analysis tasks involved in the Edinburgh Mouse Atlas Project.
Some aspects of this process have been studied as part of the Mouse Atlas Program, and the points
where the GRID could be important are outlined below.
Figure 1 – Typical image analysis involved in Edinburgh Mouse Atlas Project.
3. 3D Reconstruction and Visualisation
3D reconstruction and visualisation are relatively routine in some laboratories, but full
reconstruction from high-resolution serial sections requires complex and compute-intensive
calculation. With the development of Optical Projection Tomography (OPT) at the HGU [3]
this requirement will become widespread as experiments demand the geometric fidelity of OPT
with the high-resolution and histology of microtome sections. Figure 2 illustrates image
capture, reconstruction and visualisation in the Mouse Atlas Project. GRID aspects include
remote HPC service, transparent high-speed data transfer, and data security.
Figure 2 – Image capture, 3D reconstruction and visualisation in Mouse Atlas Project.
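To make the alignment step concrete, the following is a minimal sketch of aligning a stack of serial sections by translation only. It is not the Mouse Atlas reconstruction software: the function names are hypothetical, and matching intensity-weighted centroids is a deliberately simple stand-in for the automatic alignment and distortion removal mentioned above, which would also need to handle rotation and non-linear deformation.

```python
from typing import List, Tuple

Image = List[List[float]]  # one section: rows of grey-level values


def centroid(img: Image) -> Tuple[float, float]:
    """Intensity-weighted centroid (row, col) of a section image."""
    total = sum(sum(row) for row in img)
    if total == 0:
        raise ValueError("empty section: no signal to align")
    r = sum(i * v for i, row in enumerate(img) for v in row) / total
    c = sum(j * v for row in img for j, v in enumerate(row)) / total
    return r, c


def shift(img: Image, dr: int, dc: int) -> Image:
    """Translate an image by whole pixels, padding with zeros."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            ni, nj = i + dr, j + dc
            if 0 <= ni < h and 0 <= nj < w:
                out[ni][nj] = img[i][j]
    return out


def align_stack(sections: List[Image]) -> List[Image]:
    """Shift every section so its centroid matches that of the first one."""
    ref_r, ref_c = centroid(sections[0])
    aligned = [sections[0]]
    for img in sections[1:]:
        r, c = centroid(img)
        aligned.append(shift(img, round(ref_r - r), round(ref_c - c)))
    return aligned
```

Even this toy version visits every pixel of every section, which hints at why full-resolution reconstruction with richer transformations is a candidate for a remote HPC service.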
4. Atlases and Ontologies
The Edinburgh Mouse Atlas Project (EMAP) is an early example of a new generation of spatio-temporal
frameworks for model organisms. The key components of the framework are grey-level voxel models of
each developmental stage, an ontology of anatomical development and a
mapping between the text ontology and the geometric space of the model embryos (see Figure
3). These demand a high level of expertise and significant development time. For these
community resources to survive we need high-quality tools for the community to use and
contribute, particularly to the ontology mappings but also for inter-, intra-, spatial, and temporal
transformations. GRID aspects include collaborative image tools, complex transformation
calculation, HPC and deformation modeling, and database interoperability.
Figure 3 – EMAP – anatomy ontology linked to the grey-level models via spatial domains.
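The link between the text ontology and the geometric space can be pictured with a small sketch. Everything here is an illustrative assumption rather than the EMAP data model: the anatomy terms, the part-of table `PARTS`, the painted voxel sets in `PAINTED`, and the rule that a term's domain is the union of its own voxels and those of its parts.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]

# Hypothetical part-of hierarchy: parent term -> child terms.
PARTS: Dict[str, List[str]] = {
    "embryo": ["heart", "neural tube"],
    "heart": ["atrium", "ventricle"],
}

# Hypothetical painted domains, given here for the leaf terms only.
PAINTED: Dict[str, Set[Voxel]] = {
    "atrium": {(0, 0, 0), (0, 0, 1)},
    "ventricle": {(0, 1, 0)},
    "neural tube": {(2, 2, 2)},
}


def domain(term: str) -> Set[Voxel]:
    """Voxels for a term: its own painted voxels plus those of all its parts."""
    voxels = set(PAINTED.get(term, set()))
    for child in PARTS.get(term, []):
        voxels |= domain(child)
    return voxels


def terms_at(voxel: Voxel) -> List[str]:
    """All terms whose domain contains the voxel, sorted by name."""
    names = set(PARTS) | set(PAINTED)
    return sorted(t for t in names if voxel in domain(t))
```

With a mapping of this kind, a text query such as "heart" resolves to a region of the model, and a position in the model resolves back to every anatomical term that covers it.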
5. Data Mapping
In situ gene-expression data comes in many forms: 2D wholemount or sections, 3D volumes,
single- or multi-channel signal, etc. A common requirement is to be able to spatially map the data
onto the standard framework or onto other images. Typical mapping processes in the Mouse
Atlas Project are illustrated in Figure 4. In general this is a complex non-linear warp
transformation established by manual interaction or automated processing. In either case the
calculation is compute intensive, and if interactive warping is required, access to HPC is
necessary. GRID aspects include high-performance computation and high-speed and secure
data transfer.
Figure 4 – Typical gene-expression data: (a) wholemount preparation; (b) section data; (c) 3D OPT
data; (d) a screen shot of a 3D warping interface for voxel data.
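As an illustration of a non-linear warp driven by manually placed tie-points, the sketch below moves a 2D point by an inverse-distance-weighted blend of the tie-point displacements. This is a swapped-in textbook technique chosen for brevity, not the warping method used by the Mouse Atlas tools; `warp_point` and its `power` parameter are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def warp_point(p: Point, src: List[Point], dst: List[Point],
               power: float = 2.0) -> Point:
    """Map p using tie-point displacements weighted by inverse distance.

    src[i] is a landmark in the data image and dst[i] is the matching
    landmark in the target framework.
    """
    if not src:
        return p  # no tie-points: identity mapping
    wsum = dx = dy = 0.0
    for (sx, sy), (tx, ty) in zip(src, dst):
        d2 = (p[0] - sx) ** 2 + (p[1] - sy) ** 2
        if d2 == 0.0:
            return (tx, ty)  # p is itself a tie-point: map it exactly
        w = 1.0 / d2 ** (power / 2.0)
        wsum += w
        dx += w * (tx - sx)
        dy += w * (ty - sy)
    return (p[0] + dx / wsum, p[1] + dy / wsum)
```

For example, with `src = [(0, 0), (10, 0)]` and `dst = [(1, 0), (10, 0)]`, the first tie-point moves to `(1, 0)` and the midpoint `(5, 0)` moves half as far. The cost grows with the number of pixels times the number of tie-points, and every pixel must be recomputed each time a tie-point is dragged, which is one way to see why interactive warping of 3D voxel data calls for HPC.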
6. Spatio-Temporal Databases
Once the data is reconstructed and mapped it can then be submitted to the central database. The
Mouse Atlas gene-expression database, EMAGE, is curated to the extent that an editorial group
will check each submission for completeness and quality (see Figure 5). This process is made
easier by validity checking within the submission interface, but the key checks on the data
interpretation and mapping can only be done by “expert eye”, and the editorial procedure may
include an interchange between the submitter and editor. This requires a secure discussion
environment that includes image manipulation and mapping. Query of the database does not need
the discussion aspect, but many users may desire data security and privacy. The key requirement
for query is interoperability and query translation. For this type of data the translation may
involve complex spatial and temporal transformations of both the query domain and the data
returned. GRID issues include an interactive and secure discussion environment, interoperability, and
data-mapping services.
Figure 5 – Screen shots of the EMAP spatio-temporal framework and the EMAGE gene-expression database.
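A spatial query with query translation can be sketched as follows. The assumptions are all illustrative: expression patterns are stored as sets of voxels, `translate` stands in for the spatial transformation of a query domain by applying a plain offset, and entries are ranked by Jaccard overlap. None of this is taken from the EMAGE implementation.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]


def jaccard(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Overlap of two domains: |intersection| / |union| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translate(domain: Set[Voxel], offset: Voxel) -> Set[Voxel]:
    """Stand-in for query translation: move a domain into another frame."""
    ox, oy, oz = offset
    return {(x + ox, y + oy, z + oz) for x, y, z in domain}


def spatial_query(query: Set[Voxel], entries: Dict[str, Set[Voxel]],
                  min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Entries overlapping the query domain, best match first."""
    scored = [(name, jaccard(query, dom)) for name, dom in entries.items()]
    hits = [(name, score) for name, score in scored if score > min_score]
    return sorted(hits, key=lambda h: (-h[1], h[0]))
```

In a real setting the transformation would be a non-linear spatial and temporal mapping between frameworks rather than an offset, applied to the query domain on the way in and to the returned data on the way out.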
7. Discussion
This project is in the initial exploratory phase. GRID services are being implemented for the
provision of HPC to the lab bench for the purpose of 3D reconstruction and deconvolution. This
will then lead to services for spatial mapping (query translation). Interactive and discussion
environments will be considered later in the project.
8. Acknowledgement
This work has been funded by the MRC e-Science programme.
9. References
[1] Richard A Baldock, Christophe Dubreuil, Bill Hill and Duncan Davidson, The Edinburgh Mouse Atlas:
Basic Structure and Informatics, Bioinformatics Databases and Systems, Ed. S Letovsky (Kluwer
Academic Press, 1999), pp102-115.
[2] Richard Baldock and Duncan Davidson, Gene Expression Databases, Genetics Databases, Ed. M.
Bishop, (Academic Press, 1999), pp247-268.
[3] James Sharpe, Ulf Ahlgren, Paul Perry, Bill Hill, Allyson Ross, Jacob Hecksher-Sørensen, Richard
Baldock, Duncan Davidson, Optical Projection Tomography as a Tool for 3D Microscopy and Gene
Expression Studies, Science 296 (2002) pp541-545.