Information Extraction from Medical Images: Developing an e-Science Application Based on the Globus Toolkit

Rolf A. Heckemann, Thomas Hartkens†, Kelvin K. Leung†, Yalin Zheng†, Derek L. G. Hill†, Joseph V. Hajnal, Daniel Rueckert‡

Imaging Sciences Department, Imperial College Hammersmith Hospital Campus, London
† Division of Imaging Sciences, King's College, London
‡ Department of Computing, Imperial College, London

Abstract

Health care, medical research and drug discovery rely increasingly on radiological images. Digital archives of such images are becoming commonplace, but they currently lack interoperability. Grid technology could enable access to distributed radiological image archives. Image registration algorithms could then be applied to large cohorts to generate atlases, that is, authoritative reference datasets that describe anatomical structures or pathological changes. We have developed a prototype service based on registration programs that can be used remotely via the Globus Toolkit. To facilitate widespread testing of this emerging capability, we created a web-based workbench front end that is geared towards atlas generation and makes the service accessible to clinicians and researchers.

Background and Objectives

Images of the human body, acquired through a variety of available modalities, play an increasingly important role in health care, medical research and drug testing. Many decision processes based on these images, such as making a diagnosis, rely on a radiologist making a visual assessment to identify abnormal tissue or to monitor change. There are currently only a few specialized medical areas, such as radiotherapy and reconstructive surgery planning, where advanced post-processing and interactive image analysis are routinely employed.
Another area where quantitative results are usually required is drug development, but this is generally achieved either by subjective scoring based on visual assessment, or by manual or semi-automatic segmentation of the images to isolate and measure target structures, such as anatomical features or lesions.

Digital archives are quickly becoming the preferred mode of image storage for hospitals and research institutions, many of which hold terabytes of data. Currently these image repositories tend to be isolated from one another, with poor interoperability between them. Enabling distributed access using Grid technology would make it possible to interact with these repositories and extract information that has previously been unobtainable. This has the potential to add significant value to the images for clinicians and researchers.

To realize this added value, computationally expensive image processing algorithms are likely to be needed. An important example is image registration, which enables quantitative comparisons between images by determining the transformation required to match one image to the other, using varying numbers of degrees of freedom. The method can be used to quantify change over time in serial imaging studies, to fuse information from different modalities, or to fuse information from different subjects. When comparing subjects or groups of subjects, image registration can be used to generate atlases: authoritative reference datasets that describe human anatomy and provide statistical information about the sizes of structures and their normal variation.

Atlas generation is of particular potential benefit when applied to diffuse brain diseases, such as Alzheimer's dementia. At present, imaging serves as an adjunct in the management of patients with dementia. A particularly useful method is serial magnetic resonance imaging (MRI) for monitoring disease progression. MRI can provide surrogate endpoint markers for assessing the efficacy of new dementia treatments in drug trials [1]. Image registration of serial MRIs is a proven research tool that identifies patterns of disease progression [2]. So far, the cohorts that have been studied have been small, and processing was centralized and therefore time-intensive. It would be desirable to study large cohorts, which requires seamless access to distributed data sources and massively parallel processing facilities. The Grid infrastructure promises to provide both, thereby enabling interactive analysis.

Making a first-time diagnosis of diffuse brain disease on MRI can be challenging, as pathological changes are often subtle and difficult to distinguish from normal age-related changes. When faced with such cases, radiologists may compare the patient's images with printed atlases generated from a normal individual. This approach, however, does not fully solve the problem, since the atlas will not usually be matched to the patient's age and condition, and anatomical slices that are not well matched in spatial location can be difficult to compare. In addition, it introduces the new difficulty of distinguishing between normal anatomical variation and pathological variation among individuals. The ideal atlas reference would be one that is matched to the patient's age, gender, background and medical history, that is geometrically aligned with the patient's own cranium, and that also represents normal variability in structures of interest. This can be achieved with an interactive registration application that accesses a repository of MRI images, enables the selection of subjects that match the patient by selectable criteria, and provides quantitative comparisons of brain structure shapes and sizes: a "dynamic brain atlas" [3].

The goal of this work is to explore the possibilities and requirements for a Grid-based registration service that might help with decision support in health care and clinical research. We developed a basic service and created a prototype interface tool that enables non-technical users to submit registration processes as required for dynamic atlas generation.

Material

We developed the prototype tool, termed the IXI (Information Extraction from Images) Workbench, as an image registration service with a web-based interface. It builds on the following technologies:

Registration Software

At the core is a suite of programs for registration of multi-modality images using voxel-similarity measures based on mutual information, developed by one of the authors (DR) and previously described in [4]. The output of the image registration process is a file specifying the spatial transformation that maps one image to another. This is called a DOF (degrees of freedom) file.

Database

A MySQL database is used for intermediate storage of images as well as of objects arising from analysis processes, such as transformation descriptions (DOF files).

Image Import

Images in DICOM (Digital Imaging and Communications in Medicine) format are loaded from an accessible file system using conversion software and a standalone Perl script for importing metadata. The script also handles conversion of series of two-dimensional images into single three-dimensional datasets.

Web Interface

The Cocoon XML publishing framework is used for extracting and presenting database information, for collecting user input and for launching Grid processes. Cocoon is a servlet that offers various ways of programming interaction with data sources. We wrote user and database interaction modules as Extensible Server Pages (XSP). The Cocoon sitemap feature allows such modules (e.g. XSPs) to be arranged to represent a workflow.

Grid Toolkit

We have implemented two versions of the IXI Workbench. One of them uses the Globus Toolkit 2.4 (GT2). Globus processes are invoked from bash shell scripts, which are called from the web interface using a server-side Java Runtime.exec call. A second implementation is based on an alpha release of version 3.0 of the Globus Toolkit (GT3). Here, a GRAM (Globus Resource Allocation Manager) RSL (Resource Specification Language) file is created from user input and submitted for Grid-based execution. The GT2 version is more reliable, but we expect the GT3 version to become our main platform as GT3 matures.

User Interaction

Following log-in, the user is prompted to select a registration target from a list of currently available image datasets (page choosetrg, Fig. 1). An SQL-like search statement (limited for security reasons) can be entered to restrict the number of entries displayed. In the next step, the list and SQL search options are shown again, with a prompt to select source datasets (page choosesrc). On submission of this selection, a bash script is invoked once for every pairing of target and source (page gridsubmit). The output of each script process is loaded back into the database as a transformation-description object.

Discussion

Medical images have provided a vast amount of critical information to doctors and researchers. Traditional approaches have relied on detailed study of individuals or small groups of subjects. The methods used do not scale easily to extracting the wealth of information buried in the rapidly expanding image repositories that are now becoming available. Achieving this scalability requires new tools and new levels of integration and interoperability. The Grid has great potential to meet these needs, but it may be some time until Grid client installations become commonplace.
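The scaling pressure is easy to see in the submission step described under User Interaction: one registration is launched per pairing of target and source, so selecting n targets and m sources produces n × m independent jobs. The Python sketch below is illustrative only and is not the Workbench's code, which used bash scripts called from Java. It enumerates the pairings and renders each as a job description in the general style of GT2 RSL attribute syntax (`&(attribute=value)`). The class and function names, the DOF file naming scheme, the executable path and the argument order are all hypothetical, and the exact RSL quoting should be checked against the Globus documentation.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class RegistrationJob:
    """One hypothetical registration task: align `source` to `target`."""
    target: str
    source: str

    @property
    def dof_name(self) -> str:
        # Hypothetical naming scheme for the resulting DOF
        # (transformation-description) file.
        return f"{self.source}_to_{self.target}.dof"

    def to_rsl(self, executable: str) -> str:
        # GT2-style RSL string; the executable and argument order are
        # placeholders, not the Workbench's actual command line.
        args = " ".join(f'"{a}"' for a in (self.target, self.source, self.dof_name))
        return f"&(executable={executable})(arguments={args})"


def plan_jobs(targets: Sequence[str], sources: Sequence[str]) -> List[RegistrationJob]:
    """Return one job for every (target, source) pairing, in selection order."""
    return [RegistrationJob(t, s) for t, s in product(targets, sources)]


if __name__ == "__main__":
    # Hypothetical dataset identifiers: one target, three sources.
    jobs = plan_jobs(["subj001"], ["subj002", "subj003", "subj004"])
    for job in jobs:
        print(job.to_rsl("/usr/local/bin/register"))
    print(len(jobs), "jobs")
```

For atlas generation with a single target and m sources, this yields m jobs. Since each pairing is a separate registration with its own output file, the jobs can be dispatched independently, which is what makes distributed Grid or cluster resources attractive as cohorts grow.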
To explore the potential of the Grid, and to facilitate testing by appropriate users, we created an image registration service based on a scalable database structure, with a web-based front end that can be accessed with any HTML-compliant browser. Emerging Grid capabilities are thus accessible using familiar technology, such as ubiquitous internet-enabled personal computers.

Figure 1: Screen shot of page choosetrg

The IXI Workbench consists of a database and a set of dynamically generated HTML forms providing search and update capabilities geared towards atlas generation. The design allows simple extension of the database to hold new types of objects, beyond the current image and transformation-description objects, with the interface adapting automatically or semi-automatically.

Another feature of the Workbench is modularity. This should help with extending the Workbench to keep pace with Grid developments and to cater for the expanding needs of the project.

Future Work

The flexible database design, the modular approach and the Grid service model will in the future enable us to provide a more flexible workbench service, which can be configured to suit image registration tasks other than atlas generation, e.g. intermodality image fusion and quantifying change in serial imaging studies. Specific development areas are:

Automated image import. Currently, the import method has to be adapted depending on the data source. Ideally, a remote source of image data should supply descriptive metadata, enabling the import of images without any manual adaptation. The upcoming MIRC (Medical Imaging Resource Center) standard may provide a solution [5].

Parallel processing. The Workbench is being extended to allow submission of the analysis tasks to Condor clusters.

Security. Although Cocoon provides a fair level of security by running server-side processes under a non-privileged user ID, and although the Workbench interface restricts the type of entries that can be made in databases, a risk of abuse remains when providing any web application. To reduce such risks, careful input validation is required, and this will need to be taken into account in the further development of the Workbench interface. The exposure of the Grid back end as such is minimal, since it is protected by the Grid Security Infrastructure. Also, the back end need only be opened to the server running the Workbench application, as long as no other client access is required.

Conclusions

Medical imaging research, clinical radiology and drug discovery are set to benefit from Grid services that enable large-scale image processing and access to distributed image resources. We have built a prototype image registration service and a Workbench to make it accessible through a web client. The IXI Workbench is designed to take advantage of Grid functionality in providing image registration technology in the novel form of a freely accessible, but secure, service that can accommodate changing and growing demands. Other applications are under development that link directly with image archive systems.

References

[1] Smith AD. Imaging the progression of Alzheimer pathology through the brain. Proc Natl Acad Sci USA 2002 Apr 2;99(7):4135-4137.
[2] Scahill RI, Schott JM, Stevens JM, Rossor MN, Fox NC. Mapping the evolution of regional atrophy in Alzheimer's disease: Unbiased analysis of fluid-registered serial MRI. Proc Natl Acad Sci USA 2002 Apr 2;99(7):4703-4707.
[3] Hill DLG, Hajnal JV, Rueckert D, Smith SM, Hartkens T, McLeish K. A Dynamic Brain Atlas. MICCAI 2002, Springer Lecture Notes in Computer Science 2488:532-539.
[4] Rueckert D, Sonoda LI, Hayes C, Hill DLG, Leach MO, Hawkes DJ. Non-rigid registration using free-form deformations: application to breast MR images. IEEE Trans Med Imaging 1999;18(8):712-721.
[5] Siegel E, Channin D, Perry J, Carr C, Reiner B. Medical Image Resource Center 2002: An Update on the RSNA's Medical Image Resource Center. J Digit Imaging 2002 Mar;15(1):2-4.