Health-e-Child: A Grid Platform for European Paediatrics On behalf of the Health-e-Child Consortium (& with thanks to David Manset Maat-G) Richard McClatchey, UWE-Bristol UK CHEP07 3rd September 2007, Victoria, CA Contents • • • • • • Project Objectives & Challenges Health-e-Child Gateway The HeC Grid architecture Data Integration & Ontologies Futures Conclusions Motivation for the Project • Clinical demand for integration and exploitation of heterogeneous biomedical information • vertical dimension – multiple data sources • horizontal dimension – multiple sites • Need for generic and scalable platforms (Grid?) • integrate traditional and emerging sources – in vivo and in vitro • provide decision support • ubiquitous access to knowledge repositories in clinical routine • connect stakeholders in clinical research • Need for complex integrated disease models • build holistic views of the human body • early disease detection exploiting in vitro information • personalized diagnosis, therapy and follow-up 3 Objectives of Health-e-Child Build enabling tools & services that improve the quality of care and reduce cost with Real-time alert Integrated disease models Database-guided decision support systems Cross modality information fusion and data mining for knowledge discovery Individual Organ Tissue 4 Cell Molecule Imaging Lab Data Genomics Proteomics Demographics Physician Notes Life Style Healthy Child Guidance Guidance Enrich Augment Integrated Disease Modeling Vertical Data Integration Population Observation Process Sensors Establish multi-site, vertical, and longitudinal integration of data, information and knowledge Develop a GRID based platform, supported by robust search, optimisation and matching On-line learning Integrated Medical Database Introduction A Geographically Distributed Environment ASPER UCL GOSH UWE SIEMENS CERN Clinical Site NECKER IGG FGG EGF R&D Site UOA INRIA MAAT 6 LYNKEUS Introduction Applications Integration Challenge IGG Highlights NECKER - Different Networks: LANs, WANs, Internet - Security Constraints: Local & National Regulations - Bandwidth Limitations: LAN/WAN & Internet uplinks … GOSH 7 Contents • • • • • • 8 Project Objectives & Challenges Health-e-Child Gateway The HeC Grid architecture Data Integration & Ontologies Futures Conclusions HeC System Overview Heart Disease Applications Inflammatory Diseases Applications Brain Tumour Applications Common Client Applications user interface for authentication, viewing, editing, similarity search HeC Gateway HeC specific models and Grid services like query processing, security Grid Infrastructure databases, resource and user management, data security 9 Data Management & Integration The Health-e-Child Gateway • Our building blocks are Services • Our Gateway is a SOA • Existing blocks are available in the community • WSRF Containers: Globus Toolkit4, Tomcat, WSRF::Lite… • Data Access Layers: OGSA-DAI, AMGA… • File Transfer facilities: gridFTP… • Applies to other distributed systems • Our approach is to efficiently reuse and combine blocks to satisfy our requirements • Building blocks might be enhanced, i.e. HeC Authentication Service 10 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 Gateway • The HeC Gateway • An intermediary access layer to decouple client applications from the complexity of the grid • Towards a platform independent implementation • To add domain specific functionality not available in Grid middleware Status √ SOA architecture and design √ implementation of privacy and security modules 11 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 Our Approach Highlights - Simplicity, “all-in-one box” - Modularity & Scalability, “off-the-shelf” components - State-of-the-art Approaches One Key One Access Point To enterHospital X Per the system Institution User Workstation 12 Health-e-Child Gateway Technical Review Meeting, Paris, 17 January 2007 Platform & Grid Middleware HeC Scheduling Info Monitoring System Hosting Domain 1 Stable & Secure Environment EGEE gLite Health-e-Child HeC Gateway HeC DBMS Storage 200GB 50GB 1TB Security & Registration Data Unit One Access Point Job Managnt Computation Unit The Health-e-Child Access Point Hosting Domain 2 Stable & Secure Environment Virtualization 13 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 The Platform The Health-e-Child Gateway (1) Client Applications One Access Point Functionality Access HeC Gateway Infrastructure Abstraction Computing Resources Inside the bo 14 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 The Platform The Health-e-Child Gateway (2) One Access Point Secur ity HeC Gateway GT4 Inside the bo 15 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 Platform & Grid Middleware The Health-e-Child Gateway (3) One Access Point HeC Gateway Grid GT4 Inside the bo 16 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 The Platform The Health-e-Child Gateway (4) One Access Point Client Connectivity HeC Gateway GT4 Inside the box… 17 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 The Platform The Health-e-Child Gateway (5) Client Applications One Access Point HeC Gateway Data Integration Inside the bo 18 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 Contents • • • • • • 19 Project Objectives & Challenges Health-e-Child Gateway The HeC Grid architecture Data Integration & Ontologies Futures Conclusions Health-e-Child Technical Review Meeting, Paris, 17 January 2007 Grid • Grid technology (gLite 3.0) as the enabling infrastructure • A distributed platform for sharing storage and computing resources • HeC Specific Requirements • Need support for medical (DICOM) images • Need high responsiveness for use in clinical routine • Need to guarantee patient data privacy: access rights management storage of anonymized patient data only 20 Health-e-Child Status √ Testbed installation since Mai 2006 √ HeC Certificate Authority √ HeC Virtual Organisation √ Security Prototype (clients & services) √ Logging Portal & Appender Technical Review Meeting, Paris, 17 January 2007 Grid Platform Grid Architecture @ Month18 Virtualization Applied Being deployed 2nd Test Node Deployed 22 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 Grid Platform Solution Facts & Conclusion - Test-bed Installed & Tested @ CERN - Validated the Middleware Architecture (V1.0) to be installed at the different Sites - However: Architecture still too centralized (VOMS, LFC…) - Being addressed in the second planning period CERN 23 Health-e-Child Technical Review Meeting, Paris, 17 January 2007 Contents • • • • • • 26 Project Objectives & Challenges Health-e-Child Gateway The HeC Grid architecture Data Integration & Ontologies Futures Conclusions Health-e-Child Technical Review Meeting, Paris, 17 January 2007 Vertical Integration • Vertical Levels • Split the domain into vertical levels • Data/Information from multiple levels • Implicit semantics connecting these levels. • Vertical Integration in applications • Data models should provide seamless access to all the information stored in HeC Population Individual Organ Tissue Cell Molecule • Vertically Integrated Modelling • Coherently integrate information from different levels (instances) • Provide integrated view of the data to the users (concept-oriented views, etc) • Queries against the integrated knowledge • Topics • Vertical Integration along the longitudinal axis in pediatrics • Semantics of relevance for presenting clinical data • Fragment extraction, alignment and integration from biomedical knowledge sources 27 Data and knowledge modelling Applications Similarity Query Application specific DSS HeC-universal Knowledge Representation ontologies Integrated Data Modelling Requirements Data Acquisition Protocols (WP9) 28 Users Requirements Specifications (WP2) Modelling with Domain Experts (WP6) Ontologies in Health-e-Child • Reuse existing medical knowledge to structure our feature space • • • • Working with established knowledge Linkage to “live” knowledge base(s) Saves effort on validating the models with domain experts Non-domain specific conceptualization (upper level ontologies): reusing these, we establish shareable/shared knowledge • Ontological modelling of paediatric medicine • Integration of information – resolving semantic heterogeneity • Different countries, hospitals, protocols • Research potential • OWL DL representation of paediatrics medical knowledge, identification of knowledge patterns • Decision-support systems • Similarity metrics and query enhancement. • Alignment of ontologies that overlap over the Health-e-Child domain 31 Contents • • • • • • 32 Project Objectives & Challenges Health-e-Child Gateway The HeC Grid architecture Data Integration & Ontologies Futures Conclusions Clinical and Application Roadmap Phase I (- 06/06) Phase II (07/06 - 06/07) Phase III (07/07 - 12/08) Phase IV (2009) Data acquisition, genetic tests, ground truth annotations Study Design and Approval Disease Model Development generic subtype specific patient and treatment specific User Requirements State of the Art Reports Knowledge Discovery Methods Classifiers Based on Genetics Feature Extraction from Imaging Segmentation/Registration 33 Integrated decision support Clinical Validation Refinement of Models and Algorithms Dissemination Grid Platform Future Work – Decentralized Architecture Working out a possible alternatives to decentralize key components of the grid middleware Hospital 3 1 Box 4 Domains - VOMS (M) PX (M) Hydra (D) AMGA (*S) - top BDII WMSLB LFC (*S) DPM MON ? Virtual Computer - site BDii - CE/LRMS or gsissh - WN - UI - FTS Virtual Computer Hospital 1 1 Box Autonomous Sites Hospital 2 1 Box 4 Domains - VOMS (S) PX (S) Hydra (D) AMGA (*M) - top BDII WMSLB LFC (*M) DPM MON ? Virtual Computer - site BDii - CE/LRMS or gsissh - UI - FTS Virtual Computer SMP 4 - n CPUs 34 SMP 2 - n CPUs 4 Domains - WN - VOMS (S) PX (S) Hydra (D) R-GMA ? AMGA (*S) - Virtual Computer top BDII WMSLB LFC (*S) DPM MON ? - site BDii - CE/LRMS or gsissh - LRMS - UI - FTS Virtual Computer SMP 4 - n CPUs - WN Ontology layer in the data management architecture Applications (e.g.DSS) GUI External Biomedical ontologies HeC-External Ontologies Mappings HeC Ontology Ontological Layer External public DBs Query Processing Engine Semantics Rules Ontology-Data Model Mappings Hospitals’ DBs 35 Contents • • • • • • 36 Project Objectives & Challenges Health-e-Child Gateway The HeC Grid architecture Data Integration & Ontologies Futures Conclusions HEP vs. BioMed - Comparison Vs. Number & location of sites, Job vs. Interactive processing, Data distribution & governance, Nature of applications, skills sets of users, Data Types & lifetimes, Data Replication Middleware-centric vs. Operating system-centric 37 Obstacles for Biomed Users • Grid Middleware is difficult to set up, configure, maintain and use. New skills needed. • Support for non-scientific applications is limited. • Hierarchical environments favoured, largely client-server, inflexible in nature - no support for P2P. • Grid software has concentrated on job-oriented (batch-like) computing rather than interactive computing. • Cannot share the cpu (CE) or storage capabilities (SE) with the Grid : all or nothing for the resourcing. • We need out-of-the-box Grid functionality for biomedical end-users. 38 Conclusions • HeC Gateway designed and implementation progressing through 2007-2008 • Grid platform decided – EGEE gLite • BUT Grid is not yet mature in non-HEP settings • Biomedical communities are very different to HEP communities. So no one-size-fits-all • Future must be towards user-participation in Grids – what about a Grid OS ? 39