Health-e-Child: A Grid Platform for European Paediatrics
On behalf of the Health-e-Child Consortium
(& with thanks to David Manset, Maat-G)
Richard McClatchey, UWE-Bristol UK
CHEP07 3rd September 2007, Victoria, CA
Contents
• Project Objectives & Challenges
• Health-e-Child Gateway
• The HeC Grid architecture
• Data Integration & Ontologies
• Futures
• Conclusions
Motivation for the Project
• Clinical demand for integration and exploitation of heterogeneous biomedical information
  – vertical dimension: multiple data sources
  – horizontal dimension: multiple sites
• Need for generic and scalable platforms (Grid?)
  – integrate traditional and emerging sources (in vivo and in vitro)
  – provide decision support
  – ubiquitous access to knowledge repositories in clinical routine
  – connect stakeholders in clinical research
• Need for complex integrated disease models
  – build holistic views of the human body
  – early disease detection exploiting in vitro information
  – personalized diagnosis, therapy and follow-up
Objectives of Health-e-Child
Build enabling tools & services that improve the quality of care and reduce cost with
• Integrated disease models
• Database-guided decision support systems
• Cross modality information fusion and data mining for knowledge discovery
Establish multi-site, vertical, and longitudinal integration of data, information and knowledge
Develop a GRID based platform, supported by robust search, optimisation and matching
[Figure: Integrated Disease Modeling and Vertical Data Integration. Vertical levels: Population, Individual, Organ, Tissue, Cell, Molecule. Data: Imaging, Lab Data, Genomics, Proteomics, Demographics, Physician Notes, Life Style. Other labels: Real-time alert, On-line learning, Integrated Medical Database, Healthy Child, Guidance, Enrich, Augment, Observation Process, Sensors.]
Introduction
A Geographically Distributed Environment
[Figure: map of partner sites. Sites: ASPER, UCL, GOSH, UWE, SIEMENS, CERN, NECKER, IGG, FGG, EGF, UOA, INRIA, MAAT, LYNKEUS. Legend: Clinical Site, R&D Site.]
Introduction
Applications Integration Challenge
Highlights
- Different Networks: LANs, WANs, Internet
- Security Constraints: Local & National Regulations
- Bandwidth Limitations: LAN/WAN & Internet uplinks …
[Figure: sites shown: IGG, NECKER, GOSH]
HeC System Overview
[Figure: system layers]
• Applications: Heart Disease Applications, Inflammatory Diseases Applications, Brain Tumour Applications
• Common Client Applications: user interface for authentication, viewing, editing, similarity search
• HeC Gateway: HeC specific models and Grid services like query processing, security
• Grid Infrastructure: databases, resource and user management, data security
Data Management & Integration
The Health-e-Child Gateway
• Our building blocks are Services
• Our Gateway follows a service-oriented architecture (SOA)
• Existing blocks are available in the community
  – WSRF Containers: Globus Toolkit 4, Tomcat, WSRF::Lite…
  – Data Access Layers: OGSA-DAI, AMGA…
  – File Transfer facilities: GridFTP…
• Applies to other distributed systems
• Our approach is to efficiently reuse and combine blocks to satisfy our requirements
• Building blocks might be enhanced, e.g. HeC Authentication Service
Health-e-Child
Technical Review Meeting, Paris, 17 January 2007
Gateway
• The HeC Gateway
  – An intermediary access layer to decouple client applications from the complexity of the grid
  – Towards a platform independent implementation
  – To add domain specific functionality not available in Grid middleware
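The idea of an intermediary access layer can be illustrated with a minimal facade sketch. This is not the HeC Gateway implementation: the class `Gateway`, the `similarity_search` handler and the token scheme are all hypothetical names chosen for illustration, and authentication is reduced to a placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class AccessDenied(Exception):
    """Raised when a request carries no valid session token."""


@dataclass
class Gateway:
    """Hypothetical facade: one access point in front of back-end services."""
    services: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)
    sessions: set = field(default_factory=set)

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        # Plug in a building block (e.g. a wrapper around a grid service).
        self.services[name] = handler

    def login(self, user: str) -> str:
        # Placeholder authentication: a real gateway would verify credentials.
        token = f"token-{user}"
        self.sessions.add(token)
        return token

    def call(self, token: str, service: str, request: dict) -> dict:
        # Clients only see this method, never the middleware behind it.
        if token not in self.sessions:
            raise AccessDenied("unknown session token")
        if service not in self.services:
            raise KeyError(f"no such service: {service}")
        return self.services[service](request)


def similarity_search(request: dict) -> dict:
    # Stand-in for a domain specific service not offered by the middleware.
    return {"matches": [f"case-like-{request['case_id']}"]}


gateway = Gateway()
gateway.register("similarity_search", similarity_search)
token = gateway.login("clinician")
result = gateway.call(token, "similarity_search", {"case_id": 7})
```

Because clients depend only on `call`, the handler behind a service name could be swapped for a different middleware wrapper without changing client code, which is the decoupling the slide describes.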
Status
√ SOA architecture and design
√ implementation of privacy and security modules
Our Approach
Highlights
- Simplicity, “all-in-one box”
- Modularity & Scalability, “off-the-shelf” components
- State-of-the-art Approaches
One Key to enter the system
One Access Point per Institution
[Figure labels: Hospital X, User Workstation]
Platform & Grid Middleware
[Figure: the Health-e-Child Access Point (One Access Point), built with Virtualization. Labels: Hosting Domain 1 (Stable & Secure Environment); Hosting Domain 2 (Stable & Secure Environment); EGEE gLite; HeC Gateway; HeC DBMS; Security & Registration; Scheduling; Info System; Monitoring; Job Management; Data Unit; Computation Unit; Storage: 200GB, 50GB, 1TB]
The Platform
The Health-e-Child Gateway (1): Inside the box…
[Figure labels: Client Applications, One Access Point, Functionality Access, HeC Gateway, Infrastructure Abstraction, Computing Resources]
The Health-e-Child Gateway (2): Inside the box…
[Figure labels: One Access Point, Security, HeC Gateway, GT4]
The Health-e-Child Gateway (3): Inside the box…
[Figure labels: One Access Point, HeC Gateway, Grid, GT4]
The Health-e-Child Gateway (4): Inside the box…
[Figure labels: One Access Point, Client Connectivity, HeC Gateway, GT4]
The Health-e-Child Gateway (5): Inside the box…
[Figure labels: Client Applications, One Access Point, HeC Gateway, Data Integration]
Grid
• Grid technology (gLite 3.0) as the enabling infrastructure
  – A distributed platform for sharing storage and computing resources
• HeC Specific Requirements
  – Need support for medical (DICOM) images
  – Need high responsiveness for use in clinical routine
  – Need to guarantee patient data privacy:
     - access rights management
     - storage of anonymized patient data only
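The two privacy requirements above (access rights management, and storing only anonymized data) can be sketched together as follows. This is an illustrative assumption, not the project's security module: the field names, the `AnonymizedStore` class and the keyed-hash pseudonym are hypothetical, and a keyed pseudonym is weaker than full anonymization, so it stands in here only to show where such a step would sit.

```python
import hashlib
import hmac
from typing import Dict, List, Set

# Hypothetical set of directly identifying fields that must never be stored.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth"}


def pseudonym(patient_id: str, secret: bytes) -> str:
    # Keyed hash: the same patient always maps to the same pseudonym,
    # but the mapping cannot be recomputed without the site's secret.
    return hmac.new(secret, patient_id.encode(), hashlib.sha256).hexdigest()[:16]


def anonymize(record: Dict[str, str], secret: bytes) -> Dict[str, str]:
    # Drop identifying fields and replace the patient ID with a pseudonym.
    clean = {k: v for k, v in record.items()
             if k not in IDENTIFYING_FIELDS and k != "patient_id"}
    clean["pseudonym"] = pseudonym(record["patient_id"], secret)
    return clean


class AnonymizedStore:
    """Hypothetical store enforcing per-user rights and anonymized-only data."""

    def __init__(self, rights: Dict[str, Set[str]]):
        self.rights = rights            # user -> permitted actions
        self.records: List[Dict[str, str]] = []

    def _check(self, user: str, action: str) -> None:
        if action not in self.rights.get(user, set()):
            raise PermissionError(f"{user} may not {action}")

    def put(self, user: str, record: Dict[str, str]) -> None:
        self._check(user, "write")
        if IDENTIFYING_FIELDS.intersection(record) or "patient_id" in record:
            raise ValueError("refusing to store identifying data")
        self.records.append(record)

    def get_all(self, user: str) -> List[Dict[str, str]]:
        self._check(user, "read")
        return list(self.records)


secret = b"site-local-secret"           # assumed to stay inside the hospital
raw = {"patient_id": "P001", "name": "Jane Doe", "diagnosis": "example"}
store = AnonymizedStore({"dr_a": {"read", "write"}, "student": {"read"}})
store.put("dr_a", anonymize(raw, secret))
```

Rejecting identifying fields at write time, rather than filtering at read time, keeps the stored data itself free of them.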
Status
√ Testbed installation since May 2006
√ HeC Certificate Authority
√ HeC Virtual Organisation
√ Security Prototype (clients & services)
√ Logging Portal & Appender
Grid Platform
Grid Architecture @ Month 18
[Figure labels: Virtualization Applied; Being deployed; 2nd Test Node Deployed]
Grid Platform
Solution Facts & Conclusion
- Test-bed Installed & Tested @ CERN
- Validated the Middleware Architecture (V1.0) to be installed at the different Sites
- However: Architecture still too centralized (VOMS, LFC…)
- Being addressed in the second planning period
Vertical Integration
• Vertical Levels
  – Split the domain into vertical levels: Population, Individual, Organ, Tissue, Cell, Molecule
  – Data/Information from multiple levels
  – Implicit semantics connecting these levels
• Vertical Integration in applications
  – Data models should provide seamless access to all the information stored in HeC
• Vertically Integrated Modelling
  – Coherently integrate information from different levels (instances)
  – Provide integrated view of the data to the users (concept-oriented views, etc)
  – Queries against the integrated knowledge
• Topics
  – Vertical Integration along the longitudinal axis in pediatrics
  – Semantics of relevance for presenting clinical data
  – Fragment extraction, alignment and integration from biomedical knowledge sources
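As a rough illustration of an integrated view across vertical levels, the sketch below groups one patient's observations by level. The level names come from the slide; the `Observation` record, its fields and the sample values are assumptions made for the example, not the HeC data model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Level(IntEnum):
    # Vertical levels named on the slide, ordered from smallest to largest.
    MOLECULE = 1
    CELL = 2
    TISSUE = 3
    ORGAN = 4
    INDIVIDUAL = 5
    POPULATION = 6


@dataclass(frozen=True)
class Observation:
    patient: str   # pseudonymous patient key (hypothetical field)
    level: Level
    source: str    # e.g. "Genomics", "Imaging", "Lab Data"
    value: str


def integrated_view(observations: List[Observation], patient: str) -> Dict[str, List[str]]:
    """Group one patient's observations by vertical level, largest scale first."""
    view: Dict[str, List[str]] = {}
    for obs in sorted(observations, key=lambda o: o.level, reverse=True):
        if obs.patient == patient:
            view.setdefault(obs.level.name, []).append(f"{obs.source}: {obs.value}")
    return view


data = [
    Observation("p1", Level.MOLECULE, "Genomics", "variant-x"),
    Observation("p1", Level.ORGAN, "Imaging", "cardiac MRI series"),
    Observation("p2", Level.ORGAN, "Imaging", "brain MRI series"),
    Observation("p1", Level.INDIVIDUAL, "Demographics", "age 9"),
]
view = integrated_view(data, "p1")
```

A query against such a view can ask for everything known about a patient without naming the source system that produced each item.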
Data and knowledge modelling
[Figure: data and knowledge modelling layers]
• Applications: Similarity Query, Application specific DSS
• HeC-universal Knowledge Representation ontologies
• Integrated Data Modelling
• Requirements: Data Acquisition Protocols (WP9), Users Requirements Specifications (WP2), Modelling with Domain Experts (WP6)
Ontologies in Health-e-Child
• Reuse existing medical knowledge to structure our feature space
  – Working with established knowledge
  – Linkage to “live” knowledge base(s)
  – Saves effort on validating the models with domain experts
  – Non-domain specific conceptualization (upper level ontologies): reusing these, we establish shareable/shared knowledge
• Ontological modelling of paediatric medicine
• Integration of information: resolving semantic heterogeneity
• Different countries, hospitals, protocols
• Research potential
• OWL DL representation of paediatric medical knowledge, identification of knowledge patterns
• Decision-support systems
• Similarity metrics and query enhancement
• Alignment of ontologies that overlap over the Health-e-Child domain
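Resolving semantic heterogeneity between hospitals can be pictured as mapping each site's local terms onto shared concepts. The sketch below is a deliberately simple lookup under assumed names: the site labels, local terms and `hec:` concept identifiers are invented for illustration, and a real system would derive them from the ontology rather than a hard-coded table.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from (site, local term) to a shared concept identifier.
LOCAL_TO_CONCEPT: Dict[Tuple[str, str], str] = {
    ("hospital_a", "body temp"): "hec:BodyTemperature",
    ("hospital_b", "temperatura"): "hec:BodyTemperature",
    ("hospital_a", "hr"): "hec:HeartRate",
    ("hospital_b", "frequenza cardiaca"): "hec:HeartRate",
}


def to_concept(site: str, local_term: str) -> Optional[str]:
    # Normalise case and whitespace before looking the term up.
    return LOCAL_TO_CONCEPT.get((site, local_term.strip().lower()))


def harmonize(site: str, record: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Re-key a local record by shared concepts; report terms with no mapping."""
    mapped: Dict[str, float] = {}
    unmapped: List[str] = []
    for term, value in record.items():
        concept = to_concept(site, term)
        if concept is None:
            unmapped.append(term)
        else:
            mapped[concept] = value
    return mapped, unmapped


a_mapped, a_unmapped = harmonize("hospital_a", {"Body Temp": 37.2, "HR": 88.0})
b_mapped, b_unmapped = harmonize("hospital_b", {"Temperatura": 38.1, "peso": 31.0})
```

Returning the unmapped terms makes gaps in the mapping visible, so they can be reviewed with domain experts instead of being silently dropped.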
Clinical and Application Roadmap
Phases: Phase I (- 06/06), Phase II (07/06 - 06/07), Phase III (07/07 - 12/08), Phase IV (2009)
[Figure: roadmap timeline. Activities shown: Study Design and Approval; User Requirements; State of the Art Reports; Data acquisition, genetic tests, ground truth annotations; Disease Model Development (generic → subtype specific → patient and treatment specific); Knowledge Discovery Methods; Classifiers Based on Genetics; Feature Extraction from Imaging; Segmentation/Registration; Integrated decision support; Clinical Validation; Refinement of Models and Algorithms; Dissemination]
Grid Platform
Future Work – Decentralized Architecture
Working out possible alternatives to decentralize key components of the grid middleware
→ Autonomous Sites
[Figure: three hospital sites (Hospital 1, Hospital 2, Hospital 3), each with 1 Box hosting 4 Domains as Virtual Computers on SMP machines (labels: SMP 4 - n CPUs, SMP 2 - n CPUs). Service lists, in order of appearance:]
• First box: VOMS (M), PX (M), Hydra (D), AMGA (*S); top BDII, WMSLB, LFC (*S), DPM, MON ?; site BDII, CE/LRMS or gsissh; WN; UI; FTS
• Second box: VOMS (S), PX (S), Hydra (D), AMGA (*M); top BDII, WMSLB, LFC (*M), DPM, MON ?; site BDII, CE/LRMS or gsissh; WN; UI; FTS
• Third box: VOMS (S), PX (S), Hydra (D), R-GMA ?, AMGA (*S); top BDII, WMSLB, LFC (*S), DPM, MON ?; site BDII, CE/LRMS or gsissh, LRMS; WN; UI; FTS
Ontology layer in the data management architecture
[Figure: architecture layers]
• Applications (e.g. DSS), GUI
• Ontological Layer: HeC Ontology; HeC-External Ontologies Mappings; External Biomedical ontologies
• Query Processing Engine: Semantics, Rules
• Ontology-Data Model Mappings
• Hospitals’ DBs; External public DBs
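The role of the ontology-data model mappings under a query processing engine can be sketched as query rewriting: a request phrased in terms of a concept is translated into each site's own schema. Everything specific here is an assumption for illustration (the `MAPPINGS` table, the site names, the table and column names, and the use of in-memory SQLite databases as stand-ins for hospital databases).

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical ontology-to-data-model mappings: for each site, the table and
# column that hold a given ontology concept.
MAPPINGS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "site_a": {"hec:HeartRate": ("vitals", "hr_bpm")},
    "site_b": {"hec:HeartRate": ("measurements", "pulse")},
}


def rewrite(concept: str, site: str) -> str:
    # Translate a concept-level request into SQL for one site's schema.
    # Table and column names come only from the trusted MAPPINGS table.
    table, column = MAPPINGS[site][concept]
    return f"SELECT patient, {column} FROM {table}"


def query_all(concept: str,
              connections: Dict[str, sqlite3.Connection]) -> List[Tuple[str, str, float]]:
    """Run the rewritten query at every site that maps the concept."""
    rows: List[Tuple[str, str, float]] = []
    for site, conn in connections.items():
        if concept in MAPPINGS.get(site, {}):
            for patient, value in conn.execute(rewrite(concept, site)):
                rows.append((site, patient, value))
    return rows


# Two in-memory databases with different schemas stand in for hospital DBs.
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE vitals (patient TEXT, hr_bpm REAL)")
db_a.execute("INSERT INTO vitals VALUES ('p1', 88)")
db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE measurements (patient TEXT, pulse REAL)")
db_b.execute("INSERT INTO measurements VALUES ('p2', 95)")

results = query_all("hec:HeartRate", {"site_a": db_a, "site_b": db_b})
```

The caller names only the concept; which table and column hold it at each site stays inside the mapping layer.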
HEP vs. BioMed - Comparison
Points of comparison: Number & location of sites, Job vs. Interactive processing, Data distribution & governance, Nature of applications, Skill sets of users, Data Types & lifetimes, Data Replication
Middleware-centric vs. Operating system-centric
Obstacles for Biomed Users
• Grid Middleware is difficult to set up, configure, maintain and use. New skills needed.
• Support for non-scientific applications is limited.
• Hierarchical environments favoured, largely client-server, inflexible in nature; no support for P2P.
• Grid software has concentrated on job-oriented (batch-like) computing rather than interactive computing.
• Cannot share the CPU (CE) or storage capabilities (SE) with the Grid: all or nothing for the resourcing.
→ We need out-of-the-box Grid functionality for biomedical end-users.
Conclusions
• HeC Gateway designed; implementation progressing through 2007-2008
• Grid platform decided: EGEE gLite
• BUT Grid is not yet mature in non-HEP settings
• Biomedical communities are very different from HEP communities, so there is no one-size-fits-all
• Future must be towards user-participation in Grids: what about a Grid OS?