IBM Software Group

The Use of OGSA-DAI with DB2 Content Manager in the eDiaMoND Project
M Oevers, B Collins, A Knox, J Williams

Overview
• eDiaMoND the project
• Strategies for Virtualisation
• How DB2 and CM are used
• OGSA-DAI enablement of CM
• Lessons Learnt

eDiaMoND – Project Announcement
“One of the pilot e-science projects is to develop a digital mammography archive, together with an intelligent medical decision support system for breast cancer diagnosis and treatment. An individual hospital will not have supercomputing facilities, but through the grid it could buy the time it needs. So the surgeon in the operating theatre will be able to pull up a high-resolution mammogram to identify exactly where the tumour can be found”
– Tony Blair (speech to the Royal Society, 23 May 2002)

eDiaMoND Partners

eDiaMoND – Project Deliverables
Timeline: Phase 0 Prototype (end-2003), eDiaMoND BluePrint, Phase 1 Prototype (mid-2004), ? (Next Phase)
• Grid Infrastructure
• Grid-connected Workstation
• Database for Storage & Retrieval of Images & Metadata
• Computation for CADe, CADi and Statistical Analyses
• Required Hardware, Software & Network for given Service Levels
Breast Screening Programmes

eDiaMoND Functional Model

Strategies for Virtualisation
• Use DB2 Information Integrator (II) and Information Integrator for Content (II4C)
• Expose through OGSA-DAI
• Investigate Distributed Query Processing (DQP)

Virtualisation – things to remember
• Each Breast Care Unit (BCU) to operate independently from others
  - Individual organisations coming together to form a Virtual Organisation
  - Data loaded locally in each BCU
  - Data is “owned” by the BCU
• Enable read access across all BCUs seamlessly
  - Replication or Federation
  - DB2 II & II4C
• Remember it’s got to be a Grid (eScience project)
  - OGSA-DAI
  - Distributed Query Processing (DQP) over OGSA-DAI

How OGSA-DAI is used with DB2 and CM
• DB2 stores the non-image data in a structured form
  - DICOM describes an ER model: Patient – Study – Series – Image
  - Flexible to allow for multiple modalities
  - Allow flexibility of data modelling/access control/query rewrite
• CM is used to store and manage the large (30MB) DICOM files
  - Files contain both non-image data and image data
  - Identified by DICOM SOP Instance UID
  - Flat CM data model (Customer Requirement)
• Both exposed as OGSA-DAI services
DICOM – Digital Imaging and Communications in Medicine

Workflow
[Figure: three-layer workflow diagram]
• Client Layer: Screening Administration Client, Viewer Client
• Grid Layer: Query Service (persistent), Retrieve Service (persistent), Worklist Service (transient), two OGSA-DAI Services (persistent)
• Data Layer: DB2 Instance, Content Manager Instance
• Steps: 1. Query, 2. Worklist Create, 3. Worklist Consume, 4. Retrieve
• Labels on the flows: Patient ID, DICOM ID, URL – DICOM ID

Grid Development – Phase 0 to Phase 1
[Figure: deployment diagram]
• Sites: UED, UCL, KCL, CHU
• Client Layer: Data Loader, Admin, Viewer
• Grid Layer: WORKLIST, QUERY, RETRIEVE, DB2 OGSA-DAI, CM OGSA-DAI, DB2 FED OGSA-DAI, CM FED OGSA-DAI
• Data Layer: DB2, CM, DB2 Fed., CM Fed.

CM Grid enablement – What it means
OGSA-DAI conf/ext points -> Mapping to CM
• Driver Class, e.g. com.ibm.db2.jcc.DB2Driver -> Datastore object, e.g. com.ibm.mm.sdk.server.DKDatastoreICM
• Driver URI, e.g. jdbc:db2://localhost:50000/SAMPLE -> Data store name, e.g. ICMNLSDB
• Connection, DriverManager.getConnection() -> Connected Datastore, Datastore.connect()
• Metadata: Table Schema for SQL -> Metadata: ItemTypes and Attributes
• Metadata: XML schema for XML DB -> Could it be treated as an XML DB?
• Mapping of Grid Certificates to DB user and password -> Mapping of Grid Certificate to CM user and password
It was possible to map CM concepts to corresponding JDBC concepts that are exposed in OGSA-DAI configuration files
2 XML files to edit and 2 Java classes to write

The Gory details

Lessons Learnt
• OGSA-DAI is a flexible framework into which CM fits reasonably well
  - Chaining of activities
  - User defined activities
  - Developer focus on writing activities
• Use of dynamic discovery to configure the system
  - Useful during development/testing
  - Register more in the registry
  - Unifies the view of the system as far as data is concerned
• Experience of grid-enabling an existing product
• Have not explored how to expose CM metadata yet

Thank You
Manfred Oevers
manfred_oevers@uk.ibm.com

Data Load - High Level Design
1. DICOM file gets parsed
2. XML file created with Reference
3. XML file passed to load services
4. CM pulls DICOM file in
5. As simple as possible
[Figure: data load flow. Components: Load Client, DICOM Parser, Load API, LoadPlugin for Core DB, LoadPlugin for Core Store, OGSA-DAI DB2 Service, OGSA-DAI CM Service. Artefacts: DICOM File (Image or SR), Reference, XML File. Labels: Grid Boundary, Invocation, Pull from Reference]

Data Load Detailed Design
• Plugin Architecture
• Decoupling
• Configuration of Plugin to decide
• Parser also pluggable
• API as simple as possible
[Figure labels: OUCL, IBM]

eDiaMoND API

eDiaMoND - Organisation
[Figure: network diagram. Each site has an eDiaMoND LAN behind a VPN & FW on its local LAN, and the local LANs are joined by the JANET Network]
• Development (OUCL): OUCL LAN
• Oxford / Churchill: Oxford LAN
• Edinburgh: Edinburgh LAN
• Aberdeen: Aberdeen LAN
• Development (IBM): IBM LAN
• Development (Mirada): Mirada LAN
• UCL / St Georges: UCL LAN
• KCL / Guys: KCL LAN
• Legend: Grid Boundary, Server, Workstation, T221

Federation setup
• Create view over union of nicknames of identical tables
• No query rewrite necessary
[Figure: DB2 federation]
• Federated database: DB=FEDCORE, Node=edibm
• View cis.patient = edibm.patient union edouc.patient
• Server = edibm, Nickname = edibm.patient -> DB=EDCORE, Node=edibm, Table=cis.patient
• Server = edouc, Nickname = edouc.patient -> DB=EDCORE, Node=edouc, Table=cis.patient

The M Diagram

eDiaMoND – Non-Functional
[Figure: the Grid serving Screening, Diagnosis, Teaching, Training and Epidemiology]
• Non-functional areas: Ethics, Legal, Security, Performance, Scalability, Manageability, Auditability, ……
• Annotations: Anonymisation, Lossless Compression, Encryption, 256MB & 5 secs response, ~100 Centres, Systems Administration, Non-Repudiation

Phase 1 Deployment
[Figure: Phase 1 deployment diagram. Site labels: GEO, MIR, SCO, IBM, KCL, GUY, UED, CHU, UCL, OUCL. Each site has an eDiaMoND LAN on its local LAN with eDiaMoND Grid Nodes (including Dev., Demo and Test nodes), eDiaMoND and Digitiser workstations with T221 displays, and an eDiaMoND Repository Server, linked over JANET / Internet]
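The "Federation setup" slide describes cis.patient in FEDCORE as a view over the union of two nicknames (edibm.patient and edouc.patient) that point at identical tables. The following is a minimal sketch of how that view definition could be generated, not the project's actual DDL. The class and method names are hypothetical, and the use of UNION ALL is an assumption: the slide only says "union".

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds the DDL for a view over nicknames of identical tables. */
final class FederatedViewSketch {

    /**
     * Returns a CREATE VIEW statement that unions the given nicknames.
     * UNION ALL is an assumption; the slide only says "union".
     */
    static String createViewSql(String viewName, List<String> nicknames) {
        if (nicknames.isEmpty()) {
            throw new IllegalArgumentException("at least one nickname is required");
        }
        // One SELECT per nickname, joined into a single view body.
        String body = nicknames.stream()
                .map(nickname -> "SELECT * FROM " + nickname)
                .collect(Collectors.joining(" UNION ALL "));
        return "CREATE VIEW " + viewName + " AS " + body;
    }

    public static void main(String[] args) {
        // View and nickname names are taken from the slide.
        System.out.println(createViewSql("cis.patient",
                Arrays.asList("edibm.patient", "edouc.patient")));
    }
}
```

Because the view carries the same name as the local table at each node (cis.patient), a query written against one Breast Care Unit can run unchanged against the federated database, which is presumably what "No query rewrite necessary" refers to.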
UK Breast Screening – Challenges
• 2,000,000 - Screened every Year
• 120,000 - Recalled for Assessment
• 10,000 - Cancers
• 1,250 - Lives Saved
• 230 - Radiologists (Double Reading)
• 50% - Workload Increase (2 Views/Breast + Demographic Increase)
• Began in 1988
• Women 50-70
• Screened Every 3 Years
• ~100 Breast Screening Programmes
  - Scotland
  - Wales
  - Northern Ireland
  - England

Breast Cancer Facts
• 1 in 8 women will develop breast cancer in the course of their lives, 1 in 28 will die of it
• In the EC breast cancer accounts for 19% of cancer deaths and 24% of cancer cases
• Diagnosed in 348,000 women in EC+USA and kills 115,000 women annually
• 1,000,000 new cases world-wide in 1997
Rationale for Screening
• Early diagnosis = better Prognosis
• Detection at 0.5cm has favourable outcome in 99% cases; but at 2cm only 50%

UK Breast Screening Programme
[Figure: screening flow. Figures in brackets are for First Time Screening]
• Call -> Screening: 1000
• Screening -> All Clear: 960 (914)
• Screening -> Recall Assessment: 40 (86)
• Assessment -> Cancer: 6
• Assessment -> All Clear: 34 (80)
• Missed: 1 (Interval Cancers)
• The Recall rate is 86 for First Time Screening as no comparison is possible with a previous Screening
• Other labels: Previous, Current, Epidemiology, Training, ~100 Breast Screening Programmes

Project Teams
• Grid Infrastructure Team
  - IBM
  - Oxford University Computing Laboratory
• Image Analysis Technology Team
  - Dept of Engineering Science
  - Mirada Solutions
• Image Collection & Clinical Assessment Team
  - St Georges Hospital
  - Guy’s and St Thomas’ Hospitals
  - Oxford Radcliffe Hospitals
  - Kings College London
  - University College London
  - University of Edinburgh

SMF® - Mirada’s Patented Standardisation Process
• Mammograms have very different appearances, depending on image settings and acquisition systems
• The “interesting tissue” representation is a surface independent of scanner

Mirada’s Interesting Tissue Representation
[Figure labels: Tumour, Compression Plates, Glandular Tissue, Fatty Tissue, Hint, 1cm, 1.0 cm]
A quantitative representation of breast tissue density
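Returning to the "CM Grid enablement" slide: it pairs "Mapping of Grid Certificates to DB user and password" with the equivalent mapping to a CM user and password. The sketch below illustrates that idea only. It is not OGSA-DAI's own role-mapping code, and the class, the key scheme and the example distinguished name are all hypothetical. The resource names SAMPLE and ICMNLSDB are the examples given on the slide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: one grid identity mapped to credentials for each back end. */
final class GridCredentialMapSketch {

    /** User name and password for a single data resource. */
    static final class Credentials {
        final String user;
        final String password;

        Credentials(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    // Key: certificate distinguished name + "|" + resource name (assumed key scheme).
    private final Map<String, Credentials> entries = new HashMap<>();

    /** Registers the credentials a certificate holder uses for one resource. */
    void register(String certificateDn, String resource, Credentials credentials) {
        entries.put(certificateDn + "|" + resource, credentials);
    }

    /** Looks up credentials; empty if the certificate has no mapping for the resource. */
    Optional<Credentials> lookup(String certificateDn, String resource) {
        return Optional.ofNullable(entries.get(certificateDn + "|" + resource));
    }

    public static void main(String[] args) {
        GridCredentialMapSketch map = new GridCredentialMapSketch();
        String dn = "/O=Example/CN=Example Radiologist"; // made-up distinguished name
        map.register(dn, "SAMPLE", new Credentials("db2user", "secret"));   // DB2 database
        map.register(dn, "ICMNLSDB", new Credentials("cmuser", "secret"));  // CM data store
        System.out.println(map.lookup(dn, "ICMNLSDB").map(c -> c.user).orElse("no mapping"));
        System.out.println(map.lookup("/O=Example/CN=Unknown", "SAMPLE").isPresent());
    }
}
```

Keeping the same lookup shape for both back ends mirrors the slide's point that CM concepts could be lined up with the JDBC concepts already exposed in the OGSA-DAI configuration.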