IBM Healthcare & Life Sciences

Data management in biomedical sciences
Richard Appleby
Solution Architect, Grid Computing, IBM UK Ltd
© Copyright IBM Corporation 2004

Agenda
• Positioning
• Trends & Industry drivers
• Commercial solutions
• Deployed solutions
• Problems
• Areas of IBM research
• UK eScience involvement
• Why do we do it?
• Example collaboration

© Copyright IBM Corporation 2005

Phenomenal growth in data and computation
• The volume of life sciences data in the average biotech company is doubling every six months.
• 32,000 genes (at least 3 billion nucleotide letters) in the human genome
• Worldwide, 150 petabytes of medical images would be created annually if all were stored electronically.
• One mass spectrometer generates 250 gigabytes of spectral data per day.
• Many farms generate >20 TB of new data per day.
• 13 × 10^21 floating-point operations are required to fold a single protein.

Most pronounced in “fixed content” data
Fixed content data
• does not change
• is actively referenced
• is retained for long periods of time (often for legal reasons)
• Medical Images, Electronic Documents, Emails, Audio and Video records
Huge growth in fixed content data volumes
Driven by movement from film to digital imaging (1):
• 308,000 TB created in 2003
• Predicting 1,250,000 TB in 2006
By 2006, fixed content will represent more than 50% of all online storage (2)
Growth rate of fixed content is 3x that of non-fixed content over a 5 year period (2)
Medical Imaging generates more fixed content than any other industry
Fixed content storage utilization is estimated to be lower than 50%
Sources: (1) Yankee Group, 2003; (2) Enterprise Storage Group, 2002

Personalised Medicine is Information Based
Changing requirements
• Volume and complexity of data
• Integrating massive volumes of disparate data
• Need for sophisticated analytics
• Growing collaboration across ecosystem
[Figure: Access to Diverse Heterogeneous Distributed Data. Labels: Patient Information; Hospital events (admission, surgery, recovery, discharge); Expression Arrays (various tissues); Personal genomics; X-rays, MRI, mammograms, etc.; Clinical Record; Analysis lab notes]

But personalised medicine needs to solve the data problem:
Fragmentation of existing data resources and assets
• Multiple organisations
• Multiple repositories
• Heterogeneous environment(s)
Cumbersome data access and poor integration
• Multiple access methods
• Different schemas, formats, etc.
Data security and protection
Complex management of de-centralised systems and resources
Inflexible and difficult to scale or change
High total costs
• Under-utilised compute and storage resources

Medical Imaging Grid
• 8 PACS Systems
• 200 workstations and 1000+ users
• 10 remote hospitals across the province
• 24 Terabytes of Storage
• Interfaces with legacy imaging and HIS systems and workstations (DICOM)

Universal Access Layer – Getting to the Patient Case
The Access Layer authenticates & authorises access to the Grid Archive
Key Services & Features
1. Multi-Location: Works seamlessly over LAN or WAN
2. Open: Supports multiple PACS and applications
3. Full User Abstraction: Integration via multiple APIs (NFS, CIFS, DICOM, HTTP, Custom API)
4. Secure & HIPAA-compliant archival and transmission
5. Fast access via caching and P2P techniques
Universal access to distributed heterogeneous images as if they were local

Grid Management Layer – Accessing the Patient Case
The Management Layer controls data storage & retrieval based on business rules and/or service level agreements.
Key Services & Features
1. Automated image registration, copying, and indexing of data and metadata
2. Images can be replicated locally or copied throughout the Grid
3. Centralised administration with proactive monitoring and alerts
Intelligent placement of data based on business rules with central management

Archive Management Layer – Saving the Patient Case
The Archive Management Layer enables safe, cost-effective, always-available storage
Key Services & Features
1. Supports heterogeneous hardware and online, nearline and offline media
2. Intelligent HSM/ILM, based on file content
3. Automatic fail-over in case of failure or maintenance
4. Proactive monitoring and automated self-healing via digital signatures
5. HW obsolescence protection
6. Petabyte scalability
Stores images intelligently to optimize performance and cost, and to ensure data redundancy

Aventis Pharmaceuticals Information/Collaboration Grid
Challenge
• Distributed, diverse data sources across multiple continents
• Data was heterogeneous and cross-platform, consisting of files and databases
• Limited ability to consolidate, construct and analyze data sets
Solution
• Linux
• IBM
• IBM WebSphere Information Integrator
Key Business Benefits
• Using IBM WebSphere Information Integrator to bring together disparate LS data sources in one coherent view
• Significant increase in researcher productivity due to improved collaboration & data access
• Better data quality and currency
Information Integrator: integrating diverse Life Sciences information across and beyond the enterprise

“Major Pharmaceutical Company” Non R&D Server Compute Grid
Challenge: Increase the reach and power of marketing analytics
Business Issues
• Product information (Prescription, Sales, Segmentation) not being effectively collected
• External research upload process currently takes three-plus weeks, dating the information
Business Benefits
• Reduced process time to hours or days, rather than the three weeks needed without the new solution
• Provided marketing and product planning with timely, useful information to improve business decisions
Chosen Grid Infrastructure
• United Devices Grid Meta processor, which links PCs and Servers in a virtual grid environment
• IBM is providing project management and technical support for the engagement.
Technology Benefits:
• More flexible and resilient environment
• Open standards allow easy integration with other data sources and addition of new functions
Grid Computing Business Benefits:
• Higher projected sales due to more effective targeting of prospective customers
• Lower cost due to server consolidation, lower license fees, less maintenance

The data (virtualisation) problem again:
Fragmentation of existing data resources and assets
• Multiple organisations
• Multiple repositories
• Heterogeneous environment(s)
Cumbersome data access and poor integration
• Multiple access methods
• Different schemas, formats, etc.
Data security and protection
Complex management of de-centralised systems and resources
Inflexible and difficult to scale or change
High total costs
• Under-utilised compute and storage resources

Requirements for data virtualisation solutions
Data virtualisation solutions should: Materialize data, in any format, at any location, from one or several sources, with appropriate security, performance, consistency, and coherency.
Transparent distributed access to data
• (Automatic) caching / replication
Global naming of distributed data
• Consistent method for referring to data sets across the grid
Data movement
• Reliable and high performance
Data transformation and federation
• Transform data from one format or schema to another
• Combining data sources as required
Quality of data
• Manage consistency, coherency, completeness and provenance of data

On Demand Data Placement
What is this project trying to do?
Develop an advisor for on demand data placement over multiple distributed data sources. The advisor monitors application execution patterns globally to derive a data placement strategy that yields the desired throughput and response time for a query workload.
Why is it important?
Placing data autonomically for a query workload, in order to meet throughput and response time goals in a distributed environment, is key to achieving policy-driven data access. Today data placement is typically done manually.
What are concrete deliverables?
A data placement advisor prototype supporting a variety of data source configurations and minimizing global query response time
How will IBM (Grid and on demand) customers benefit?
Improved performance for data access applications, achieved autonomically by recommending the data placement that yields the desired response time and throughput for query workloads

On Demand Data Placement
[Figure: data placement advisor architecture.
Queries: Q1: Select * from T1, T2 where T1.a1=T2.a2; Q2: Select * from T2, T3 where T2.a2=T3.a3
Components: Information Integrator, Meta-Wrapper, Wrappers, Data Placement Advisor, Metadata Repository, Replication Manager
Log: query statements, response time, execution plan
Data placement criteria: (Benefit – Overhead) / Table Size
Options: 1. replicate T2 to source 1; 2. replicate T1 to source 2
Sources: tables T1 to T5 distributed over Source 1 to Source 4]

Resource Namespace Service
What is this project trying to do?
Provide a federated and uniform view of virtualized grid resources, presented in a human-oriented hierarchy
Why is it important?
There is a need to access resources within a distributed network or grid by way of a universal name that is convenient for human interface applications
What are concrete deliverables?
• A Web Service specification document for namespace services within the Grid
• IETF specification draft for NFSv4
How will IBM (Grid and on demand) customers benefit?
RNS is a namespace service that enables:
• Access to information and services via a variety of interfaces
• Federation of an extensible diversity of data sources and services
• Location transparency
• Global naming and scalability

Applying Provenance to Data
• Data can be created and changed during its lifetime
• Events can trigger the changes
  • Doctors update Patient Records
  • Pharmacists update Dispensary Records
• We can record documentation about processes that created data and then ask questions
  • A Patient makes an appointment that results in an entry in a calendar
  • A Pharmacist receives a Prescription and dispenses a drug
• Actors manage their own state
[Figure labels: Prescription, Patient]
Example questions:
• Did the Doctor follow the correct process when he prescribed Viagra for Joe Bloggs?
• Have the Government guidelines on waiting lists for GP appointments been breached by this practice?
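The idea of recording documentation about processes and then asking questions of it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ProcessRecord` and `ProvenanceLog` names, the action strings and the "correct process" order are all hypothetical and do not come from any IBM product or from the provenance service itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ProcessRecord:
    """One documented step: who did what, for which patient, producing what."""
    step: int       # position in the log; stands in for a timestamp
    actor: str
    action: str
    patient: str
    produced: str


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory store of process documentation."""
    records: List[ProcessRecord] = field(default_factory=list)

    def record(self, actor: str, action: str, patient: str, produced: str) -> ProcessRecord:
        rec = ProcessRecord(len(self.records), actor, action, patient, produced)
        self.records.append(rec)
        return rec

    def history(self, patient: str) -> List[ProcessRecord]:
        """All documented steps for one patient, in recorded order."""
        return [r for r in self.records if r.patient == patient]

    def followed_process(self, patient: str, required_order: List[str]) -> bool:
        """True if the required actions occur for this patient in the given order."""
        actions = iter(r.action for r in self.history(patient))
        # Each 'any' consumes the iterator up to the next matching action,
        # so this checks that required_order is a subsequence of the history.
        return all(any(a == needed for a in actions) for needed in required_order)


# Assumed "correct process" for illustration only.
CORRECT_PROCESS = ["make_appointment", "prescribe", "dispense"]

log = ProvenanceLog()
log.record("Patient", "make_appointment", "Joe Bloggs", "calendar entry")
log.record("Doctor", "prescribe", "Joe Bloggs", "prescription")
log.record("Pharmacist", "dispense", "Joe Bloggs", "dispensary record")

print(log.followed_process("Joe Bloggs", CORRECT_PROCESS))
```

The question "did the Doctor follow the correct process?" becomes a query over the recorded steps rather than over the patient record itself, which is the point of keeping process documentation separate from the data it describes.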
Security and Privacy
• The Doctor cannot see the Pharmacist’s processes and vice versa
• The Patient can see parts of both
[Figure labels: Prescription, Appointment, Doctor, Dispense, Pharmacist, Patient Record, Dispensary Record]

Recording Provenance
[Figure: a Service Requester sends a Service Request to a Service Provider and receives a Service Response; interactions and actor state are recorded in the Provenance Store. Labels: Record Interaction, Actor State, Record Interaction, Submission Finished]
• An Actor can be either a Service Requester, a Service Provider or any application that uses the Provenance Service
• Actors assert information about Provenance; each such assertion is called a p-assertion
• Interaction p-assertion = Service Request and Response
• Actor state p-assertion = Any state information provided by the Service Provider

So, why is IBM involved in eScience research?
IBM believes that a consortium approach within Life Sciences and Healthcare, involving both partners and academia, is a healthy way to research the application of Grid and ICT to their challenges.
• Collaborations lead to innovation
  • R&D and investments in grid and related technologies
  • Improved commercial solutions
• Identifies areas for investment
  • Ensures continued “best of breed” offerings and solutions
  • Refines product development roadmaps
• Relationship building
  • Industry-leading academics and partners
  • Builds an ecosystem
• Breeding ground for top talent
  • Ours, and the wider community

Unlocking IBM’s Intellectual Capital
• Patents are essential to commercial entities
  • They form a vital revenue stream
  • They protect the investment in corporate research
• However, technological advances depend on shared knowledge, standards and collaboration
• IBM believes open software standards lead to greater efficiency and innovation
IBM would like to take a balanced approach to IP management, enabling both proprietary and open models while continuing to protect truly new and useful inventions. In this way IBM can invest in specific industries, helping them to grow and innovate.
On the 24th October, IBM announced that it will pledge royalty-free access to its patent portfolio for the development and implementation of selected open software standards for the healthcare arena, built around web services, electronic forms and open document formats.
http://www.ibm.com/research/innovation/ip

eHealthcare@Home
The project will integrate invasive and non-invasive patient monitoring systems with the gathering and analysis of this information via a Grid-based infrastructure. Individualized data will be collected from multi-enterprise specialist equipment and databases, combined with monitoring data from remote patients, employing a new class of dedicated home healthcare server.
[Figure: eHealthcare@Home architecture. Labels: Imaging & Diagnostics; MRI; Patch Clamp; Metabolic; Retinopathy Analysis; PAC Systems; Other in-region hospitals; Regional Health Authority; Transfer handler; Transfer engines; Trend and analysis engines; RHA data-stores; In-home patient monitoring (sensors and home-servers)]

World Community Grid
A global, philanthropic compute grid established by IBM to harness grid computing resources, expediting calculations that would normally take years so that they produce results in mere months.
Currently 156,000 systems have contributed over 19,000 processor years.
An advisory board with members from leading foundations, universities and public organizations provides oversight of the research projects.
Projects in the following disciplines will be considered for inclusion:
• Medical Research – Genomics, proteomics, epidemiology, and biological system research such as AIDS and HIV studies
• Environmental Research – Ecology, climatology, pollution, and preservation
• Basic Research – Human health and welfare related studies
To contribute resources, or to suggest research, please visit http://www.worldcommunitygrid.org/

Any Questions?
Ask now, find me later today for a chat, or email me later at appleby@uk.ibm.com
Thank you!

Backup Foils
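Backup note on the On Demand Data Placement foils: the placement criterion (Benefit – Overhead) / Table Size can be read as a score for ranking candidate replications. The sketch below is one possible reading, not the advisor's actual implementation. It assumes that benefit and overhead are estimated in the same unit (here, seconds of response time) and that table size is in megabytes; the `PlacementOption` class, the `rank_options` function and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlacementOption:
    """A candidate replication, e.g. 'replicate T2 to source 1'."""
    table: str
    target_source: str
    benefit: float     # assumed: estimated response time saved for the workload (s)
    overhead: float    # assumed: estimated cost of creating and maintaining the replica (s)
    table_size: float  # assumed: size of the table to be copied (MB)

    def score(self) -> float:
        # The criterion from the foil: (Benefit - Overhead) / Table Size
        return (self.benefit - self.overhead) / self.table_size


def rank_options(options: List[PlacementOption]) -> List[PlacementOption]:
    """Drop options whose overhead outweighs the benefit, then sort best first."""
    worthwhile = [o for o in options if o.score() > 0]
    return sorted(worthwhile, key=lambda o: o.score(), reverse=True)


# The two options named on the foil, with made-up estimates.
options = [
    PlacementOption("T2", "source 1", benefit=120.0, overhead=20.0, table_size=50.0),
    PlacementOption("T1", "source 2", benefit=90.0, overhead=30.0, table_size=10.0),
]

for option in rank_options(options):
    print(f"replicate {option.table} to {option.target_source}: score {option.score():.1f}")
```

Dividing by table size favours small tables that give a large net gain, so with these made-up estimates replicating T1 to source 2 ranks ahead of replicating T2 to source 1 even though T2 has the larger raw benefit.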