IBM Healthcare & Life Sciences
Data management in biomedical sciences
Richard Appleby
Solution Architect, Grid Computing, IBM UK Ltd
© Copyright IBM Corporation 2004
Agenda
Positioning
  - Trends & industry drivers
Commercial solutions
  - Deployed solutions
Problems
  - Areas of IBM research
UK eScience involvement
  - Why do we do it?
  - Example collaboration
© Copyright IBM Corporation 2005
Agenda: Positioning
Phenomenal growth in data and computation
  • The volume of life sciences data in the average biotech company is doubling every six months.
  • The human genome: 32,000 genes (at least 3 billion nucleotide letters)
  • Worldwide, 150 petabytes of medical images would be created annually if stored electronically
  • A single mass spectrometer generates 250 gigabytes of spectral data per day; many farms generate more than 20 TB of new data per day.
  • 13 × 10^21 floating-point operations are required to fold a single protein
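To put two of these figures in perspective, the short sketch below does the arithmetic. The sustained rate of 1 petaFLOP/s (10^15 floating-point operations per second) is an assumption chosen for illustration, not a figure from the slide; with it, folding one protein would take roughly 150 days. Doubling every six months compounds to 4× per year and 64× over three years.

```python
def growth_factor(months: float, doubling_period_months: float = 6.0) -> float:
    """Multiplicative growth after `months` if volume doubles every `doubling_period_months`."""
    return 2.0 ** (months / doubling_period_months)


def fold_time_days(total_operations: float, sustained_ops_per_s: float) -> float:
    """Wall-clock days needed to perform `total_operations` at a sustained rate."""
    seconds = total_operations / sustained_ops_per_s
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(f"after {years} year(s): x{growth_factor(12 * years):.0f}")
    # Assumed machine: 1 petaFLOP/s sustained (an illustrative figure, not from the slide).
    print(f"one protein fold: {fold_time_days(13e21, 1e15):.0f} days")
```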
Most pronounced in “fixed content” data
Fixed content data:
  • does not change
  • is actively referenced
  • is retained for long periods of time (often for legal reasons)
  • examples: medical images, electronic documents, emails, audio and video records
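The defining properties above map naturally onto an immutable record. The sketch below is a hypothetical model (the class name, its fields and the per-object retention period are illustrative assumptions): a frozen dataclass cannot be modified after creation, a SHA-256 digest identifies the content, and deletion is refused until the retention period has passed.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FixedContentObject:
    """Hypothetical fixed-content item: immutable once written."""
    object_id: str
    payload: bytes
    created: date
    retention: timedelta  # e.g. a legally mandated retention period (assumed)

    @property
    def digest(self) -> str:
        # Content address: identical bytes always yield the same digest.
        return hashlib.sha256(self.payload).hexdigest()

    def may_delete(self, today: date) -> bool:
        # Deletion is only permitted once the retention period has expired.
        return today >= self.created + self.retention


scan = FixedContentObject("ct-0001", b"pixels", date(2005, 1, 1), timedelta(days=7 * 365))
print(scan.may_delete(date(2006, 1, 1)))  # still within retention
```

Any attempt to assign to a field of `scan` raises `dataclasses.FrozenInstanceError`, which is how the "does not change" property is enforced here.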
Huge growth in fixed content data volumes
  • Driven by the movement from film to digital imaging (1):
    - 308,000 TB created in 2003
    - 1,250,000 TB predicted for 2006
  • By 2006, fixed content will represent more than 50% of all online storage (2)
  • The growth rate of fixed content is 3× that of non-fixed content over a 5-year period (2)
  • Medical imaging generates more fixed content than any other industry
  • Fixed content storage utilisation is estimated to be lower than 50%
Sources: (1) Yankee Group, 2003; (2) Enterprise Storage Group, 2002
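The two Yankee Group figures imply a compound annual growth rate, which the following sketch computes (the function name is ours). Growing from 308,000 TB in 2003 to 1,250,000 TB in 2006 works out at roughly 60% per year.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate for growing from `start` to `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


rate = cagr(308_000, 1_250_000, 2006 - 2003)  # TB created in 2003 vs. predicted for 2006
print(f"{rate:.1%} per year")
```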
Personalised Medicine is Information Based
Changing requirements
  • Volume and complexity of data
  • Integrating massive volumes of disparate data
  • Need for sophisticated analytics
  • Growing collaboration across the ecosystem
[Diagram: "1. Patient Information" requires access to diverse, heterogeneous, distributed data: hospital events (admission, surgery, recovery, discharge); expression arrays (various tissues); personal genomics; X-rays, MRI, mammograms, etc.; clinical record; analysis lab notes]
But personalised medicine needs to solve the data problem:
Fragmentation of existing data resources and assets
  • Multiple organisations
  • Multiple repositories
  • Heterogeneous environment(s)
Cumbersome data access and poor integration
  • Multiple access methods
  • Different schemas, formats, etc.
Data security and protection
Complex management of de-centralised systems and resources
Inflexible and difficult to scale or change
High total costs
  • Under-utilised compute and storage resources
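The "different schemas, formats" problem can be illustrated with a minimal sketch: two hypothetical sources record the same patient with different field names, identifier case, date formats and weight units, and small per-source mapping functions bring both into one common schema. All source layouts and field names here are invented for illustration.

```python
from datetime import datetime

# Two hypothetical sources describing the same patient with different schemas.
SOURCE_A = [{"PatientID": "P001", "DOB": "1961-03-14", "Weight_kg": 82.0}]
SOURCE_B = [{"patient": "p001", "birth_date": "14/03/1961", "weight_lb": 180.8}]


def from_a(rec: dict) -> dict:
    return {"patient_id": rec["PatientID"].upper(),
            "birth_date": datetime.strptime(rec["DOB"], "%Y-%m-%d").date(),
            "weight_kg": rec["Weight_kg"]}


def from_b(rec: dict) -> dict:
    return {"patient_id": rec["patient"].upper(),               # normalise identifier case
            "birth_date": datetime.strptime(rec["birth_date"], "%d/%m/%Y").date(),
            "weight_kg": round(rec["weight_lb"] * 0.45359237, 1)}  # pounds to kilograms


def unified_view() -> dict:
    """Map every record into the common schema, then group records by patient."""
    view: dict = {}
    for rec in [from_a(r) for r in SOURCE_A] + [from_b(r) for r in SOURCE_B]:
        view.setdefault(rec["patient_id"], []).append(rec)
    return view


print(unified_view())
```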
Agenda: Commercial solutions
Medical Imaging Grid
  • 8 PACS systems
  • 200 workstations and 1,000+ users
  • 10 remote hospitals across the province
  • 24 terabytes of storage
  • Interfaces with legacy imaging and HIS systems and workstations (DICOM)
Universal Access Layer – Getting to the Patient Case
The Access Layer authenticates & authorises access to the Grid Archive
Key Services & Features
1. Multi-location: works seamlessly over LAN or WAN
2. Open: supports multiple PACS and applications
3. Full user abstraction: integration via multiple APIs (NFS, CIFS, DICOM, HTTP, custom API)
4. Secure and HIPAA-compliant archival and transmission
5. Fast access via caching and P2P techniques
Universal access to distributed heterogeneous images as if they were local
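A minimal sketch of the access-layer pattern described above, assuming a simple per-user access list and an in-memory cache. The class, the ACL layout and the `fetch_remote` callable (standing in for a DICOM, NFS or HTTP retrieval) are hypothetical, not the product's API. Authorisation is checked on every request, and only the first request for a study goes to the remote archive.

```python
from typing import Callable, Dict, Set


class AccessLayer:
    """Hypothetical access layer: authorise every request, then serve from a local cache."""

    def __init__(self, fetch_remote: Callable[[str], bytes], acl: Dict[str, Set[str]]):
        self._fetch_remote = fetch_remote  # stands in for a DICOM/NFS/HTTP retrieval
        self._acl = acl                    # user -> study IDs that user may read
        self._cache: Dict[str, bytes] = {}
        self.remote_fetches = 0            # counts trips to the remote archive

    def get(self, user: str, study_id: str) -> bytes:
        if study_id not in self._acl.get(user, set()):
            raise PermissionError(f"{user} is not authorised to read {study_id}")
        if study_id not in self._cache:    # cache miss: go to the remote archive once
            self._cache[study_id] = self._fetch_remote(study_id)
            self.remote_fetches += 1
        return self._cache[study_id]


layer = AccessLayer(lambda sid: f"pixels of {sid}".encode(), {"dr_jones": {"CT-42"}})
layer.get("dr_jones", "CT-42")
layer.get("dr_jones", "CT-42")
print(layer.remote_fetches)
```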
Grid Management Layer – Accessing the Patient Case
The Management Layer controls data storage & retrieval based on business rules and/or service level agreements.
Key Services & Features
1. Automated image registration, copying, and indexing of data and metadata
2. Images can be replicated locally or copied throughout the Grid
3. Centralised administration with proactive monitoring and alerts
Intelligent placement of data based on business rules with central management
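Rule-driven placement can be sketched as a list of (condition, target sites) pairs evaluated against each incoming image. The rules, the 30-day threshold and the site names below are hypothetical examples of "business rules", not the product's configuration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Image:
    study_id: str
    modality: str
    home_site: str
    age_days: int


# Each hypothetical rule: (condition, function returning the target sites).
Rule = Tuple[Callable[[Image], bool], Callable[[Image], List[str]]]

RULES: List[Rule] = [
    # Recent studies stay online at the acquiring hospital.
    (lambda im: im.age_days <= 30, lambda im: [im.home_site]),
    # Large cross-sectional studies also get a copy at the regional archive.
    (lambda im: im.modality in {"CT", "MR"}, lambda im: ["regional-archive"]),
]


def placement(image: Image, rules: List[Rule] = RULES) -> List[str]:
    """Collect target sites from every matching rule, without duplicates."""
    sites: List[str] = []
    for condition, targets in rules:
        if condition(image):
            for site in targets(image):
                if site not in sites:
                    sites.append(site)
    return sites or ["regional-archive"]  # default when no rule matches


print(placement(Image("S1", "CT", "hospital-a", 5)))
```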
Archive Management Layer – Saving the Patient Case
The Archive Management Layer Enables Safe, Cost-Effective, Always-Available Storage
Key Services & Features
1. Supports heterogeneous hardware and online, nearline and offline media
2. Intelligent HSM/ILM, based on file content
3. Automatic fail-over in case of failure or maintenance
4. Proactive monitoring and automated self-healing via digital signatures
5. Hardware obsolescence protection
6. Petabyte scalability
Stores images intelligently to optimise performance and cost, and to ensure data redundancy
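Self-healing can be sketched as a periodic scrub that recomputes a digest for every stored copy and repairs a bad copy from a good one. This sketch uses plain SHA-256 content digests as a simpler stand-in for the digital signatures mentioned on the slide, and keeps two in-memory copies per object; both simplifications are assumptions.

```python
import hashlib
from typing import Dict, List


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Archive:
    """Hypothetical archive: two copies per object, verified against a stored digest."""

    def __init__(self) -> None:
        self.primary: Dict[str, bytes] = {}
        self.replica: Dict[str, bytes] = {}
        self.digests: Dict[str, str] = {}

    def store(self, key: str, data: bytes) -> None:
        self.primary[key] = data
        self.replica[key] = data
        self.digests[key] = sha256(data)  # recorded once, at write time

    def scrub(self) -> List[str]:
        """Verify every copy and repair a corrupted one from its good twin."""
        repaired = []
        for key, expected in self.digests.items():
            primary_ok = sha256(self.primary[key]) == expected
            replica_ok = sha256(self.replica[key]) == expected
            if primary_ok and not replica_ok:
                self.replica[key] = self.primary[key]
                repaired.append(key)
            elif replica_ok and not primary_ok:
                self.primary[key] = self.replica[key]
                repaired.append(key)
            elif not (primary_ok or replica_ok):
                raise IOError(f"both copies of {key} are corrupt")
        return repaired


archive = Archive()
archive.store("img-1", b"pixels")
archive.primary["img-1"] = b"pixelz"  # simulate silent corruption on one medium
print(archive.scrub())
```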
Aventis Pharmaceuticals
Information/Collaboration Grid
Challenge
  • Distributed, diverse data sources across multiple continents
  • Data was heterogeneous and cross-platform, consisting of files and databases
  • Limited ability to consolidate, construct and analyse data sets
Solution
  • Linux
  • IBM
  • IBM WebSphere Information Integrator
Key Business Benefits
  • Using IBM WebSphere Information Integrator to bring together disparate life sciences data sources in one coherent view
  • Significant increase in researcher productivity due to improved collaboration and data access
  • Better data quality and currency
[Graphic: Information Integrator: integrating diverse life sciences information across and beyond the enterprise]
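The "one coherent view" idea can be sketched with Python's built-in `sqlite3` and `csv` modules: a flat file from one hypothetical site and a relational table from another are queried together with a single SQL join. Note the simplification: this sketch copies the file into a local table, whereas a federation server such as WebSphere Information Integrator leaves data in place and reaches each source through a wrapper. Compound IDs, targets and values are invented.

```python
import csv
import io
import sqlite3

# Hypothetical flat file from one site (in practice, a file on a shared filesystem).
ASSAY_CSV = """compound_id,ic50_nM
CMPD-1,12.5
CMPD-2,340.0
"""


def federated_view() -> list:
    db = sqlite3.connect(":memory:")
    # Hypothetical relational source from another site.
    db.execute("CREATE TABLE compounds (compound_id TEXT PRIMARY KEY, target TEXT)")
    db.executemany("INSERT INTO compounds VALUES (?, ?)",
                   [("CMPD-1", "kinase A"), ("CMPD-2", "protease B")])
    # Expose the file as a second table so both sources can be queried together.
    db.execute("CREATE TABLE assays (compound_id TEXT, ic50_nM REAL)")
    rows = [(r["compound_id"], float(r["ic50_nM"]))
            for r in csv.DictReader(io.StringIO(ASSAY_CSV))]
    db.executemany("INSERT INTO assays VALUES (?, ?)", rows)
    result = db.execute(
        "SELECT c.compound_id, c.target, a.ic50_nM "
        "FROM compounds c JOIN assays a ON c.compound_id = a.compound_id "
        "ORDER BY c.compound_id").fetchall()
    db.close()
    return result


print(federated_view())
```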
“Major Pharmaceutical Company”
Non-R&D Server Compute Grid
Challenge: Increase the reach and power of marketing analytics
Business Issues
  • Product information (prescription, sales, segmentation) not being effectively collected
  • External research upload processes currently take three-plus weeks, so information is dated by the time it arrives
Business Benefits
  • Process time reduced to hours and days, rather than the three weeks it took without the new solution
  • Provided marketing and product planning with timely, useful information to improve business decisions
Chosen Grid Infrastructure
  • United Devices Grid MetaProcessor, which links PCs and servers in a virtual grid environment
  • IBM is providing project management and technical support for the engagement.
Technology Benefits:
  • More flexible and resilient environment
  • Open standards allow easy integration with other data sources and addition of new functions
Grid Computing Business Benefits:
  • Higher projected sales due to more effective targeting of prospective customers
  • Lower cost due to server consolidation, lower licence fees and less maintenance
Agenda: Problems
The data (virtualisation) problem again:
Fragmentation of existing data resources and assets
  • Multiple organisations
  • Multiple repositories
  • Heterogeneous environment(s)
Cumbersome data access and poor integration
  • Multiple access methods
  • Different schemas, formats, etc.
Data security and protection
Complex management of de-centralised systems and resources
Inflexible and difficult to scale or change
High total costs
  • Under-utilised compute and storage resources
Requirements for data virtualisation solutions
Data virtualisation solutions should materialise data, in any format, at any location, from one or several sources, with appropriate security, performance, consistency, and coherency:
  • Transparent distributed access to data
    - (Automatic) caching / replication
  • Global naming of distributed data
    - Consistent method for referring to data sets across the grid
  • Data movement
    - Reliable and high performance
  • Data transformation and federation
    - Transform data from one format or schema to another
    - Combine data sources as required
  • Quality of data
    - Manage consistency, coherency, completeness and provenance of data
On Demand Data Placement
What is this project trying to do?
  • Develop an advisor for on demand data placement over multiple distributed data sources. The advisor monitors application execution patterns globally to derive a data placement strategy that yields the desired throughput and response time for a query workload.
Why is it important?
  • Placing data autonomically for a query workload, in order to meet throughput and response time goals in a distributed environment, is key to achieving policy-driven data access. Today, data placement is typically done manually.
What are the concrete deliverables?
  • A data placement advisor prototype supporting a variety of data source configurations and minimising global query response time
How will IBM (Grid and on demand) customers benefit?
  • Autonomic improvement of data access application performance, by recommending the data placement that yields the desired response time and throughput for query workloads
On Demand Data Placement
Queries:
  Q1: SELECT * FROM T1, T2 WHERE T1.a1 = T2.a2
  Q2: SELECT * FROM T2, T3 WHERE T2.a2 = T3.a3
[Diagram components: Information Integrator with a meta-wrapper and one wrapper per data source; Sources 1 to 4 holding tables T1 to T5, with "?" marking candidate replica locations; a log of query statements, response times and execution plans; the Data Placement Advisor with its metadata repository; and a Replication Manager]
Data placement criterion: (Benefit - Overhead) / Table Size
Options:
  1. Replicate T2 to Source 1
  2. Replicate T1 to Source 2
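The placement criterion can be sketched as a scoring function over candidate replication options. How the advisor estimates benefit and overhead from the query log is not described here, so the figures below are invented: in this example, replicating T2 helps both Q1 and Q2 and has the larger absolute benefit, but dividing by table size ranks the smaller T1 replica first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    description: str
    benefit_s: float      # estimated response time saved across logged queries (seconds)
    overhead_s: float     # estimated cost of creating and maintaining the replica (seconds)
    table_size_mb: float

    @property
    def score(self) -> float:
        # Placement criterion from the slide: (Benefit - Overhead) / Table Size
        return (self.benefit_s - self.overhead_s) / self.table_size_mb


def recommend(options: List[Option]) -> List[Option]:
    """Rank options by score, dropping any whose overhead outweighs the benefit."""
    return sorted((o for o in options if o.score > 0), key=lambda o: o.score, reverse=True)


# Hypothetical estimates for the two options on the slide.
OPTIONS = [
    Option("replicate T2 to Source 1", benefit_s=900.0, overhead_s=100.0, table_size_mb=400.0),
    Option("replicate T1 to Source 2", benefit_s=500.0, overhead_s=50.0, table_size_mb=150.0),
]

for option in recommend(OPTIONS):
    print(f"{option.description}: score {option.score:.1f}")
```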
Resource Namespace Service
What is this project trying to do?
  • Provide a federated and uniform view of virtualised grid resources, presented in a human-oriented hierarchy
Why is it important?
  • There is a need to access resources within a distributed network or grid by way of a universal name that is convenient for human interface applications
What are the concrete deliverables?
  • A Web Service specification document for namespace services within the Grid
  • An IETF specification draft for NFSv4
How will IBM (Grid and on demand) customers benefit?
  • RNS is a namespace service that enables:
    - Access to information and services via a variety of interfaces
    - Federation of an extensible diversity of data sources and services
    - Location transparency
    - Global naming and scalability
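A namespace service of this kind can be sketched as a tree of human-readable names whose leaves point at the actual resource locations. The hierarchy, names and `example.org` endpoints below are hypothetical; the point is location transparency: if a study moves, only the leaf changes, and clients keep using the same path.

```python
from typing import Dict, Union

# Hypothetical human-oriented hierarchy; leaves reference where the data actually lives.
Namespace = Dict[str, Union["Namespace", str]]

ROOT: Namespace = {
    "hospitals": {
        "hospital-a": {"imaging": {"ct-0042": "dicom://pacs1.example.org/studies/0042"}},
    },
    "genomics": {"arrays": {"run-7": "file://storage3.example.org/arrays/run7.cel"}},
}


def resolve(path: str, root: Namespace = ROOT) -> str:
    """Walk a slash-separated name down the hierarchy and return the endpoint it names."""
    node: Union[Namespace, str] = root
    for part in [p for p in path.split("/") if p]:
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"no entry {part!r} in {path!r}")
        node = node[part]
    if isinstance(node, dict):
        raise KeyError(f"{path!r} is a directory, not a resource")
    return node


print(resolve("/hospitals/hospital-a/imaging/ct-0042"))
```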
Applying Provenance to Data
  • Data can be created and changed during its lifetime
  • Events can trigger the changes:
    - Doctors update patient records
    - Pharmacists update dispensary records
  • We can record documentation about the processes that created data, and then ask questions
  • A Patient makes an appointment that results in an entry in a calendar
  • A Pharmacist receives a Prescription and dispenses a drug
  • Actors manage their own state
Example questions:
  • Did the Doctor follow the correct process when he prescribed Viagra for Joe Bloggs?
  • Have the Government guidelines on waiting lists for GP appointments been breached by this practice?
Security and Privacy
  • The Doctor cannot see the Pharmacist's processes and vice versa
  • The Patient can see parts of both
[Diagram: actors Patient, Doctor and Pharmacist, linked by Appointment, Prescription and Dispense interactions, with a Patient Record and a Dispensary Record]
Recording Provenance
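[Diagram: a Service Requester and a Service Provider exchange a Service Request and a Service Response, and both write to a Provenance Store. Sequence shown: Service Request; Record Interaction; Actor State; Service Response; Record Interaction; Submission Finished]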
An Actor can be a Service Requester, a Service Provider or any other application that uses the Provenance Service.
Actors assert information about provenance; each such assertion is called a p-assertion:
  • Interaction p-assertion = Service Request and Response
  • Actor state p-assertion = any state information provided by the Service Provider
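The p-assertion model above can be sketched as a simple append-only store. The classes, field names and the `prescription_checked` state key are hypothetical illustrations, not the Provenance Service's actual interface: both actors record the same interaction, the provider also records its own state, and a question such as "was the prescription checked?" becomes a query over the assertions for one interaction.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PAssertion:
    asserter: str           # the actor making the assertion
    kind: str               # "interaction" or "actor_state"
    interaction_id: str     # ties request, response and state to one exchange
    content: Dict[str, Any]


@dataclass
class ProvenanceStore:
    assertions: List[PAssertion] = field(default_factory=list)

    def record_interaction(self, asserter: str, interaction_id: str,
                           request: Dict[str, Any], response: Dict[str, Any]) -> None:
        # Interaction p-assertion: the service request and its response.
        self.assertions.append(PAssertion(asserter, "interaction", interaction_id,
                                          {"request": request, "response": response}))

    def record_actor_state(self, asserter: str, interaction_id: str,
                           state: Dict[str, Any]) -> None:
        # Actor state p-assertion: state the service provider chooses to expose.
        self.assertions.append(PAssertion(asserter, "actor_state", interaction_id, state))

    def trace(self, interaction_id: str) -> List[PAssertion]:
        """All p-assertions about one interaction, in recording order."""
        return [a for a in self.assertions if a.interaction_id == interaction_id]


def prescription_was_checked(store: ProvenanceStore, interaction_id: str) -> bool:
    """Hypothetical audit question answered purely from recorded p-assertions."""
    return any(a.kind == "actor_state" and a.content.get("prescription_checked") is True
               for a in store.trace(interaction_id))


# Example: a Doctor (requester) sends a prescription to a Pharmacist (provider).
store = ProvenanceStore()
request, response = {"prescription": "rx-1"}, {"status": "dispensed"}
store.record_interaction("doctor", "rx-1", request, response)
store.record_interaction("pharmacist", "rx-1", request, response)
store.record_actor_state("pharmacist", "rx-1", {"prescription_checked": True})
print(prescription_was_checked(store, "rx-1"))
```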
Agenda: UK eScience involvement
So, why is IBM involved in eScience research?
IBM believes that a consortium approach within Life Sciences and Healthcare, involving both partners and academia, is a healthy way to research the application of Grid and ICT to their challenges.
  • Collaborations lead to innovation
    - R&D and investments in grid and related technologies
    - Improved commercial solutions
  • Identifies areas for investment
    - Ensures continued "best of breed" offerings and solutions
    - Refines product development roadmaps
  • Relationship building
    - Industry-leading academics and partners
    - Builds an ecosystem
  • Breeding ground for top talent
    - Ours, and the wider community's
Unlocking IBM’s Intellectual Capital
Patents are essential to commercial entities:
  • They form a vital revenue stream
  • They protect the investment in corporate research
However:
  • Technological advances depend on shared knowledge, standards and collaboration
  • IBM believes open software standards lead to greater efficiency and innovation
IBM would like to take a balanced approach to IP management, enabling both proprietary and open models while continuing to protect truly new and useful inventions. In this way IBM can invest in specific industries, helping them to grow and innovate.
On the 24th October, IBM announced that it will pledge royalty-free access to its patent portfolio for the development and implementation of selected open software standards for the healthcare arena, built around web services, electronic forms and open document formats.
http://www.ibm.com/research/innovation/ip
eHealthcare@Home
The project will integrate invasive and non-invasive patient monitoring systems with the gathering and analysis of this information via a Grid-based infrastructure. Individualised data will be collected from multi-enterprise specialist equipment and databases, and combined with monitoring data from remote patients, employing a new class of dedicated home healthcare server.
[Diagram components: in-home patient monitoring (sensors and home servers); imaging and diagnostics (MRI, patch clamp, metabolic and retinopathy analysis, PAC systems); other in-region hospitals; Regional Health Authority with RHA data stores; transfer handler and transfer engines; trend and analysis engines]
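A trend and analysis engine for home monitoring data could, in its simplest form, compare each new reading with a rolling baseline. The sketch below flags readings that differ from the mean of the previous readings by more than a fixed fraction; the window size, the 20% tolerance and the heart-rate values are arbitrary assumptions, and a clinical system would need far more careful analytics.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def trend_alerts(readings: Iterable[float], window: int = 5,
                 tolerance: float = 0.2) -> List[Tuple[int, float, float]]:
    """Return (index, value, baseline) for readings that deviate from the mean of the
    previous `window` readings by more than `tolerance` (a fraction of that mean)."""
    recent: Deque[float] = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(recent) == window:  # only judge once a full baseline exists
            baseline = sum(recent) / window
            if abs(value - baseline) > tolerance * baseline:
                alerts.append((i, value, baseline))
        recent.append(value)
    return alerts


# Hypothetical heart-rate samples (beats per minute) from a home sensor.
print(trend_alerts([72, 74, 71, 73, 75, 74, 110, 73]))
```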
World Community Grid
A global, philanthropic compute grid established by IBM to leverage grid computing resources to help expedite calculations that would normally require years, producing results in mere months. Currently, 156,000 systems have contributed over 19,000 processor years.
An advisory board with members from leading foundations, universities and public organisations provides oversight of the research projects. Projects in the following disciplines will be considered for inclusion:
  • Medical Research – Genomics, proteomics, epidemiology, and biological system research such as AIDS and HIV studies
  • Environmental Research – Ecology, climatology, pollution, and preservation
  • Basic Research – Human health and welfare related studies
To contribute resources, or to suggest research, please visit http://www.worldcommunitygrid.org/
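The slide's figures can be put in context with simple arithmetic. Across 156,000 systems, 19,000 processor years is about 44 days of computation per system. The second function estimates how a job of many CPU-years shrinks to months when spread over volunteer machines; it assumes perfectly independent work units and ignores redundancy and scheduling overhead, so it is a best case, and the project size, machine count and 25% availability are hypothetical.

```python
def avg_days_per_system(processor_years: float, systems: int) -> float:
    """Average computation contributed per system, in days."""
    return processor_years / systems * 365.25


def wall_clock_months(cpu_years: float, machines: int, availability: float) -> float:
    """Best-case months to finish `cpu_years` of independent work on `machines`,
    each computing for a fraction `availability` of the time."""
    return cpu_years / (machines * availability) * 12


print(f"{avg_days_per_system(19_000, 156_000):.0f} days per system")
# Hypothetical project: 1,000 CPU-years on 10,000 PCs donating 25% of their time.
print(f"{wall_clock_months(1_000, 10_000, 0.25):.1f} months")
```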
Any Questions?
Ask now, find me later today for a chat, or email me later on
appleby@uk.ibm.com
Thank you!
Backup Foils