A Data Intensive High Performance Simulation & Visualization Framework for Disease Surveillance

Arif Ghafoor, David Ebert, Madiha Sahar
Ross Maciejewski, Shehzad Afzal, Farrukh Arslan
Acknowledgement: Project Partially Funded by Cyber Center
Objective and Goals


Objective: To address infectious disease surveillance
challenges and develop a collaborative capability for all
stakeholders to monitor and manage outbreaks of
infectious diseases in large cities
Approach: Develop a high performance computing (HPC)
framework employing robust and novel infectious disease
epidemiology models with real-time inference and
pre/exercise planning capabilities.
Objective and Goals



Real-time data analysis capabilities, providing a model
for infrastructure development where lessons learned
can be used to develop best practice models
A comparative assessment of disease modeling
techniques by focusing on the tradeoff between the level
of granularity used in creating the model and the model
efficacy
Novel visual analytics paradigms integrating decision
support and resource allocation tools with live streaming
data and disease simulation scenarios
Conceptual view of Proposed Infectious
Disease Surveillance Framework
Tasks:


Task A: Data Intensive Multi-Resolution Simulation Modeling
Task B: High Performance Simulation
Modeling on HADOOP
Task A: Initial Research Results
Challenge: The notion of context is important for
syndromic surveillance. For a syndromic data set we
need:
• Contextual attributes
• Behavioral attributes
• We have proposed an HPC data mining framework
for contextual and behavioral attributes using
Syndrome Ontology (Assumption: Domain
Knowledge is available)
• Currently pursuing system implementation using WEKA: Machine Learning & Data Mining in Java.

(http://www.cs.waikato.ac.nz/ml/weka/index.html)
Task A: Data Intensive Multi-Resolution
Simulation Modeling (initial results)
• Proposed HPC framework for mining contextual (e.g., spatio-temporal) and
behavioral attributes using Syndrome Ontology.
• Domain knowledge is available through domain ontology
Ontological Syndromic and Climate Classifiers
Exploration towards decision trees spanning over distributed multi-domains,
representing semantic knowledge at temporal, spatial and socio-economic level.
Patient ID   Date      Age   Gender   Location      Chief Complaints
9398         1/10/11   20    Female   Kot Begum     Flu
10816        1/14/11   24    Male     Faisal Park   Chills
1491         1/27/11   28    Male     Bhamman       Bodyaches
16237        2/1/11    20    Female   Chah Miran    Anxiety
CoCo Classifier
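
The sample records above can be grouped into syndrome counts before any mining step. The following is a minimal sketch, not the CoCo classifier itself: the keyword-to-syndrome lookup (`SYNDROME_KEYWORDS`) and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime

# Records copied from the sample table (Patient ID, Date, Age, Gender,
# Location, Chief Complaint). Dates are read as month/day/year.
RECORDS = [
    (9398, "1/10/11", 20, "Female", "Kot Begum", "Flu"),
    (10816, "1/14/11", 24, "Male", "Faisal Park", "Chills"),
    (1491, "1/27/11", 28, "Male", "Bhamman", "Bodyaches"),
    (16237, "2/1/11", 20, "Female", "Chah Miran", "Anxiety"),
]

# Hypothetical lookup table, for illustration only. A real system would
# use a trained classifier and a syndrome ontology instead.
SYNDROME_KEYWORDS = {
    "flu": "influenza-like illness",
    "chills": "influenza-like illness",
    "bodyaches": "influenza-like illness",
}


def classify(complaint):
    """Map a free-text chief complaint to a syndrome label."""
    return SYNDROME_KEYWORDS.get(complaint.strip().lower(), "other")


def monthly_counts(records):
    """Count records per (year-month, syndrome) pair."""
    counts = Counter()
    for _pid, date, _age, _gender, _location, complaint in records:
        month = datetime.strptime(date, "%m/%d/%y").strftime("%Y-%m")
        counts[(month, classify(complaint))] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(monthly_counts(RECORDS).items()):
        print(key, n)
```

The same grouping could be keyed on location or age band to obtain the contextual attributes mentioned under Task A.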
Epidemic Spread Visualization
Developing Novel Statistical Heterogeneous
Agent Based SIR Model
• Adding age-based and gender-based classification
• Demographic impacts on spread rate (socioeconomic classification)
• Capturing seasonal trends of disease spread
• Effect of decision making considering preventive measures
(inoculation of population, resource allocation of healthcare)
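
To illustrate how seasonal trends and inoculation could enter an SIR model, here is a minimal sketch under strong assumptions: it uses a homogeneous-mixing compartmental model with one-day time steps, which is far simpler than the heterogeneous agent-based model proposed here. The sinusoidal seasonal term, the parameter names (`beta0`, `season_amp`, `vaccinated_frac`), and the choice to place the seasonal peak at day 0 are all assumptions for illustration.

```python
import math


def simulate_sir(n, i0, beta0, d, days, season_amp=0.0, vaccinated_frac=0.0):
    """Discrete-time SIR with optional seasonal forcing and vaccination.

    n               total population
    i0              initially infected individuals
    beta0           baseline daily transmission rate (assumed parameter)
    d               duration of infection in days (recovery rate is 1/d)
    season_amp      relative seasonal amplitude in [0, 1] (assumption)
    vaccinated_frac fraction of susceptibles immunized before day 0
    """
    nu = 1.0 / d
    s = (n - i0) * (1.0 - vaccinated_frac)
    i = float(i0)
    r = n - s - i  # vaccinated individuals start in the removed class
    history = []
    for t in range(days):
        # Assumed seasonal shape: transmission peaks at t = 0, period 365 days.
        beta_t = beta0 * (1.0 + season_amp * math.cos(2.0 * math.pi * t / 365.0))
        new_infections = min(s, beta_t * s * i / n)
        new_recoveries = nu * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    base = simulate_sir(10000, 10, 0.4, 5, 200)
    vax = simulate_sir(10000, 10, 0.4, 5, 200, vaccinated_frac=0.3)
    print("peak infected, no vaccination:", round(max(i for _, i, _ in base)))
    print("peak infected, 30% vaccinated:", round(max(i for _, i, _ in vax)))
```

Comparing runs with different `vaccinated_frac` values gives a rough view of how a preventive measure changes the epidemic peak; age, gender, and socioeconomic classes would need separate compartments or agents.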
Components of Proposed HPC HADOOP Platform
Visual Analytics Environment for Web-Based Real-time HPC Distributed Services
Real-time networked data streams
Data Filtering, Anonymization and Ingestion
Decision Support Sub-System
Real-Time On-Line Data Mining for Syndromic Surveillance
ID Spreadness Simulation
Statistical Data Analytics for Health Forecast
Service Development Platform
Cloud Platform as a Service + Support services (Storage, DB, Security, Aggregation)
Multi-Tenant, Deployment & Cloud Cluster Management
Virtualization Cloud Infrastructure Management
Hadoop
HPC Hardware
Component Legend
Hadoop HPC Infrastructure
ID Databank
Open-source
Partially developed
To be developed
Task B: High Performance Simulation
Modeling on HADOOP (in progress)

Objective: Development of agent-based and multi-granularity
homogeneous mixing models for HPC-based simulation.
TASK B: High Performance Simulation
Modeling on HADOOP

Development of Agent-Based SIR Model for
Heterogeneous Networks

Simulation Based Disease Spread Behavior

Analysis of Decision making for Preventive
Measures
SIR IN HETEROGENEOUS NETWORKS






Each node can have three states: Susceptible, Infected, and
Recovered (S, I, R)
Once infected, a node can transmit infection to neighboring
susceptible nodes with a probability β
Infected nodes stay infected for a duration d
Recovery rate of infected nodes υ is 1/d
Susceptibility of an individual may vary depending upon the number
of infected neighbors
Within a group interaction:
β: probability of getting the disease during a contact
d: duration of infection
υ: recovery rate (1/d)
N: total population
Figure: State diagram of SIR
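
The node-level rules above can be sketched as a small discrete-time simulation. This is an illustrative sketch, not the project's implementation: the function name, the adjacency-dictionary input, the one-step time unit, and the use of a fixed infection duration d (so that nodes recover at an average rate of 1/d) are assumptions. Because each infected neighbor makes an independent transmission attempt, a susceptible node with k infected neighbors is infected in one step with probability 1 - (1 - β)^k, which reflects the dependence on the number of infected neighbors.

```python
import random


def simulate_network_sir(neighbors, initially_infected, beta, d, steps, seed=0):
    """Agent-based SIR on a contact network.

    neighbors          dict mapping each node to a list of neighboring nodes
    initially_infected iterable of nodes that start in state I
    beta               per-contact, per-step transmission probability
    d                  number of steps a node stays infected
    Returns a list of (S, I, R) counts, one tuple per step.
    """
    rng = random.Random(seed)
    state = {node: "S" for node in neighbors}
    steps_left = {}
    for node in initially_infected:
        state[node] = "I"
        steps_left[node] = d

    history = []
    for _ in range(steps):
        # Transmission: each infected node tries to infect each susceptible neighbor.
        newly_infected = set()
        for node, st in state.items():
            if st != "I":
                continue
            for nb in neighbors[node]:
                if state[nb] == "S" and rng.random() < beta:
                    newly_infected.add(nb)

        # Recovery: nodes that have been infected for d steps move to R.
        for node in list(steps_left):
            steps_left[node] -= 1
            if steps_left[node] == 0:
                state[node] = "R"
                del steps_left[node]

        # Apply the new infections after all attempts have been evaluated.
        for node in newly_infected:
            state[node] = "I"
            steps_left[node] = d

        counts = {"S": 0, "I": 0, "R": 0}
        for st in state.values():
            counts[st] += 1
        history.append((counts["S"], counts["I"], counts["R"]))
    return history


if __name__ == "__main__":
    # Ring of 20 nodes, each connected to its two neighbors.
    ring = {k: [(k - 1) % 20, (k + 1) % 20] for k in range(20)}
    for s, i, r in simulate_network_sir(ring, [0], beta=0.5, d=2, steps=10):
        print(s, i, r)
```

Replacing the ring with a contact network built from population groups turns the same loop into a heterogeneous-population simulation.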
SOCIAL NETWORK MODELING FOR
PREDICTION & MANAGEMENT OF
EPIDEMICS



Development of an Agent Based social
networking model to simulate the infectious
disease spread
Population is divided into groups depending
upon age, gender, occupation, and location – a
phenomenon known as Assortative Mixing
Distribution of contacts plays a key role in
determining the onset of the expansion phase of an
epidemic
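
As a rough illustration of assortative mixing, the sketch below builds a contact network in which each person preferentially picks contacts from their own group. The function name, the `p_within` parameter (probability that a contact is drawn from the same group), and the fixed number of contact draws per person are assumptions made for this example; the slides do not specify how contacts are generated.

```python
import random


def build_assortative_contacts(groups, contacts_per_person, p_within, seed=0):
    """Build an undirected contact network with group-biased mixing.

    groups              dict mapping each person to a group label, for example
                        a tuple such as (age band, gender, occupation, location)
    contacts_per_person number of contact draws made by each person
    p_within            probability that a draw targets the person's own group
    Returns a dict mapping each person to a sorted list of contacts.
    """
    rng = random.Random(seed)
    people = sorted(groups)
    by_group = {}
    for person in people:
        by_group.setdefault(groups[person], []).append(person)

    neighbors = {person: set() for person in people}
    for person in people:
        same = [p for p in by_group[groups[person]] if p != person]
        other = [p for p in people if groups[p] != groups[person]]
        for _ in range(contacts_per_person):
            use_same = rng.random() < p_within
            pool = same if use_same else other
            if not pool:
                # Fall back to the other pool when the preferred one is empty.
                pool = other if use_same else same
            if not pool:
                continue
            partner = rng.choice(pool)
            neighbors[person].add(partner)
            neighbors[partner].add(person)
    return {person: sorted(contacts) for person, contacts in neighbors.items()}


if __name__ == "__main__":
    groups = {p: ("child" if p < 20 else "adult") for p in range(40)}
    net = build_assortative_contacts(groups, contacts_per_person=3, p_within=0.8)
    degrees = [len(contacts) for contacts in net.values()]
    print("mean number of contacts:", sum(degrees) / len(degrees))
```

Raising `p_within` toward 1 concentrates contacts inside groups, while lowering it toward 0 forces cross-group contacts, so the parameter gives a simple handle for studying how the distribution of contacts affects epidemic onset.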
Population Classification Attributes
HETEROGENEOUS GRAPH MODEL FOR
MULTI-GROUP POPULATION INTERACTION
CURRENT STATUS


Development of Heterogeneous Models &
evaluation of their fidelity. Simulation in
NETLOGO
Simulation Objectives



Effect of demographic properties
Effect of weather on epidemic disease spread
and seasonal trends
Effect of pharmaceutical and other decision
measures on epidemic spread
Summary and Status



Proposed an HPC-based data mining framework for
contextual and behavioral attributes using Syndrome
Ontology (Assumption: Domain Knowledge is available).
Currently pursuing system implementation using WEKA:
Machine Learning & Data Mining in Java.
Development of agent-based SIR heterogeneous
population model for HPC-based simulation for large
cities (in progress).
Proposal (in preparation):


Gates Foundation Grand Challenges Explorations for Global
Health
Potential collaboration with MSR