A Data Intensive High Performance Simulation & Visualization Framework for Disease Surveillance

Arif Ghafoor, David Ebert, Madiha Sahar, Ross Maciejewski, Shehzad Afzal, Farrukh Arslan

Acknowledgement: Project partially funded by Cyber Center

Objective and Goals
• Objective: To address infectious disease surveillance challenges and develop a collaborative capability for all stakeholders to monitor and manage outbreaks of infectious diseases in large cities.
• Approach: Develop a high performance computing (HPC) framework employing robust and novel infectious disease epidemiology models with real-time inference and pre/exercise planning capabilities.

Objective and Goals (continued)
• Real-time data analysis capabilities, providing a model for infrastructure development where lessons learned can be used to develop best practice models
• A comparative assessment of disease modeling techniques, focusing on the tradeoff between the level of granularity used in creating the model and the model efficacy
• Novel visual analytics paradigms integrating decision support and resource allocation tools with live streaming data and disease simulation scenarios

Conceptual View of Proposed Infectious Disease Surveillance Framework

Tasks
• Task A: Data Intensive Multi-Resolution Simulation Modeling
• Task B: High Performance Simulation Modeling on HADOOP

Task A: Initial Research Results
• Challenge: The notion of context is important for syndromic surveillance.
• For a syndromic data set we need contextual attributes and behavioral attributes.
• We have proposed an HPC data mining framework for contextual and behavioral attributes using Syndrome Ontology (assumption: domain knowledge is available).
• Currently pursuing system implementation: WEKA, Machine Learning & Data Mining in Java (http://www.cs.waikato.ac.nz/ml/weka/index.html)

Task A: Data Intensive Multi-Resolution Simulation Modeling (initial results)
• Proposed HPC framework for mining of contextual (e.g.,
spatio-temporal) and behavioral attributes using Syndrome Ontology.
• Domain knowledge is available through a domain ontology.

Ontological Syndromic and Climate Classifiers
• Exploration towards decision trees spanning distributed multi-domains, representing semantic knowledge at the temporal, spatial and socio-economic levels.

Patient ID   Date      Age   Gender   Location      Chief Complaints
9398         1/10/11   20    Female   Kot Begum     Flu
10816        1/14/11   24    Male     Faisal Park   Chills
1491         1/27/11   28    Male     Bhamman       Bodyaches
16237        2/1/11    20    Female   Chah Miran    Anxiety

CoCo Classifier

Epidemic Spread Visualization

Developing Novel Statistical Heterogeneous Agent Based SIR Model
• Adding age-based and gender-based classification
• Demographic impacts on spread rate (socioeconomic classification)
• Capturing seasonal trends of disease spread
• Effect of decision making considering preventive measures (inoculation of population, resource allocation of healthcare)

Components of Proposed HPC HADOOP Platform
• Visual Analytics Environment for Web-Based Real-time HPC Distributed Services
• Real-time networked data streams
• Data Filtering, Anonymization and Ingestion
• Decision Support Sub-System
• Real-Time On-Line Data Mining for Syndromic Surveillance
• Infectious Disease (ID) Spread Simulation
• Statistical Data Analytics for Health Forecast
• Service Development Platform
• Cloud Platform as a Service + Support Services (Storage, DB, Security, Aggregation)
• Multi-Tenant, Deployment & Cloud Cluster Management
• Virtualization
• Cloud Infrastructure Management
• Hadoop HPC Hardware
• Hadoop HPC Infrastructure
• ID Databank
Component legend: Open-source; Partially developed; To be developed

Task B: High Performance Simulation Modeling on HADOOP (in progress)
• Objective: Development of agent-based and multi-granularity homogeneous mixing models for HPC-based simulation.
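
As a point of reference for the homogeneous mixing model named in the Task B objective, the following is a minimal sketch and not the project's implementation. It assumes a standard discrete-time SIR formulation in which every individual is equally likely to contact every other. The function name `simulate_homogeneous_sir`, the `contacts_per_day` parameter, and all numeric values are hypothetical and chosen only for illustration.

```python
def simulate_homogeneous_sir(n, i0, beta, contacts_per_day, d, days):
    """Deterministic discrete-time SIR under homogeneous mixing.

    n: total population, i0: initially infected individuals,
    beta: probability of transmission per contact (assumed value),
    contacts_per_day: average contacts per person per day (assumed value),
    d: duration of infection in days (recovery rate is 1/d, requires d >= 1),
    days: number of daily steps to simulate.
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    recovery_rate = 1.0 / d
    history = [(s, i, r)]
    for _ in range(days):
        # Under homogeneous mixing a fraction i/n of all contacts is infectious.
        new_infections = min(s, beta * contacts_per_day * s * i / n)
        new_recoveries = recovery_rate * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from any data set.
    run = simulate_homogeneous_sir(n=10000, i0=10, beta=0.05,
                                   contacts_per_day=10, d=5, days=100)
    peak_day = max(range(len(run)), key=lambda day: run[day][1])
    print("peak infected on day", peak_day)
```

A coarse model like this could serve as the low-granularity baseline when comparing against an agent-based model, in line with the stated goal of assessing the tradeoff between model granularity and efficacy.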
TASK B: High Performance Simulation Modeling on HADOOP
• Development of Agent-Based SIR Model for Heterogeneous Networks
• Simulation-Based Disease Spread Behavior
• Analysis of Decision Making for Preventive Measures

SIR IN HETEROGENEOUS NETWORKS
• Each node can be in one of three states: Susceptible, Infected, or Recovered (S, I, R)
• Once infected, a node can transmit the infection to neighboring susceptible nodes with probability β
• Infected nodes stay infected for a duration d
• The recovery rate of infected nodes, υ, is 1/d
• Susceptibility of an individual may vary depending on the number of infected neighbors
Within a group interaction:
• β: probability of getting the disease during a contact
• d: duration of infection
• υ: recovery rate (1/d)
• N: total population
Figure: State diagram of SIR

SOCIAL NETWORK MODELING FOR PREDICTION & MANAGEMENT OF EPIDEMICS
• Development of an agent-based social networking model to simulate infectious disease spread
• Population is divided into groups depending on age, gender, occupation, and location, a phenomenon known as assortative mixing
• The distribution of contacts plays a key role in determining the onset of the expansion phase of an epidemic
Population Classification Attributes

HETEROGENEOUS GRAPH MODEL FOR MULTI-GROUP POPULATION INTERACTION

CURRENT STATUS
• Development of heterogeneous models and evaluation of their fidelity
• Simulation in NETLOGO

Simulation Objectives
• Effect of demographic properties
• Effect of weather on epidemic disease spread and seasonal trends
• Effect of pharmaceutical and other decision measures on epidemic spread

Summary and Status
• Proposed an HPC-based data mining framework for contextual and behavioral attributes using Syndrome Ontology (assumption: domain knowledge is available).
• Currently pursuing system implementation: WEKA, Machine Learning & Data Mining in Java.
• Development of agent-based SIR heterogeneous population model for HPC-based simulation for large cities (in progress).
• Proposal (in preparation): Gates Foundation Grand Challenges Explorations for Global Health
• Potential collaboration with MSR
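
The SIR rules for heterogeneous networks described under Task B (per-contact transmission probability β, fixed infection duration d, group-based mixing) can be illustrated with a small agent-based sketch. This is not the NetLogo model mentioned above; it is a toy version under stated assumptions. The function names `build_assortative_network` and `simulate_network_sir`, the edge probabilities `p_in` and `p_out`, and the integer group labels (standing in for age, gender, occupation, or location classes) are all hypothetical.

```python
import random

S, I, R = "S", "I", "R"


def build_assortative_network(group_of, p_in, p_out, rng):
    """Random contact network in which same-group pairs are linked with
    probability p_in and cross-group pairs with probability p_out (assumed
    values). group_of[k] is the group label of node k.
    Returns a list of neighbor sets, one per node."""
    n = len(group_of)
    neighbors = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if group_of[u] == group_of[v] else p_out
            if rng.random() < p:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors


def simulate_network_sir(neighbors, seeds, beta, d, steps, rng):
    """Synchronous agent-based SIR on a contact network.

    Each infected node infects each susceptible neighbor with probability
    beta per step, so a node with k infected neighbors is infected with
    probability 1 - (1 - beta)**k. Infected nodes recover after d steps.
    Returns a list of (S, I, R) counts, one per step including step 0."""
    n = len(neighbors)
    state = [S] * n
    steps_infected = [0] * n
    for node in seeds:
        state[node] = I
    counts = [(state.count(S), state.count(I), state.count(R))]
    for _ in range(steps):
        # Transmission is evaluated against the state at the start of the step.
        newly_infected = set()
        for u in range(n):
            if state[u] != I:
                continue
            for v in neighbors[u]:
                if state[v] == S and rng.random() < beta:
                    newly_infected.add(v)
        # Nodes that were already infected age by one step and may recover.
        for u in range(n):
            if state[u] == I:
                steps_infected[u] += 1
                if steps_infected[u] >= d:
                    state[u] = R
        for v in newly_infected:
            state[v] = I
        counts.append((state.count(S), state.count(I), state.count(R)))
    return counts


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs match
    groups = [k % 4 for k in range(200)]  # four illustrative population groups
    network = build_assortative_network(groups, p_in=0.10, p_out=0.005, rng=rng)
    result = simulate_network_sir(network, seeds=[0], beta=0.1, d=5, steps=60, rng=rng)
    print("final (S, I, R):", result[-1])
```

Varying `p_in` relative to `p_out` changes how strongly contacts concentrate within groups, which is one simple way to explore how the distribution of contacts affects the onset of the expansion phase.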