Towards Computational Epidemiology
Designing an Infectious Disease Outbreak Simulator

Armin R. Mikler
Department of Computer Science and Engineering
Department of Biological Sciences
University of North Texas

Towards Computational Epidemiology

• Address broader aspects of epidemiology
  – Disease tracking, analysis, and surveillance
  – High Performance Computing (HPC)
  – Simulation
  – Data visualization
• Design and implement computational tools
  – investigating tuberculosis outbreaks and risk assessment in spatially delineated environments
  – modeling and simulating details of specific instances of tuberculosis occurrences in North Texas
  – applicable to a wide variety of disease outbreaks in spatially well-defined settings
• Contribute towards establishing computational epidemiology as a new research domain!

Disease Outbreak Model

• Local
  – Delineated space
    • Factory, homeless shelter, school
  – Airflow
  – Heating and cooling
  – Distances in feet
  – Architectural properties
• Global
  – Demography
  – Socio-economics
  – Travel
  – Transportation
  – Geography
  – Culture

Global Stochastic Cellular Automata and the SWARM

• Top layer: Cellular Automata (Global)
• Middle layer: Cellular Automata (Regional)
• Bottom layer: SWARM (Local)

The Focus of Study: Locality Based

This study proposes to model the dynamics of tuberculosis transmission within two facilities in North Texas: an 800-bed homeless shelter providing both long-term and short-term occupancy, and a factory.

Data were previously collected through interviews during targeted surveillance screening of workers in the factory and of homeless people who use the shelter.

The data have been de-identified!

Homeless Shelter Data and Findings

For the homeless shelter, the data set comprises screening records for each case, including:

• Date tested (relative to t_0)
• Status of tuberculosis
• Location in the facility
• Length of time spent in the facility
• Other variables

Results of initial analysis suggest that TB risk is not uniformly distributed but depends on the location of the sleeping bed and on the duration and frequency of stays at the night shelter.

Factory Data and Findings

In addition to the basic screening records collected for the homeless shelter, the data available for the factory include measures of duration and proximity to an infected person, such as:

• Hours per week in the factory
• Hours per week in the same workspace
• Hours per week within 3 feet of an infected person
• Usual work area

Results of initial analysis indicate that proximity of the workspace to an infected person was a major determinant of infection. In fact, 100% of those who worked directly in the same space as one infected person were infected with the same strain of TB.

Factory Layout

[Figure labels: The Paint Area; Air vent system; The Restroom; The Eating Area]

Modeling Approaches

• Agent-based modeling
  – Level of exposure
  – Emergent behavior defined by individuals' actions
  – The average number of bacilli that are emitted (through coughing, sneezing, etc.)
  – Spatial interaction
• Stochastic cellular automata
  – Ambient temperature and airflow
  – Particle suspension and dispersion
  – Intrinsically stochastic

From GIS Data to Agent-Based Simulation to Visualization

[Diagram labels: GIS/Epidemiologic Data; Social Interactions; Particle suspension & Airflow; Visualization]

Movement and Desire

[Figure: agent at (x_i, y_i); S/D table over locations A, B, C, D, …]

Desire Functions

[Figure: f(x) plotted against time for two desires, Thirst and Smoking, each with its own threshold (Thirst Threshold, Smoking Threshold).]

Example of functions that model different types of desire as a function of time.
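
The desire functions described above can be illustrated with a small sketch. It assumes, purely for illustration, that each desire grows linearly with the time since it was last satisfied and that the agent acts on it (for example, by moving toward a water fountain or smoking area) once the level reaches its threshold. The class name `Desire`, the function `simulate`, and all rates and thresholds are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Desire:
    """One desire (e.g. thirst) that builds up until it is satisfied."""
    name: str
    rate: float        # hypothetical growth per time step
    threshold: float   # level at which the agent acts on the desire
    level: float = 0.0

    def step(self) -> bool:
        """Advance one time step; return True if the threshold is reached."""
        self.level += self.rate
        return self.level >= self.threshold

    def satisfy(self) -> None:
        """Reset the desire after the agent has acted on it."""
        self.level = 0.0


def simulate(desires, steps):
    """Return a list of (t, name) events at which a desire triggered."""
    events = []
    for t in range(1, steps + 1):
        for d in desires:
            if d.step():
                events.append((t, d.name))
                d.satisfy()  # assumed: acting on a desire resets it to zero
    return events


if __name__ == "__main__":
    # Illustrative values only; the slides do not give rates or thresholds.
    desires = [
        Desire("thirst", rate=0.25, threshold=5.0),
        Desire("smoking", rate=0.125, threshold=3.0),
    ]
    for t, name in simulate(desires, 96):
        print(f"t={t}: agent acts on {name}")
```

With these illustrative values, thirst triggers every 20 time steps and smoking every 24, which gives each desire a sawtooth profile over time. In an agent-based simulation, each trigger would select a destination for the agent's next movement.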
Particle Suspension and Dispersion

[Figure: settling of bacilli over time]

As a function of time, bacilli settle toward the ground and may spread to neighboring cells.

State of each cell C_{i,j} depends on C_{i,j+1}, C_{i,j-1}, C_{i+1,j}, C_{i-1,j}, C_{i+1,j-1}, C_{i+1,j+1}, C_{i-1,j-1}, C_{i-1,j+1}.

The color of a cell changes based on the majority color of its neighbors.

[Figure: grid at T0 and T1]

Visualization: 'Simulated' Simulation

[Figure legend]
• Person states: Healthy Person, Weaker Person, Sick Person, Removed
• Pathogen Content: Normal, Low/Med TB, High TB
• Obstructability: Floor, Obstacle, Wall

The Future: Clusters and the GRID

• Faster hardware and new high-bandwidth networks demand that we explore new cluster architectures.
• Larger, more complex cluster environments make it imperative to invest in new, efficient, and scalable tools.
• Grand Challenge problems will continue to drive the development of computing infrastructure.
• Distributed HPC will become commonplace (DOE SciDAC, i.e., Scientific Discovery through Advanced Computing).
• Management tools designed for single hosts or small clusters are likely NOT to scale.
• New types of middleware are needed to decouple the underlying distributed infrastructure from the applications.

Grid Layers…virtualization

[Diagram labels, in order: Data Grid, Comp. Grid, Bio Grid; Applications; Application-Specific Grid Services (APIs); Middleware; General Grid Services; Grid Engine (six instances); Grid Access; Internet / Private Networks]

Matter of Facts…

• There is increasing demand for harnessing computational resources.
• There is increasing demand for Grid-based computing in the private sector.
• Computing power will become a commodity like water, gas, etc.
• As with ISPs, Grid Access Providers (GAPs) will have to guarantee Quality of Service.
• Through Grid Services, we can provide a global computing infrastructure and facilitate services for a large number of application domains in the private and public sectors!
• Examples: Healthcare, Education, Industrial R&D, Entertainment, Sciences, etc.

Cluster Semantics

[Diagram labels: Cluster Nodes; Master Node; Networking Interconnect]

People Behind - The Group

A Final Push to Control TB

• Because the number of TB cases in the U.S. is lower than it has ever been, we have the opportunity to finally control TB in the U.S.
• Recent research suggests that focusing on the dynamics of how TB is transmitted in specific locations is a much-needed final push to TB control.
• Homeless shelters and overcrowded areas constitute reservoirs of TB infection, yet little research exists on the dynamics of localized TB transmission in homeless shelters.
• Little attention has been given to places like factories, warehouses, healthcare facilities, or schools, where people work in close proximity for long periods of time.

Cray Y-MP & IBM Power4

• "Common" supercomputer in the early 1990s
• ~$1 million from Cray
• Max speed: 2.3 gigaflops (record speed)
• Pentium III 1 GHz processors; the same processors sold "off the shelf"
• 64 gigaflops
• 198th on the Top500 list (http://www.top500.org)

Big Mac @ Virginia Tech

• Macintosh G5 workstations
• InfiniBand networking interconnect
• 3rd fastest supercomputer in the world

Cellular Automata (4 Neighbors – von Neumann)

State of each cell C_{i,j} depends on the neighbors C_{i,j+1}, C_{i,j-1}, C_{i+1,j}, C_{i-1,j}.

For example, the color of a cell depends on the majority color of its neighbors.

[Figure: grid at T0 and T1]
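
The majority rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's implementation: it supports both the 4-neighbor (von Neumann) neighborhood of this slide and the 8-neighbor variant listed earlier, and it makes two assumptions the slides do not state, namely that cells outside the grid are ignored and that a cell keeps its color when its neighbors are tied. The names `VON_NEUMANN`, `MOORE`, and `step` are hypothetical.

```python
from collections import Counter

# Neighbor offsets (di, dj) relative to cell C[i][j].
VON_NEUMANN = [(0, 1), (0, -1), (1, 0), (-1, 0)]
MOORE = VON_NEUMANN + [(1, -1), (1, 1), (-1, -1), (-1, 1)]


def step(grid, offsets=VON_NEUMANN):
    """Compute T1 from T0: each cell takes the majority color of its neighbors.

    Assumptions (not from the slides): neighbors outside the grid are
    ignored, and a cell keeps its current color when the two most common
    neighbor colors are tied or when it has no neighbors at all.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy, so T0 is read-only during the update
    for i in range(rows):
        for j in range(cols):
            counts = Counter(
                grid[i + di][j + dj]
                for di, dj in offsets
                if 0 <= i + di < rows and 0 <= j + dj < cols
            )
            ranked = counts.most_common(2)
            if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
                new[i][j] = ranked[0][0]
    return new


if __name__ == "__main__":
    t0 = [
        ["R", "R", "R"],
        ["R", "B", "R"],
        ["R", "R", "R"],
    ]
    for row in step(t0):
        print(" ".join(row))
```

In this example the single "B" cell is surrounded by "R" cells, so it is overwritten at T1 and the whole grid becomes "R". Passing `offsets=MOORE` applies the same rule over eight neighbors. A stochastic variant would replace the deterministic assignment with a probabilistic one.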