Enhancing User-Productivity and Capability Through Integration of Distinct Software in Epidemiological Systems

Suruchi Deodhar∗ suruchi@vbi.vt.edu
Keith Bisset kbisset@vbi.vt.edu
Jiangzhuo Chen chenj@vbi.vt.edu
Madhav V. Marathe∗ mmarathe@vbi.vt.edu
Yifei Ma∗ yifeima@vbi.vt.edu

Network Dynamics and Simulation Science Laboratory
Virginia Bioinformatics Institute
Virginia Tech, Blacksburg, VA 24061, USA

ABSTRACT
Public health policy decision makers need analytical and interactive features in epidemic simulation systems, along with the ability to simulate disease propagation over large-scale populations of millions of individuals. To fulfill these requirements, we re-engineered existing epidemiological software systems and integrated them so that the performance of the overall system was minimally affected. The integrated systems were EpiFast, an HPC-based simulation engine that simulates disease diffusion over multiple regions; ISIS, a web-based visual interface tool used for analyzing the role of different parameters in disease propagation; and a database management system that stores and operates on demographic and geographic information about city populations. We analyzed whether existing middleware platforms could support the integration and developed a new architecture that achieves seamless and efficient integration of the component systems. The integrated software system combines capability with the usability and flexibility that public health policy decision makers need to study epidemics holistically. It also allows complex intervention strategies defined by multiple users to be reused through the web-based interface, and it reduces the overall time needed to set up experiments and manage data. In this paper, we describe the flexible architecture that made the integration of these distinct software components possible and report on case studies that show a considerable improvement in the productivity of decision makers and epidemiologists using the new integrated tool.

Categories and Subject Descriptors
I.6 [SIMULATION AND MODELING]: Applications; D.2 [SOFTWARE ENGINEERING]: Software Architectures, Data abstraction

General Terms
Design, Experimentation, Performance

Keywords
Epidemic simulation modeling, Interactive computations

1. INTRODUCTION
The study of epidemics is a systematic process that involves analysis of the various factors and parameters in the propagation of contagious diseases. Public health policy makers need to analyze the dynamics of disease propagation and devise appropriate intervention strategies to contain the spread of diseases. An HPC-based platform like EpiFast [8] simulates the disease diffusion process in large cities. It uses contact information about a given population and a discrete event model to simulate an epidemic. Most of the interventions applied using EpiFast are static: the sub-populations to be intervened are pre-computed. Hence new interventions cannot be submitted in response to the dynamics of an epidemic, which means that only a fixed set of intervention strategies can be applied using EpiFast. EpiFast also lacks a user-friendly interface for non-technical users, in particular public health policy makers.

To study the effect of complex interventions on disease propagation, the sub-populations to be intervened should be computed dynamically. However, computing sub-populations at run time is non-trivial and difficult to achieve within a simulation engine like EpiFast. In particular, it involves joining and grouping data and finding the sub-populations that satisfy different parameters, which requires significant changes to the simulation code.
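To make the joining-and-grouping step concrete, the following sketch uses Python's standard sqlite3 module to find, on a given simulation day, every undiagnosed person with at least k diagnosed contacts (with k = 1 this is simply the set of contacts of diagnosed people). The tables `contact` and `diagnosis`, their columns, and the function names are hypothetical illustrations, not the Indemics or EpiFast schema.

```python
import sqlite3

# Hypothetical schema, for illustration only:
#   contact(src, dst)    one row per directed contact edge (both directions stored)
#   diagnosis(pid, day)  simulation day on which each person was diagnosed
QUERY = """
SELECT c.dst
FROM contact AS c
JOIN diagnosis AS d ON d.pid = c.src AND d.day <= :day            -- join: edges leaving diagnosed people
WHERE c.dst NOT IN (SELECT pid FROM diagnosis WHERE day <= :day)  -- skip people already diagnosed
GROUP BY c.dst                                                    -- group: one row per candidate
HAVING COUNT(DISTINCT c.src) >= :k                                -- at least k diagnosed contacts
ORDER BY c.dst
"""


def targets_near_diagnosed(conn: sqlite3.Connection, day: int, k: int = 1) -> list[int]:
    """Return ids of undiagnosed people with >= k contacts diagnosed by `day`."""
    return [row[0] for row in conn.execute(QUERY, {"day": day, "k": k})]


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory population: 6 people, 5 undirected contact edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contact (src INTEGER, dst INTEGER)")
    conn.execute("CREATE TABLE diagnosis (pid INTEGER PRIMARY KEY, day INTEGER)")
    edges = [(1, 2), (2, 3), (3, 4), (5, 6), (3, 6)]
    conn.executemany("INSERT INTO contact VALUES (?, ?)", edges + [(b, a) for a, b in edges])
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?)", [(2, 1), (4, 2), (5, 3)])
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(targets_near_diagnosed(db, day=3, k=1))  # [1, 3, 6]
    print(targets_near_diagnosed(db, day=3, k=2))  # [3]
```

Changing the rule means changing a query, not recompiling the engine. That is the separation of concerns the database-assisted approach aims for.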
To resolve this issue, Indemics [7] was developed as a data-intensive, high performance modeling platform for applying interventions to epidemiological simulation systems. The distinguishing feature of Indemics is that it uses an external database management system (DBMS) to compute interventions dynamically and thus provides the required flexibility for computation.

Component system | Description | Technology
ISIS | Web-based visual interface tool | Google Web Toolkit
EpiFast | High performance simulation engine | C++/MPI
Indemics | DBMS-assisted HPC framework | Java-based middleware + Oracle database
Simfrastructure | Middleware for high performance distributed systems | Jini-based service oriented system

Table 1: Components of DISimS and corresponding technologies

∗ Also affiliated with Department of Computer Science, Virginia Tech.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IHI’12, January 28–30, 2012, Miami, Florida, USA. Copyright 2012 ACM 978-1-4503-0781-9/12/01 ...$10.00.

However, the simulation output generated by Indemics must be interpreted manually to understand the statistical significance of the results, because Indemics does not generate analysis plots automatically. For practical use, the Indemics framework must be combined with an intuitive user interface and a data analysis system that can generate analysis plots for non-technical users. ISIS (Interface to Synthetic Information Systems) is one such existing web-based visual interface tool, developed at the Network Dynamics and Simulation Science Laboratory for analyzing the role of different parameters in disease propagation.
It is typically used as the front-end system for selecting diffusion and static intervention parameters for simulation engines such as EpiSimdemics [5] and EpiFast [8]. ISIS provides an intuitive interface and a framework to set up and run experiments systematically. It also provides statistical analysis capabilities and tools for generating relevant plots from simulation results.

Combining these three existing systems, ISIS, EpiFast and a DBMS, would thus provide the functionality needed to apply dynamic run-time interventions to the diffusion engine, while also making the system usable by non-technical users. The integrated system can transparently leverage the high performance computing framework of EpiFast, the flexibility provided by databases, and the middleware infrastructure and analysis capabilities of ISIS. We call this integrated epidemiological system DISimS (Distributed Interactive Simulation System). In this paper, we describe the infrastructure, architecture and technical implementation that made the integration of ISIS, EpiFast and the DBMS possible. The main contributions of this paper are:

• A novel architecture to support efficient implementation of diverse technical components and to achieve interoperability between distinct middleware systems
• Design and development of DISimS (Distributed Interactive Simulation System) with improved functionality, capability and usability to support complex database-assisted interventions
• A management framework that supports automated set-up and execution of complex intervention experimental designs, with built-in statistical analysis tools for studying epidemics

The rest of the paper is organized as follows. In Section 2, we provide a brief background on the motivating factors that encouraged us to undertake this project. In Section 3, we discuss the state of the art and related work in this field. In Section 4, we provide a high level architecture of a complex network-centric simulation system composed of several diverse systems, which can be applied to a variety of domains. This architecture is then used as the basis for the architecture of the integrated epidemiological system DISimS. Section 4 also explains a typical usage scenario in which the new system provides a combination of usability and flexibility. Section 5 presents case studies that show improvements in the productivity of users of DISimS. In Section 6, we discuss the applicability of the architecture to other domains, along with issues and limitations, followed by concluding remarks in Section 7.

2. BACKGROUND
Computational epidemiology involves the development and use of computer models for simulating disease propagation within a region. Epidemiologists collaborate with computer scientists and use computational models to study the spatio-temporal diffusion of an epidemic. The diffusion process depends on a disease model and on parameters such as transmissibility, symptomatic duration, initial conditions and so on. High performance computing systems such as EpiSimdemics [5] and EpiFast [8] are used to simulate the diffusion process.

Traditionally, computational scientists received a set of scenarios to be simulated from government agencies such as DTRA and the CDC to study disease diffusion [2, 3, 4]. Programmers then modified the simulation code to accommodate the new study requirements, and the results of the experiments were reported back to the agencies. This process could take weeks to months, depending on the complexity of the scenarios. With advances in computing, the level of collaboration and support between epidemiologists and computational scientists has evolved. ISIS was developed as a web-based system at NDSSL so that epidemiologists could select parameters easily, without needing to understand the technical aspects of the computations. ISIS provides a user-friendly graphical interface for modifying the diffusion parameters and starting the simulation process.

In addition to the diffusion process itself, epidemiologists are interested in studying how different intervention strategies affect it. Interventions can be pharmaceutical or non-pharmaceutical. Examples of pharmaceutical interventions include vaccines and antiviral drugs such as Tamiflu, whereas non-pharmaceutical interventions include social distancing and school closure. To study the effects of these interventions on the spread of a disease using a diffusion system like EpiFast, the interventions must be implemented programmatically within the system, so whenever a new intervention is to be studied, the EpiFast code has to be changed. To avoid constant changes to the diffusion code and to support separation of concerns, Indemics [7] was developed as a data-intensive, high performance modeling platform that uses database management systems to apply interventions to diffusion systems like EpiFast externally. Using databases to compute and apply interventions, as proposed by Indemics, provides a number of benefits in terms of both development effort and efficiency.

Indemics [7] uses an SQL-based language called Indemics Query Language (IQL) to specify interventions through the Indemics client. An example of an Indemics client script written in IQL can be found in Figure 3 of [7]. Using IQL, epidemiologists can write intervention scripts themselves with little effort, or ask programmers to design the scripts for complex experiments. This avoids changes to the simulation code and the time and effort of subsequent development. In some cases, using databases also reduces the running time of the experimental study.
Overall, using Indemics to apply interventions considerably reduces the development time needed to implement interventions and also allows computing very complex interventions based on multiple parameters, which is very difficult to do within the simulation engine.

In spite of the many advantages of the Indemics platform, the tool was not as widely used in our research group as expected. We investigated the reasons and found that the lack of a user-friendly graphical interface was one of the major drawbacks of Indemics, causing a low adoption rate among epidemiologists. Users of Indemics had to spend a considerable amount of time writing the intervention scripts needed to set up a particular experimental study. Moreover, epidemiologists had to perform statistical analysis on the results manually. Indemics also did not provide an efficient way for multiple users to share their IQL scripts, which made collaboration between epidemiologists, a major requirement from the users' standpoint, difficult.

To overcome these problems, we analyzed the deficiencies in the existing systems and studied the requirements of a new system. Instead of developing a completely new system from scratch, we decided to re-engineer the existing ISIS web-based system and its middleware infrastructure to provide access to the Indemics platform and the EpiFast diffusion engine. The re-engineered system, DISimS, fulfills the computing requirements for simulating the disease diffusion process, while also providing a database-driven platform for applying interventions and an easy-to-use, intuitive interface for epidemiologists. The system also supports multiple simultaneous users and allows them to share intervention scripts, experiment results and analysis results.

3. RELATED WORK
The field of modeling and simulation has evolved over the years.
In particular, integrated simulations for gaming, navigation, incident management, transportation systems and similar applications, which must support multiple user interactions based on a number of dynamic parameters, have attracted wide interest in the community. Improving the scalability and performance of simulation systems remains an open problem in this area, and considerable research also addresses how to integrate multiple disparate systems efficiently. The architecture of distributed and parallel systems derives heavily from the seminal work of Carriero and Gelernter [10], who describe the programming paradigms for conceptual classes of parallel programs, categorized by the type and extent of communication. More recently, Jain and McLean [17] discuss the architecture of simulation systems integrated with massively parallel online games for incident management training. “Splash” [1] is an ongoing project that aims to integrate multiple simulation systems, such as transportation, media and advertising, food and nutrition, and market dynamics, to study the impact of various factors on the health of individuals, in particular on obesity. Research has also been carried out on reflective middleware systems for integrating multiple simulation environments. Jalali et al. [18] describe the RAISE project, which integrates diverse existing simulation and data models using communication and evacuation simulation systems. In the epidemiology world, the public health community has been exploring epidemic modeling and simulation and the architecture of the supporting systems. Early work on such simulation systems primarily focused on accurately modeling the complex epidemic diffusion process and implementing it efficiently on powerful modern computer systems.
FluTE [12] uses a stochastic agent-based model to simulate disease spread across large populations. The simulation model has been calibrated using historic pandemic data and can be used to study the dynamics of similar influenza outbreaks and to evaluate pandemic preparedness plans. EpiSims [19] represents a regional population as an agent-based system, constructs a social contact network from people's daily activities, and simulates the spread of disease over these social interactions. BioWar [9] is a more sophisticated agent-based model of disease diffusion that combines the social contact network and diffusion information with exogenous factors such as media, geographic and weather information. Instead of examining accurate and realistic dynamics of epidemic diffusion, [16] addresses building a simulation framework that enables epidemic simulations in real time to aid preparatory measures and planning for future epidemics. To study epidemics in real time using simulation systems, it is essential to reduce the effort of experimental design and analysis, in addition to the execution time. In this paper, we introduce our work on building an integrated framework that incorporates an experiment preparation module, an information management module and an offline analysis module for studying complex epidemic dynamics. Our design of experiment preparation draws on early work on epidemic modeling and on studies of mitigation strategies. Cauchemez et al. [11] introduce and analyze the role of social networks in historic epidemic diffusion. Chao et al. [6, 22, 23] study the effectiveness of mitigation strategies for controlling disease spread. The correlation between disease spread and human movement is discussed in [20]. Historic epidemic outbreaks provide evidence of hierarchical spatial disease spread.
The analysis of such historic pandemic information and of simulated epidemic diffusion results has extended our understanding of infectious disease propagation.

4. ARCHITECTURE AND IMPLEMENTATION
In this section, we propose a high level architecture of a

Middleware platform | Description | Usage | Typical comm. data size
Simfrastructure | Jini-based middleware for high performance distributed systems, between ISIS and EpiFast | Message-oriented data transfer | 140 B per simulation
Indemics server | Java-based middleware between the Oracle database and EpiFast, optimized for high performance parallel systems | Raw data transfer | 16 KB per iteration

Table 2: Middleware platforms and their speed comparison. Note: A single simulation experiment is carried out over multiple iterations to study epidemic spread patterns.

it over to the message deliverer. The message deliverer then connects directly to the database and retrieves large volumes of data from it. The Simulation session pool module of the HPC middleware maintains a queue of data retrieved from the database, to be passed to the HPC-based simulation system. The ‘Middleware for HPC engine’ can support data transfer rates of several MBps over many consecutive iterations. As shown in Figure 1, connecting two middleware systems through brokers, instead of using a single generic middleware, enables efficient execution of the end-to-end simulation system. The architectural style described in this section can be applied to other complex HPC-based simulation systems. In the next sections, we describe how this architecture was applied to develop the epidemiological system DISimS (Distributed Interactive Simulation System) by integrating three distinct component systems.

4.2

based on the results of the simulation run, which can be subsequently studied for further analysis. The middleware infrastructure that supports the communication between ISIS and the EpiFast simulation engine is called “Simfrastructure”.
Simfrastructure was conceived as a high performance distributed-system mechanism that handles communication between heterogeneous distributed systems at a higher level of abstraction. It was intended not for massive data movement but as a signaling mechanism between distributed systems. Hence, Simfrastructure was developed using Jini, a service oriented architectural model of communication. The Jini architecture [21] provides a service oriented abstraction in which services can be registered and removed on the fly, and each component system can be treated as a composable service. The “blackboard” concept, based on JavaSpaces, is the key component of the Jini middleware. A producer writes data to the blackboard with a lease time of its choosing, and any registered consumer service can then access the data from the blackboard. The producer and consumer systems need not be co-located on the same server; they may be independent standalone systems or clusters. Simfrastructure extends this Jini-based service oriented architecture for communication between ISIS and EpiFast.

The third important component of the integration is the relational DBMS that stores demographic and contact information about the individuals in a city. As explained before, epidemiologists need to apply interventions to an epidemic diffusion engine such as EpiFast using dynamic run-time parameters. It is more suitable to store the social contact network and population demographics in a database and pass them to the simulation engine externally than to store them within the engine. The infection data produced by the diffusion process can be passed back to the database and stored there, so that the intervention data for the next time period can be computed from it. This concept of using a DBMS to apply interventions to a simulation engine externally was first introduced in Indemics [7].
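The blackboard pattern described above can be illustrated with a small Python sketch. It is only a toy model of a JavaSpaces-style space, not Simfrastructure code (JavaSpaces itself offers write, read and take operations on Java Entry objects). In the sketch, producers write entries with a lease, consumers read or take the first live entry whose fields match a template, and expired entries disappear. The class names, entry fields and injectable clock are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class _Entry:
    fields: dict
    expires_at: float  # absolute time at which the producer's lease runs out


class Blackboard:
    """Toy JavaSpaces-style blackboard (hypothetical, for illustration).

    A template matches an entry when every key in the template is present in
    the entry with an equal value; keys absent from the template are wildcards.
    """

    def __init__(self, clock: Callable[[], float]):
        self._clock = clock  # injected so lease expiry is deterministic in tests
        self._entries: list[_Entry] = []

    def write(self, fields: dict, lease: float) -> None:
        """Producer side: publish an entry that lives for `lease` time units."""
        self._entries.append(_Entry(dict(fields), self._clock() + lease))

    def _find(self, template: dict) -> Optional[int]:
        now = self._clock()
        self._entries = [e for e in self._entries if e.expires_at > now]  # drop expired leases
        for i, e in enumerate(self._entries):
            if all(k in e.fields and e.fields[k] == v for k, v in template.items()):
                return i
        return None

    def read(self, template: dict) -> Optional[dict]:
        """Consumer side: return a copy of a matching entry, leaving it on the board."""
        i = self._find(template)
        return None if i is None else dict(self._entries[i].fields)

    def take(self, template: dict) -> Optional[dict]:
        """Consumer side: remove and return a matching entry."""
        i = self._find(template)
        return None if i is None else self._entries.pop(i).fields


class ManualClock:
    """Deterministic clock for the example: time only moves when `now` is set."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now
```

For example, a front end could write `{"type": "sim_request", "region": "Miami"}` with `lease=60`, and an execution-side consumer could call `take({"type": "sim_request"})`. Once the clock passes the lease, the entry is no longer visible. Producer and consumer only share the board, which is why they need not run on the same machine.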
From the architectural description in Section 4.1, it was clear that a service oriented middleware was not the ideal way to connect the database to the EpiFast simulation engine, since it would degrade performance. Hence the Simfrastructure middleware could not be used for this purpose. Instead of developing a new middleware for the EpiFast engine, we decided to extend the components of Indemics and its supporting infrastructure to connect to EpiFast. The Indemics server was used as the middleware platform, since it provided the appropriate level of abstraction for large-scale data movement between EpiFast and the database. Table 2 compares the two middleware platforms, Simfrastructure and the Indemics server, their main ob-

Technical diversity of component systems of DISimS
In this section, we describe the technical features of each of the component systems that are part of DISimS. The most important component of an epidemic simulation process is the simulation engine, which simulates the diffusion process. Since the diffusion process is simulated over a large-scale population and simulation speed is a key concern, a high performance computation engine is a necessity. For DISimS, we used EpiFast [8], a simulation engine that can simulate disease diffusion in large-scale populations of millions of individuals. EpiFast is an HPC-based system developed in C++/MPI that simulates the dynamic interactions between individual behaviors and the spread of epidemics. It can also execute multiple replicates for better accuracy. The EpiFast engine executes the diffusion process very rapidly: for instance, simulating disease diffusion in a city like Miami, with a population of about 2 million people, takes EpiFast less than 30 seconds.

The second component of the integration is ISIS.
ISIS was initially developed using GWT (Google Web Toolkit), a Java-based platform for developing web systems, as a web-based front end for multiple simulation engines such as EpiFast and EpiSimdemics. Multiple users can access ISIS simultaneously from any machine with Internet connectivity using any standard web browser. It has a simple graphical user interface and lets users specify different parameters for a simulation run. Moreover, it can also generate different plots

Case study | System | Experiment development time | Time to set up experiment analysis | Total human effort for experiment and analysis | Experiment execution time
School based | EpiFast | hard to implement | unknown | unknown | unknown
School based | Indemics | 1 day | 0.5 day | 1.5 days | 3-4 min/iteration
School based | DISimS | 5 minutes | 20-25 min | 30 min | 3-4 min + 3-4 min/iteration
Block based | EpiFast | hard to implement | unknown | unknown | unknown
Block based | Indemics | 0.5 day | 0.5 day | 1 day | 4-6 min/iteration
Block based | DISimS | 5 minutes | 20-25 min | 30 min | 3-4 min + 4-6 min/iteration

Table 3: Comparison of total effort for the complex intervention case study on Miami city. Experiment execution is carried out over multiple iteration days. For DISimS, the experiment execution time includes the communication time between the web server and the Indemics server in addition to the actual experiment execution.

tional middleware, very high data speeds can be supported per iteration across multiple simulation runs, and thus the required performance of the DISimS simulation system is maintained.

4.4

to the front end through the interface broker, where the corresponding graphs are plotted for the user.

4.5 Usage scenario
Consider a scenario in which there is a possibility of a large-scale outbreak of an infectious disease in a given city. Two or three people have already been diagnosed with the disease, and it has been noticed that the diagnosed people are of school-going age.
An epidemiologist wants to evaluate the effectiveness of different strategies for containing the spread of the epidemic within the city, including school closure, antiviral drugs and vaccination. For a school-based intervention, the rule is: if the number of diagnosed children in a school exceeds a certain threshold, apply the intervention to the entire school. Before Indemics was developed, this scenario could not be simulated completely within the simulation engine. Even with Indemics, the epidemiologist would have had to write a script in Indemics Query Language by hand to implement a particular intervention strategy such as school closure. There was no easy way for the epidemiologist to find out whether other users had previously written or executed the same script, which could have been reused, since no search or experiment management mechanism was available. Moreover, the user had to analyze the simulation results manually to draw conclusions. With DISimS, the epidemiologist can use the web-based front end of ISIS, shown in Figure 3, from any machine with web access and select the type of intervention from a searchable drop-down list of existing interventions. Once the simulation results for the chosen parameters are returned, the ISIS interface allows the results to be analyzed statistically from different perspectives. Figure 4 shows one such analysis output for a dynamic intervention experiment simulated for 200 days to study the spread of catastrophic flu (a type of flu in which more than 40% of the people in a region are infected when no interventions are applied) in a selected region. The epidemiologist can carry out multiple runs of the same strategy, or of multiple strategies, on the same region.
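The school-based rule in this scenario is a good example of a run-time sub-population computation that a database handles well. The sketch below uses Python's built-in sqlite3 module to express it as a GROUP BY ... HAVING query. The `student` and `diagnosis` tables, their columns and both function names are hypothetical. In DISimS, the equivalent logic would live in an IQL client script executed by Indemics, and IQL syntax is not shown here.

```python
import sqlite3

# Schools whose number of students diagnosed on or before :day exceeds :threshold.
SCHOOLS_OVER_THRESHOLD = """
SELECT s.school_id
FROM student AS s
JOIN diagnosis AS d ON d.pid = s.pid
WHERE d.day <= :day
GROUP BY s.school_id
HAVING COUNT(*) > :threshold
"""


def schools_to_close(conn: sqlite3.Connection, day: int, threshold: int) -> list[int]:
    """Ids of schools selected by the rule, in ascending order."""
    sql = SCHOOLS_OVER_THRESHOLD + " ORDER BY s.school_id"
    return [r[0] for r in conn.execute(sql, {"day": day, "threshold": threshold})]


def students_to_intervene(conn: sqlite3.Connection, day: int, threshold: int) -> list[int]:
    """Every student of a selected school, diagnosed or not: the whole school is intervened."""
    sql = f"SELECT pid FROM student WHERE school_id IN ({SCHOOLS_OVER_THRESHOLD}) ORDER BY pid"
    return [r[0] for r in conn.execute(sql, {"day": day, "threshold": threshold})]


def build_demo_db() -> sqlite3.Connection:
    """Two hypothetical schools (10 and 20) with four students each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (pid INTEGER PRIMARY KEY, school_id INTEGER)")
    conn.execute("CREATE TABLE diagnosis (pid INTEGER PRIMARY KEY, day INTEGER)")
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(p, 10) for p in range(1, 5)] + [(p, 20) for p in range(5, 9)])
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?)", [(1, 1), (2, 2), (5, 2), (3, 4)])
    return conn
```

On day 2, with a threshold of 1, school 10 has two diagnosed students and is selected, so all four of its students are returned. School 20 has only one diagnosed student and is not selected.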
The list of interventions enumerates all intervention strategies that have already been written or executed. If a particular intervention strategy is not in the list, the epidemiologist can add a template for it herself or ask the development team to add one.

Data and context flow
The data and context flow within DISimS is shown in Figure 2. DISimS is accessible through a web-based interface from any machine connected to the Internet. Users, in particular epidemiologists and public health policy decision makers, select the parameters for running a case study experiment, such as the region over which the simulation is to be carried out, the number of simulated days, the number of replicates, the disease model and so on. In addition, users can select intervention scripts to be applied to the diffusion process, such as block-based intervention, school-based intervention, Distance-1 neighbor vaccination and so on, to study their effects on disease propagation dynamically. This information is passed as a message to the interface broker of the Simfrastructure middleware. The interface broker interprets the data and places a request to start a simulation on the blackboard. The execution broker monitors the blackboard for new simulation requests and, when data is available, signals the simulation engine, EpiFast, to start execution.

The Indemics broker is one of the most important components of DISimS. It maintains several distinct client script templates for different intervention studies. Based on the intervention parameters passed from the front end and posted on the blackboard, the Indemics broker selects the corresponding client script template and replaces its dummy parameters with the actual experiment and intervention parameters. The generated client script is written in IQL, which the Indemics server can interpret.
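The Indemics broker's template step selects a stored client-script template and replaces its dummy parameters with the values posted from the front end. This can be sketched with Python's standard `string.Template`. The template text, placeholder names and `TEMPLATES` registry are hypothetical, and the template body is neutral key = value text rather than real IQL.

```python
from string import Template

# Hypothetical registry of client-script templates keyed by intervention type.
# $name placeholders stand in for the "dummy parameters" of a real template.
TEMPLATES = {
    "school_based": Template(
        "# school-based intervention (illustrative text, not IQL)\n"
        "region = $region\n"
        "threshold = $threshold\n"
        "action = $action\n"
        "days = $start_day..$end_day\n"
    ),
}


def render_client_script(intervention: str, params: dict) -> str:
    """Select the template for `intervention` and fill in every placeholder.

    Template.substitute raises KeyError when a placeholder has no value, so an
    incomplete request fails loudly instead of yielding a half-filled script.
    """
    return TEMPLATES[intervention].substitute(params)


if __name__ == "__main__":
    print(render_client_script("school_based", {
        "region": "Miami", "threshold": 5, "action": "close_school",
        "start_day": 1, "end_day": 300,
    }))
```

Keeping the templates separate from the parameter values is what lets a template written once be reused by other users with different settings.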
The Indemics server is configured as an always-running background process. Based on the invoked intervention (client) script, the Indemics server connects to the appropriate database tables, retrieves the data on the population to be intervened, and passes it to the simulation engine as an intervention. Just as the execution broker signals EpiFast to start execution, the Analysis broker starts the analysis script when the user requests an analysis. The request for analysis of a particular experiment is sent to an Analysis server that runs the R statistical tool. The results of the analysis are written to the blackboard and then passed

Writing such a template is a one-time effort that takes very little time compared to the benefits derived: the strategy becomes immediately available for all future case-study runs by any other user. In short, with DISimS the epidemiologist can leverage all the benefits of Indemics, ISIS and EpiFast in a user-friendly and efficient manner.

5. CASE STUDIES
In this section, we share our experiences of using DISimS to execute case studies, compared with the earlier simulation systems EpiFast and Indemics. After the 2009 H1N1 flu outbreak, we decided to carry out a case study comparing the effectiveness of government intervention strategies and individual protection strategies. Unlike the previous studies we had run to examine the course of epidemic dynamics, the main objective of this study was to compare government interventions (top-down action) and individual protections (bottom-up action) in mitigating the propagation of the H1N1 virus. The government strategies included block-based intervention and school-based intervention. The block intervention strategy specifies that when the fraction of people diagnosed with the disease in a census block exceeds a certain threshold, the entire block is quarantined or given medical treatment. The targeted individual protection strategies are more complicated and require detailed geographic and demographic information. At that time, our simulation engines, EpiSimdemics and EpiFast, had no features for integrating such supplemental information and hence could not implement targeted individual interventions. The code of these high performance simulation engines would have had to be modified to support such interventions: the simulation engine designers and the experiment strategy designers would have had to work together to interpret the strategies precisely and translate them into code for the high performance engine. The entire development process, including requirements gathering, implementation and testing, was estimated to take several weeks, whereas the experiment execution was estimated to take only one week. This approach was time-consuming, and it was difficult to report simulation results and conclusions in time.

Figure 4: ISIS-based analysis output of DISimS

To overcome this problem, we used Indemics, a database-supported epidemic simulation framework, to run the intervention experiment. In contrast to an epidemic simulation engine like EpiFast, Indemics models the implementation of interventions with a data query algebra, and the interventions are computed entirely using the query language of the database management system. Experiment strategy designers only need to describe their scenarios in IQL and submit the simulation jobs to Indemics for execution. With Indemics, the experiment development process takes a few days to map the interventions into IQL. Indemics incurs a marginal execution time overhead, but it needs no significant code development or testing.
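The block-based rule and the day-by-day pattern behind it can be sketched as follows: each day's new diagnoses are written to the database before the next intervention is computed by a query. The loop only stands in for the engine. It reads pre-supplied daily diagnoses instead of simulating diffusion, and it logs the blocks selected each day instead of passing them to EpiFast. All table, column and function names are hypothetical, and each person is assumed to be diagnosed at most once.

```python
import sqlite3


def run_block_intervention(residents: dict, daily_diagnoses: dict, threshold: float) -> dict:
    """Hypothetical day-by-day loop for a block-based intervention.

    residents:       {person_id: census_block_id}
    daily_diagnoses: {day: [person_id, ...]}  (each person diagnosed at most once)
    Returns {day: [block ids newly selected that day]}; a block is selected once,
    the first day its fraction of diagnosed residents exceeds `threshold`.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resident (pid INTEGER PRIMARY KEY, block_id INTEGER)")
    conn.execute("CREATE TABLE diagnosis (pid INTEGER PRIMARY KEY, day INTEGER)")
    conn.execute("CREATE TABLE intervened (block_id INTEGER PRIMARY KEY, day INTEGER)")
    conn.executemany("INSERT INTO resident VALUES (?, ?)", residents.items())

    log = {}
    for day in sorted(daily_diagnoses):
        # 1. The "engine" reports today's diagnoses back to the database.
        conn.executemany("INSERT INTO diagnosis VALUES (?, ?)",
                         [(pid, day) for pid in daily_diagnoses[day]])
        # 2. The intervention is computed by a query over all diagnoses so far.
        #    LEFT JOIN keeps undiagnosed residents, so COUNT(*) is the block size
        #    and COUNT(d.pid) is the number diagnosed.
        rows = conn.execute(
            """
            SELECT r.block_id
            FROM resident AS r
            LEFT JOIN diagnosis AS d ON d.pid = r.pid
            WHERE r.block_id NOT IN (SELECT block_id FROM intervened)
            GROUP BY r.block_id
            HAVING 1.0 * COUNT(d.pid) / COUNT(*) > ?
            ORDER BY r.block_id
            """,
            (threshold,),
        ).fetchall()
        new_blocks = [block for (block,) in rows]
        # 3. Record the decision; a real system would hand it to the engine here.
        conn.executemany("INSERT INTO intervened VALUES (?, ?)", [(b, day) for b in new_blocks])
        log[day] = new_blocks
    conn.close()
    return log
```

With blocks 100 (four residents) and 200 (two residents), a threshold of 0.4, and one diagnosis per day in each block, block 200 crosses the threshold as soon as one of its two residents is diagnosed. Block 100 needs a second diagnosis to cross it. Changing the rule only means changing the query, not the engine.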
We adopted this solution to run the intervention studies, and it greatly shortened the study period and saved significant human effort. Although Indemics remarkably shortened the development time for implementing intervention experiments, it had no module to automatically set up experiments, monitor the state of an experiment, or manage experimental inputs and results. There was also no provision for reuse and sharing, such as checking whether an appropriate Indemics intervention script had previously been written by another user. Moreover, when the interventions had to be simulated with different settings and parameters and repeated tens of times to reduce the influence of random factors, using Indemics became cumbersome. For example, the scripts to run factorial experiments over multiple parameter values had to be prepared manually, which was very error-prone. The simulation inputs and outputs had to be carefully organized to avoid overwriting or misreading, and the simulation jobs had to be monitored by the experiment executors. Such tasks required considerable manual effort. Reading and understanding raw simulation results was also difficult, since Indemics had no statistical analysis or plotting modules.
From the experience of implementing the H1N1 case studies, we realized that the usability of the simulation systems had to be improved further. Hence we developed DISimS, which adds a user interface, experiment data management, job monitoring and analysis to the attributes already provided by Indemics and EpiFast. Employing DISimS for the school-based and block-based intervention experiments considerably reduces the overall experiment set-up and management time and enhances the productivity of the users. The users only need to select the intervention scripts and execution parameters through a simple graphical web-based interface.
Figure 3: ISIS-based front-end system of DISimS with dynamic intervention selection
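Preparing a factorial experiment by hand means enumerating every combination of parameter values, repeating each combination to average out random effects, and giving every run its own output location so results are not overwritten or misread. A minimal Python sketch of that bookkeeping follows, assuming hypothetical parameter names (`intervention`, `threshold`), a hypothetical replicate count, and a directory layout of our own choosing; it illustrates the task that DISimS automates rather than DISimS's actual implementation.

```python
import itertools
from pathlib import PurePosixPath


def factorial_design(params: dict[str, list], replicates: int) -> list[dict]:
    """Enumerate the full factorial design, each cell repeated `replicates` times.

    params maps a (hypothetical) parameter name to its candidate values.
    Each run gets a seed 0..replicates-1 and a unique output directory.
    """
    names = sorted(params)  # fixed order keeps run IDs reproducible
    runs = []
    for combo in itertools.product(*(params[n] for n in names)):
        setting = dict(zip(names, combo))
        label = "_".join(f"{n}={v}" for n, v in setting.items())
        for rep in range(replicates):
            runs.append({
                "params": setting,
                "seed": rep,
                # PurePosixPath gives "/" separators on every platform.
                "outdir": str(PurePosixPath("results") / label / f"rep{rep:02d}"),
            })
    return runs


if __name__ == "__main__":
    design = factorial_design(
        {"intervention": ["school", "block"], "threshold": [0.01, 0.05]},
        replicates=25,
    )
    print(len(design))          # 100 runs (2 x 2 settings x 25 replicates)
    print(design[0]["outdir"])  # results/intervention=school_threshold=0.01/rep00
```

Sorting the parameter names keeps run order and directory names stable across invocations, and encoding each setting in its own directory is one simple way to keep inputs and outputs from colliding.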
The data files for the factorial experiment design are well organized and archived, and the simulation jobs are automatically monitored and scheduled. Compared with Indemics, DISimS introduces a marginal execution overhead, equivalent to the communication time between the web-based front end and the Indemics server middleware. Table 3 compares the total effort for the intervention experiments on the city of Miami using EpiFast, Indemics and DISimS. As the table shows, DISimS significantly reduces the total human effort for experiment design and analysis compared to the previous systems, while increasing the experiment execution time by only a few seconds, which is negligible. This comparison shows the value of DISimS for improving the productivity of epidemiologists and public health policy decision makers, who can now set up, manage and execute complex intervention case studies without much help from high performance computing developers.
Figure 5 shows epicurves derived as output from the DISimS system. It compares the effect of two strategies, School-based intervention and Block-based intervention, on the propagation of an epidemic over a 300-day period. The epicurves show the number of infected cases and diagnosed cases on each simulation day, from the 1st day to the 300th day, and the effectiveness of each mitigation strategy can easily be observed from them. In this plot, the School intervention strategy performs better than the Block intervention strategy, and the Block intervention is more effective at containing the epidemic than the base case without any interventions.
Figure 5: Epicurves for School and Block intervention case studies on Miami (y-axis: size of cases; x-axis: days, 0 to 300; infected and diagnosed cases without interventions, with School intervention and with Block intervention)
6. DISCUSSION
In this paper, we discussed the architecture of a complex simulation system made up of components such as an HPC-based simulation system, a database management system and a front-end user interface. More and more applications today use high performance engines for computations, and databases and interfaces are part of most application systems. Hence the architecture described here can be extended to multiple domains that involve the integration of distinct software systems. We demonstrated the applicability of this architectural style by implementing an end-to-end epidemic simulation system called DISimS. Simulation systems for transportation, biological systems, disaster planning, financial modeling and so on need features similar to those of an epidemic simulation system: core diffusion functionality combined with interactive and analysis features. Hence the architectural considerations described in this paper can serve as the basis for designing such large scale systems.
One of the most interesting applications of the architecture described in this paper is the development of large scale gaming systems. Gaming systems need not only fast computations at run-time with changing parameters, but also very detailed graphical interfaces for the convenience of users. Performance is a key requirement for such gaming systems, along with a high-resolution, user-friendly graphical interface to keep the attention of the users. The architecture described in this paper can inform the design and development of gaming and other such systems.
The design and implementation of DISimS were based on the features of existing component systems and technologies. For instance, the main reason that the Jini-based Simfrastructure and the Indemics server could be extended to EpiFast was the master-slave nature of the EpiFast algorithm. The master node that executes the EpiFast algorithm on a cluster can aggregate the intermediate results from slave nodes and pass information to the Indemics server and Simfrastructure as needed. It can also decompose the information to be passed from the database to specific slave nodes. Without the master-slave nature of the EpiFast algorithm, coordination between the component systems could have been an issue; for a symmetric parallel computation without a master, a new mechanism would have to be devised to ensure synchronization. Nonetheless, the approach of technological assessment and remodeling described in this paper, which analyzes the feasibility of existing technologies for building the architecture of a complex system, helped us make an informed decision about the development of DISimS.
7. CONCLUSION
In this paper, we have given an architectural overview of how to design complex simulation systems that enhance the usability, flexibility and capability of software systems. We have shown a specific use case of the architecture by applying it to DISimS, where three existing component systems, ISIS, EpiFast and a DBMS, were integrated efficiently. The DISimS system, with its web interface and analytical features, has considerably enhanced the productivity of its users. It is currently used internally at the Network Dynamics and Simulation Science Laboratory, and we plan to make it available to a large population of epidemiologists and public health policy decision makers. We plan to add more interesting and useful features to DISimS in the near future.
8. ACKNOWLEDGEMENTS
We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by NSF Nets Grant CNS-0626964, NSF HSD Grant SES-0729441, NIH MIDAS project 2U01GM0706947, NSF PetaApps Grant OCI-0904844, DTRA RD Grant HDTRA1-0901-0017, DTRA CNIMS Grant HDTRA1-07-C0113, NSF NETS CNS-0831633, DHS 4112-31805, DOE DE-SC0003957, NSF REU Supplement CNS-0845700, US Naval Surface Warfare Center N00178-09-D-3017 DEL ORDER 13, NSF Netse CNS-1011769 and NSF SDCI OCI-1032677.
9. REFERENCES
[1] SPLASH: IBM Project. http://www.almaden.ibm.com/asr/projects/splash/.
[2] K. Atkins, C. L. Barrett, R. J. Beckman, et al. DTRA National Guard study capability demonstration. Technical Report 06-060, NDSSL, 2006.
[3] K. Atkins, C. L. Barrett, R. J. Beckman, et al. Simulated pandemic influenza outbreaks in Chicago: NIH DHHS study final report. Technical Report 06-023, NDSSL, 2006.
[4] K. Atkins, C. L. Barrett, R. J. Beckman, et al. An analysis of public health interventions at military bases during a pandemic influenza event. Technical Report 07-019, NDSSL, 2007.
[5] C. L. Barrett, K. R. Bisset, S. G. Eubank, X. Feng, and M. V. Marathe. EpiSimdemics: an efficient algorithm for simulating the spread of infectious disease over large realistic social networks. In Proc. ACM/IEEE Conference on Supercomputing, pages 290–294, 2008.
[6] N. E. Basta, D. L. Chao, M. E. Halloran, L. Matrajt, and I. M. Longini Jr. Strategies for pandemic and seasonal influenza vaccination of schoolchildren in the United States. American Journal of Epidemiology, 170, 2011.
[7] K. Bisset, J. Chen, X. Feng, Y. Ma, and M. Marathe. Indemics: an interactive data intensive framework for high performance epidemic simulation. In Proceedings of the 24th International Conference on Supercomputing (ICS), pages 233–242, 2010.
[8] K. R. Bisset, J. Chen, X. Feng, V. S. A. Kumar, and M. V. Marathe. EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. In Proc. the 23rd International Conference on Supercomputing, pages 430–439, 2009.
[9] K. M. Carley, D. B. Fridsma, E. Casman, A. Yahja, N. Altman, L.-C. Chen, B. Kaminsky, and D. Nave. Biowar: Scalable agent-based model of bioattacks. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 36(2):252–265, 2006.
[10] N. Carriero and D. Gelernter. How to write parallel programs: A guide to the perplexed. ACM Comput. Surv., 21(3):323–357, 1989.
[11] S. Cauchemez, A. Bhattarai, T. L. Marchbanks, R. P. Fagan, S. Ostroff, N. M. Ferguson, D. Swerdlow, and the Pennsylvania H1N1 working group. Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza. Proceedings of the National Academy of Sciences, 2011.
[12] D. L. Chao, M. E. Halloran, V. Obenchain, and I. M. Longini Jr. FluTE, a publicly available stochastic influenza epidemic simulation model. PLoS Computational Biology, 6(1), 2010.
[13] F. Curbera, M. Duftler, R. Khalaf, W. Nagy, N. Mukhi, and S. Weerawarana. Unraveling the Web services web: an introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, 6(2):86–93, Mar/Apr 2002.
[14] T. Erl. Service-Oriented Architecture: Concepts, Technology, and Design. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2005.
[15] R. T. Fielding. Architectural styles and the design of network-based software architectures. PhD thesis, University of California, Irvine, 2000. AAI9980887.
[16] H. V. Fineberg and M. E. Wilson. Epidemic science in real time. Science, 324:987, 2009.
[17] S. Jain and C. R. McLean. Integrated simulation and gaming architecture for incident management training. In Winter Simulation Conference, pages 904–913, 2005.
[18] L. Jalali, N. Venkatasubramanian, and S. Mehrotra. A reflective middleware architecture for simulation integration. In Proceedings of the 8th International Workshop on Adaptive and Reflective Middleware, ARM ’09, pages 3:1–3:6, New York, NY, USA, 2009. ACM.
[19] S. M. Mniszewski, S. Y. Del Valle, P. D. Stroud, J. M. Riese, and S. J. Sydoriak. EpiSims simulation of a multi-component strategy for pandemic influenza. In Proceedings of the 2008 Spring Simulation Multiconference, SpringSim ’08, pages 556–563, San Diego, CA, USA, 2008. Society for Computer Simulation International.
[20] C. Viboud, O. N. Bjornstad, D. L. Smith, L. Simonsen, M. A. Miller, and B. T. Grenfell. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science, 312(5772):447–451, 2006.
[21] J. Waldo, T. J. Arch, and T. Miura. The Jini architecture for network-centric computing, 1999.
[22] J. T. Wu, S. Riley, C. Fraser, and G. M. Leung. Reducing the impact of the next influenza pandemic using household-based public health interventions. PLoS Medicine, 3(9):361, 2006.
[23] Y. Yang, J. D. Sugimoto, M. E. Halloran, N. E. Basta, D. L. Chao, L. Matrajt, G. Potter, E. Kenah, and I. M. Longini. The transmissibility and control of pandemic influenza A (H1N1) virus. Science, 326(5953):729–733, 2009.