Enhancing User-Productivity and Capability Through
Integration of Distinct Software in Epidemiological
Systems
Suruchi Deodhar ∗
suruchi@vbi.vt.edu
Keith Bisset
kbisset@vbi.vt.edu
Jiangzhuo Chen
chenj@vbi.vt.edu
Madhav V. Marathe∗
mmarathe@vbi.vt.edu
Yifei Ma∗
yifeima@vbi.vt.edu
Network Dynamics and Simulation Science Laboratory
Virginia Bioinformatics Institute
Virginia Tech, Blacksburg, VA 24061, USA
ABSTRACT
Public health policy decision makers need analytical and interactive features in epidemic simulation systems, along with the ability to simulate disease propagation over large-scale populations of millions of individuals. To fulfill these requirements, we decided to re-engineer existing epidemiological software systems and integrate them such that the performance of the overall system was minimally affected. The systems that were part of the integration effort included EpiFast, an HPC-based simulation engine that simulates disease diffusion over multiple regions; ISIS, a web-based visual interface tool used for analyzing the role of different parameters in disease propagation; and a database management system that stores and operates on the demographic and geographic information about different city populations. We analyzed the feasibility of existing middleware platforms to support the integration and developed a new architecture that achieves seamless and efficient integration of the component systems. The integrated software system combines capability with the usability and flexibility required by public health policy decision makers to study epidemics holistically. It also allows complex intervention strategies defined by multiple users to be reused through the web-based interface, and it reduces the overall time needed to set up experiments and manage data. In this paper, we describe the flexible architecture that made the integration of these distinct software components possible and report on case studies that show a considerable improvement in the productivity of decision makers and epidemiologists using the new integrated tool.
Categories and Subject Descriptors
I.6 [SIMULATION AND MODELING]: Applications;
D.2 [SOFTWARE ENGINEERING]: Software Architectures, Data abstraction
General Terms
Design, Experimentation, Performance
Keywords
Epidemic simulation modeling, Interactive computations
1. INTRODUCTION
The study of epidemics is a systematic process that involves analysis of various factors and parameters in the propagation of contagious diseases. Public health policy makers
need to analyze the dynamics of disease propagation and
come up with appropriate intervention strategies to contain the spread of diseases. An HPC-based platform like
EpiFast [8] simulates the disease diffusion process in large
cities. It uses contact information about a given population and a discrete event model for simulating an epidemic.
Most of the interventions applied using EpiFast are static in nature: the sub-populations to be intervened are pre-computed. Hence new interventions cannot be submitted based on the dynamics of an epidemic, which means that only a fixed set of intervention strategies can be applied using EpiFast. EpiFast also lacks a user-friendly interface for non-technical users, in particular public health policy makers.
To study the effect of complex interventions on disease propagation, the sub-populations to be intervened should be computed dynamically. However, computing sub-populations at run-time is a non-trivial task and is difficult to achieve within a simulation engine like EpiFast. In particular, it involves joining and grouping data and finding the sub-populations that match different parameters, which requires significant changes to the simulation code. To resolve this issue, Indemics [7] was developed as a data-intensive, high performance modeling platform for applying interventions to epidemiological simulation systems. The distinguishing feature of Indemics is that it uses an external database management system (DBMS) for computing interventions dynamically, and thus provides the required flexibility for computation. However, the simulation output generated by Indemics needs to be interpreted manually to understand the statistical significance of the results, because Indemics does not generate analysis plots automatically. For practical use, the Indemics framework must be combined with an intuitive user interface and a data analysis system that can generate analysis plots for non-technical users.
∗ Also affiliated with Department of Computer Science, Virginia Tech.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IHI’12, January 28–30, 2012, Miami, Florida, USA.
Copyright 2012 ACM 978-1-4503-0781-9/12/01 ...$10.00.
Component system | Description | Technology
ISIS | Web-based visual interface tool | Google Web Toolkit
EpiFast | High performance simulation engine | C++/MPI
Indemics | DBMS Assisted HPC framework | Java based middleware + Oracle database
Simfrastructure | Middleware for high performance distributed systems | Jini-based service oriented system
Table 1: Components of DISimS and corresponding technologies
ISIS (Interface to Synthetic Information Systems) is one such existing web-based visual interface tool, developed at the Network Dynamics and Simulation Science Laboratory (NDSSL) for analyzing the role of different parameters in disease propagation. It is typically used as the front-end system for selecting diffusion and static intervention parameters for simulation engines such as EpiSimdemics [5] and EpiFast [8]. ISIS provides an intuitive interface and a framework to set up and run experiments systematically. It also provides statistical analysis capabilities and tools for generating relevant plots based on the simulation results.
Combining these three existing systems, ISIS, EpiFast and a DBMS, would provide the necessary functionality to apply dynamic run-time interventions to the diffusion engine, in addition to enhancing the usability of the system for non-technical users. Such an integrated system can transparently leverage the high performance computing framework of EpiFast, the flexibility provided by databases, and the middleware infrastructure and analysis capabilities of the ISIS system. We call this integrated epidemiological system DISimS (Distributed Interactive Simulation System).
In this paper, we describe the infrastructure, architecture and technical implementation that made the integration of ISIS, EpiFast and DBMS possible. The main contributions of this paper are:
• A novel architecture to support efficient implementation of diverse technical components and to achieve interoperability between distinct middleware systems
• Design and development of DISimS (Distributed Interactive Simulation System) with improved functionality, capability and usability to support complex database-assisted interventions
• A management framework that supports automated set-up and execution of complex intervention experimental designs with built-in statistical analysis tools for studying epidemics
The rest of the paper is organized as follows. In Section 2, we provide a brief background on the motivating factors that encouraged us to undertake this project. In Section 3, we discuss the state of the art and related work in this field. In Section 4, we provide a high level architecture of a complex network-centric simulation system composed of several diverse systems, which can be applied to a variety of domains. This architecture is then used as the basis for developing the architecture of the integrated epidemiological system DISimS. Section 4 also explains a typical usage scenario in which the new system provides a combination of usability and flexibility. Section 5 presents case studies that show improvement in the productivity of users of DISimS. In Section 6, we discuss the applicability of the architecture in other domains along with issues and limitations, followed by concluding remarks in Section 7.
2. BACKGROUND
Computational epidemiology involves the development and use of computer models for simulating disease propagation within a region. Epidemiologists use computational models in collaboration with computer scientists to study the spatio-temporal diffusion of an epidemic. The diffusion process depends on a disease model and on other parameters such as transmissibility, symptomatic duration and initial conditions. High performance computing systems such as EpiSimdemics [5] and EpiFast [8] are used to simulate the diffusion process.
Traditionally, computational scientists received a set of scenarios to be simulated from public health agencies such as DTRA and the CDC to study disease diffusion [2, 3, 4]. Programmers then modified the simulation code to accommodate the new study requirements, and the results of the experiment were reported back to the health agencies. This process could take weeks to months, depending on the complexity of the scenarios. With advances in computing, the level of collaboration and support between epidemiologists and computational scientists has evolved over the years. To allow epidemiologists to select parameters easily, without needing to understand the technical aspects of the computations, ISIS was developed at NDSSL as a web-based system. ISIS provides a user-friendly graphical interface to modify the diffusion parameters and start the simulation process.
In addition to the diffusion process itself, epidemiologists are interested in studying the effects of different intervention strategies on it. Interventions can be pharmaceutical or non-pharmaceutical. Examples of pharmaceutical interventions include vaccines and antiviral drugs such as Tamiflu, whereas non-pharmaceutical interventions include social distancing and school closure. To study the effects of these interventions on the spread of a disease using a diffusion system like EpiFast, the interventions need to be implemented programmatically within the system, so every new intervention to be studied requires changes to the EpiFast code. To avoid constant changes to the diffusion code and to support separation of concerns, Indemics [7] was developed as a data-intensive, high performance modeling platform that uses database management systems to apply interventions to diffusion systems like EpiFast externally. Using databases for computing and applying interventions, as proposed by Indemics, provides a number of benefits in terms of both development effort and efficiency. Indemics [7] uses an SQL-based language called Indemics Query Language (IQL) to specify interventions through the Indemics client. An example of an Indemics client script written in IQL can be found in Figure 3 of [7].
Using IQL, epidemiologists can write the intervention scripts themselves with little effort, or request programmers to design the scripts for complex experiments. This avoids changes to the simulation code and the associated development time and effort. In some cases, using databases also reduces the execution time of the experimental study. Overall, using Indemics to apply interventions considerably reduces the development time needed to implement interventions, and it also allows computing very complex interventions based on multiple parameters, which is very difficult within the simulation engine.
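To illustrate what computing an intervention as a database query (rather than inside the engine) can look like, here is a minimal sketch using Python's built-in sqlite3 module. It computes a Distance-1 neighbor vaccination target set by joining newly diagnosed people with the contact network. The tables `person`, `contact` and `diagnosed`, their columns and the rule "vaccinate every unvaccinated direct contact of a person diagnosed today" are hypothetical illustrations; this is plain SQL, not IQL, and not the Indemics schema.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person(pid INTEGER PRIMARY KEY, vaccinated INTEGER);
        CREATE TABLE contact(a INTEGER, b INTEGER);   -- one row per direction
        CREATE TABLE diagnosed(pid INTEGER, day INTEGER);
        """
    )
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0)])
    edges = [(1, 2), (1, 3), (2, 4), (4, 5)]
    # Store each undirected contact in both directions.
    conn.executemany("INSERT INTO contact VALUES (?, ?)",
                     edges + [(b, a) for a, b in edges])
    conn.executemany("INSERT INTO diagnosed VALUES (?, ?)", [(1, 0), (4, 1)])
    return conn


def distance1_targets(conn: sqlite3.Connection, day: int) -> list[int]:
    """Unvaccinated direct contacts of everyone diagnosed on `day`."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.b
        FROM diagnosed AS d
        JOIN contact AS c ON c.a = d.pid
        JOIN person AS p ON p.pid = c.b
        WHERE d.day = ? AND p.vaccinated = 0
        ORDER BY c.b
        """,
        (day,),
    ).fetchall()
    return [pid for (pid,) in rows]


if __name__ == "__main__":
    conn = build_demo()
    print(distance1_targets(conn, 0))  # [2]; contact 3 is already vaccinated
    print(distance1_targets(conn, 1))  # [2, 5]
```

The join and filter live entirely in the query, so changing the targeting rule means editing SQL, not recompiling the simulation engine.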
In spite of the many advantages of the Indemics platform, the tool was not used as widely in our research group as expected. We investigated the reasons for this and found that the lack of a user-friendly graphical interface was one of the major drawbacks of Indemics, causing a low adoption rate among epidemiologists. Users of Indemics had to spend a considerable amount of time writing the intervention scripts for setting up a particular experimental study simulation. Moreover, epidemiologists had to perform statistical analysis on the results manually. Indemics also did not provide an efficient way for multiple users to share their scripts written in IQL. Hence collaboration between epidemiologists, a major requirement from the users' standpoint, was difficult to achieve.
To overcome the above problems, we analyzed the deficiencies in the existing systems and studied the requirements of a new system. Instead of developing a completely new system from scratch, we decided to re-engineer the existing ISIS web-based system and its middleware infrastructure to provide access to the Indemics platform and the EpiFast diffusion engine. The new re-engineered system, DISimS, fulfills all the computing requirements for simulating the disease diffusion process, while providing a database-driven platform for applying interventions and an easy-to-use, intuitive interface for epidemiologists. The system also supports multiple simultaneous users and allows them to share intervention scripts, experiment results and analysis results.
3. RELATED WORK
The field of modeling and simulation has evolved over the years. In particular, integrated simulations for gaming, navigation, incident management, transportation systems and other areas that must support multiple user interactions based on a number of dynamic parameters have attracted wide interest in the community. Improving the scalability and performance of simulation systems still remains an open computing problem in this area. In addition, much research is being carried out on how to integrate multiple disparate systems efficiently. The development of architectures for distributed and parallel systems derives heavily from the seminal work of Carriero and Gelernter [10], who describe the programming paradigms for each of the conceptual classes of parallel computation, categorized by the type and extent of communication. More recently, Jain and McLean [17] discuss the architecture of simulation systems integrated with massively parallel online games for incident management training. “Splash” [1] is an ongoing project that aims to integrate multiple simulation systems, such as transportation, media and advertising, food and nutrition, and market dynamics, to study the impact of various factors on the health of individuals, in particular on obesity. Research has also been carried out on reflective middleware systems for integrating multiple simulation environments. Jalali et al. [18] describe the RAISE project, which integrates diverse existing simulation and data models using communication and evacuation simulation systems.
In epidemiology, the public health community has been exploring epidemic modeling and simulation and the architecture of such systems. Early work on such simulation systems primarily focused on accurately modeling the complex epidemic diffusion process and implementing it efficiently on powerful modern computer systems. FluTE [12] uses a stochastic agent-based model to simulate disease spread across large populations. The model has been calibrated using historic pandemic data and can be used to study the dynamics of similar influenza outbreaks and to evaluate pandemic preparedness plans. EpiSims [19] represents a regional population with an agent-based system, constructs a social contact network from people's daily activities and simulates the spread of disease over these social interactions. BioWar [9] is a more sophisticated agent-based model of disease diffusion that combines exogenous factors, such as media, geographic and weather information, with the social contact network and diffusion information. Rather than examining accurate and realistic epidemic dynamics, [16] addresses building a simulation framework that enables real-time epidemic simulations to aid preparatory measures and planning for future epidemics.
To study epidemics in real time using simulation systems, it is essential to reduce the experimental design and analysis effort in addition to the execution time. In this paper, we introduce our work on building an integrated framework, incorporating experiment preparation, information management and offline analysis modules, for studying complex epidemic dynamics. Our design of the experiment preparation module builds on early work on epidemic modeling and on studies of mitigation strategies. Cauchemez et al. [11] introduce and analyze the role of social networks in historic epidemic diffusion. Chao et al. [6, 22, 23] study the effectiveness of mitigation strategies for controlling disease spread. The correlation between disease spread and human movement is discussed in [20]. Hierarchical spatial disease spread has been observed in historic epidemic outbreaks. The analysis of such historic pandemic information and of simulated epidemic diffusion results has extended our understanding of infectious disease propagation.
4. ARCHITECTURE AND IMPLEMENTATION
In this section, we propose a high level architecture of a
Middleware platform | Description | Usage | Typical Comm. Data Size
Simfrastructure | Jini-based middleware for high performance distributed systems between ISIS and EpiFast | Message-oriented data transfer | 140 B per simulation
Indemics server | Java based middleware between Oracle database and EpiFast optimized for high performance parallel system | Raw data transfer | 16 KB per iteration
Table 2: Middleware platforms and their speed comparison. Note: A single simulation experiment is carried out over multiple iterations to study epidemic spread patterns
it over to the message deliverer. The message deliverer then connects directly to the database and retrieves large volumes of data from it. The Simulation session pool module of the HPC middleware maintains a queue of the data retrieved from the database, to be passed to the HPC-based simulation system. The ‘Middleware for HPC engine’ can support data transfer rates of several MBps over many continuous iterations.
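The session pool described above behaves like a bounded producer/consumer queue between the database-facing message deliverer and the simulation engine. The following sketch shows that pattern using only Python's standard `queue` and `threading` modules; the class name `SessionPool`, the batch format and the queue size are hypothetical and do not reflect the actual HPC middleware code.

```python
import queue
import threading


class SessionPool:
    """Hypothetical bounded FIFO of per-iteration data batches."""

    _DONE = object()  # sentinel marking the end of a simulation session

    def __init__(self, maxsize: int = 4) -> None:
        # A bounded queue makes the producer block if the engine falls behind.
        self._q: queue.Queue = queue.Queue(maxsize=maxsize)

    def put_batch(self, batch: list[int]) -> None:
        self._q.put(batch)

    def close(self) -> None:
        self._q.put(self._DONE)

    def batches(self):
        """Yield batches in FIFO order until the session is closed."""
        while True:
            item = self._q.get()
            if item is self._DONE:
                return
            yield item


def deliverer(pool: SessionPool, per_iteration: list[list[int]]) -> None:
    # Stand-in for the message deliverer reading rows from the database.
    for rows in per_iteration:
        pool.put_batch(rows)
    pool.close()


def run_engine(pool: SessionPool) -> list[int]:
    # Stand-in for the HPC engine consuming one batch per iteration.
    return [len(batch) for batch in pool.batches()]


if __name__ == "__main__":
    pool = SessionPool(maxsize=2)
    data = [[1, 2, 3], [4], [], [5, 6]]
    producer = threading.Thread(target=deliverer, args=(pool, data))
    producer.start()
    print(run_engine(pool))  # [3, 1, 0, 2]
    producer.join()
```

Decoupling retrieval from consumption this way lets database reads for the next iteration overlap with the engine's work on the current one, while the size bound keeps memory use fixed.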
As shown in Figure 1, connecting two middleware systems through brokers, instead of using a single generic middleware, provides efficient execution of the end-to-end simulation system. The architectural style described in this section can be applied to other complex HPC-based simulation systems. In the following sections, we describe how this architecture was applied to the development of the epidemiological system DISimS (Distributed Interactive Simulation System) by integrating three distinct component systems.
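The brokers that couple the two middleware systems coordinate through a shared blackboard. Section 4.2 describes the JavaSpaces-based blackboard of Simfrastructure, where producers write entries with a lease time and registered consumers pick them up. As an illustration only, the following Python sketch mimics that write/take-with-lease pattern in a single process. It is not the JavaSpaces API, and the `Blackboard` class, the entry fields and the two broker functions are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    kind: str          # e.g. "simulation_request" (hypothetical)
    payload: dict
    expires_at: float  # absolute time at which the lease runs out


class Blackboard:
    """Toy in-process blackboard with leased entries (JavaSpaces-like idea)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: list[Entry] = []
        self._clock = clock

    def write(self, kind: str, payload: dict, lease_s: float) -> None:
        # The producer decides how long the entry stays visible.
        self._entries.append(Entry(kind, payload, self._clock() + lease_s))

    def take(self, kind: str) -> Optional[Entry]:
        """Remove and return the oldest unexpired entry of `kind`, or None."""
        now = self._clock()
        # Discard entries whose lease has expired.
        self._entries = [e for e in self._entries if e.expires_at > now]
        for i, entry in enumerate(self._entries):
            if entry.kind == kind:
                return self._entries.pop(i)
        return None


def interface_broker(bb: Blackboard, region: str, days: int) -> None:
    # Hypothetical: turn a front-end submission into a simulation request.
    bb.write("simulation_request", {"region": region, "days": days}, lease_s=60.0)


def execution_broker(bb: Blackboard) -> Optional[dict]:
    # Hypothetical: poll for a request; a real broker would then start the engine.
    entry = bb.take("simulation_request")
    return entry.payload if entry is not None else None


if __name__ == "__main__":
    fake_now = [0.0]
    bb = Blackboard(clock=lambda: fake_now[0])
    interface_broker(bb, "Miami", 200)
    print(execution_broker(bb))  # {'region': 'Miami', 'days': 200}
    interface_broker(bb, "Miami", 200)
    fake_now[0] = 61.0           # the 60 s lease has run out
    print(execution_broker(bb))  # None
```

Because producer and consumer only share the blackboard, neither needs to know where the other runs; the lease ensures that requests nobody picks up eventually disappear.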
4.2 Technical diversity of component systems of DISimS
In this section, we describe the technical features of each of the component systems that are part of DISimS. The most important component of an epidemic simulation process is the simulation engine, which simulates the diffusion process. Since the diffusion process is simulated over a large-scale population and simulation speed is a key concern, a high performance computation engine is a necessity. For the integration effort to develop DISimS, we used EpiFast [8], a simulation engine that can simulate disease diffusion in large-scale populations consisting of millions of individuals. EpiFast is developed as an HPC-based system using C++/MPI and simulates the dynamic interactions between individual behaviors and the spread of epidemics. It can also execute multiple replicates for better accuracy. The EpiFast engine executes the diffusion process very rapidly: for instance, simulating disease diffusion in a city like Miami, which has a population of about 2 million people, takes EpiFast less than 30 seconds.
The second component of the integration is ISIS. ISIS was initially developed as a web-based front end supporting multiple simulation engines such as EpiFast and EpiSimdemics, using GWT (Google Web Toolkit), which provides a Java-based platform for developing web systems. ISIS can be accessed by multiple users simultaneously from any machine with Internet connectivity using any standard web browser. It has a simple graphical user interface and provides the ability to specify different parameters for a simulation run. Moreover, it can generate different plots based on the results of the simulation run, which can subsequently be studied for further analysis.
The middleware infrastructure that supports the communication between ISIS and the EpiFast simulation engine is called “Simfrastructure”. Simfrastructure was conceived as a high performance distributed system mechanism that could handle communication between various distributed systems at a higher level of abstraction. It was intended not to support massive data movement but to serve as a signaling mechanism between distributed systems. Hence, Simfrastructure was developed using Jini, a service oriented architectural model of communication. Jini-based architecture [21] provides a service oriented abstraction for registering and removing services easily on the fly. Each of the component systems of Jini can be considered a composable service.
The concept of a “blackboard”, based on JavaSpaces, forms the key component of the Jini middleware. The producer of the data writes it onto the blackboard with a specific lease time, and any registered consumer service can access the data from the blackboard. The producer and consumer systems need not be co-located on the same server and may be independent standalone systems or clusters. The Simfrastructure middleware extends this Jini-based service oriented architecture for communication between ISIS and EpiFast.
The third important component of the integration process is the relational DBMS that stores demographic and contact information about individuals in cities. As explained before, epidemiologists need to intervene in epidemic diffusion engines such as EpiFast with dynamic run-time parameters. It is more suitable to store the information about the social contact network and population demographics in a database and pass it externally to the simulation engine, rather than storing it within the engine. The infection data derived from the diffusion process can be passed back to the database and stored there for retrieving intervention data for the next time period. This concept of using a DBMS to apply interventions to a simulation engine externally was first introduced in Indemics [7]. From the architectural description in Section 4.1, it was clear that a service oriented middleware was not the ideal way to connect the database to the EpiFast simulation engine, since it would degrade performance. Hence the Simfrastructure middleware could not be used for this purpose. Instead of developing a new middleware for the EpiFast engine, we decided to extend the components of Indemics and its supporting infrastructure to connect to EpiFast. The Indemics server was used as the middleware platform, since it provided the appropriate level of abstraction for large-scale data movement between EpiFast and the database.
Table 2 provides a comparison of the two middleware platforms - Simfrastructure and Indemics Server, their main ob-
Case study | System name | Experiment development time | Time to set up experiment analysis | Total human effort for experiment and analysis | Experiment execution time
School based | EpiFast | hard to implement | unknown | unknown | unknown
School based | Indemics | 1 day | 0.5 day | 1.5 days | 3-4 min/iteration
School based | DISimS | 5 minutes | 20-25 mins | 30 min | 3-4 min + 3-4 min/iteration
Block based | EpiFast | hard to implement | unknown | unknown | unknown
Block based | Indemics | 0.5 day | 0.5 day | 1 day | 4-6 min/iteration
Block based | DISimS | 5 minutes | 20-25 mins | 30 min | 3-4 min + 4-6 min/iteration
Table 3: Comparison of total effort for the complex intervention case study on Miami city. Experiment execution is carried out over multiple iteration days. For DISimS, the experiment execution time includes the communication time between the web server and the Indemics server in addition to the actual experiment execution time.
tional middleware, very high data speeds can be supported
per iteration across multiple simulation runs and thus the
required performance of the DISimS simulation system is
maintained.
4.4 Data and context flow
The data and context flow within DISimS is shown in Figure 2. DISimS is accessible through a web-based interface from any machine connected to the Internet. Users, in particular epidemiologists and public health policy decision makers, select different execution parameters to run a case study experiment, such as the region on which the simulation is to be carried out, the number of days of simulation, the number of replicates and the disease model. In addition, users can select intervention scripts to be applied to the diffusion process, such as block-based intervention, school-based intervention and Distance-1 neighbor vaccination, to study their effects on disease propagation dynamically. This information is passed as a message to the interface broker of the Simfrastructure middleware. The interface broker interprets the data and puts a request for a simulation to be started on the blackboard. The execution broker monitors the blackboard for new simulation requests and signals the simulation engine, EpiFast, to start execution when data is available.
The Indemics broker is one of the most important components of DISimS. It maintains several distinct client scripts for different intervention studies. Based on the intervention parameters passed from the front end and submitted on the blackboard, the Indemics broker selects the corresponding client script template and replaces its dummy parameters with the actual experiment and intervention parameters. The generated client script is written in IQL, which the Indemics server can interpret. The Indemics server is configured as a background process that is always running. Based on the invoked intervention (client) script, the Indemics server connects to the appropriate database tables, retrieves the data of the population to be intervened and passes it as an intervention to the simulation engine.
Similar to the execution broker that signals EpiFast to start execution, the Analysis broker starts the analysis script based on the analysis request made by the user. The request for analysis of a particular experiment is made to an Analysis server that runs the R statistical tool. The results of the analysis are written onto the blackboard and subsequently passed to the front end through the interface broker, where the corresponding graphs are plotted for the user.
4.5 Usage scenario
Consider a scenario in which there is a possibility of a large-scale outbreak of an infectious disease in a given city. Two to three persons have already been diagnosed with the disease, and it is further noticed that the diagnosed persons are of school-going age. An epidemiologist is interested in evaluating the effectiveness of different strategies to contain the spread of the epidemic within the city, such as school closure, antiviral drugs and vaccination. For a school-based intervention, if the number of school children diagnosed with the disease exceeds a certain threshold, the intervention is applied to the entire school.
Before the development of Indemics, this usage scenario could not be simulated completely within the simulation engine. Even with Indemics, the epidemiologist would have had to manually write a script in the Indemics Query Language to implement the particular intervention strategy, such as school closure. There was no searching or experiment management mechanism, and hence no easy way for the epidemiologist to find out whether other users had previously written or executed the same script, which could be reused. Moreover, the user would have had to analyze the simulation results manually to draw conclusions.
With the introduction of DISimS, the epidemiologist can now use the web-based front end of ISIS, as shown in Figure 3, from any machine with web access, and select the type of intervention from a list of existing interventions presented as an easily searchable drop-down menu. Once the results of the simulation for the various parameters are returned, the ISIS interface allows the results to be analyzed statistically from different perspectives. Figure 4 shows one such analysis output for a dynamic intervention experiment simulated for 200 days to study the spread of catastrophic flu (a type of flu in which more than 40% of the people in a region are infected when no interventions are applied) in a selected region. The epidemiologist can carry out multiple runs of the same strategy, or of multiple strategies, on the same region. The list of interventions enumerates all intervention strategies that have already been written or executed. If a particular intervention strategy is not available in the list, the epidemiologist can add such a template herself or request the development team to add one. Writing such a template is a one-time effort that takes very little time compared to the benefits derived: for all future case-study runs by any other user, the strategy will be immediately available.
All in all, with DISimS the epidemiologist can leverage all the benefits of Indemics, ISIS and EpiFast in a user-friendly and efficient manner.
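The school-based rule in this scenario (apply the intervention to a whole school once the number of diagnosed pupils exceeds a threshold) is a grouping query that is awkward to express inside the engine but short in SQL. The sketch below uses Python's sqlite3 and assumes hypothetical `student(pid, school_id)` and `diagnosed(pid, day)` tables; the schema, the sample data and the threshold are illustrative, and the SQL is not IQL.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE student(pid INTEGER PRIMARY KEY, school_id INTEGER);
        CREATE TABLE diagnosed(pid INTEGER, day INTEGER);
        """
    )
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20), (6, 20)])
    conn.executemany("INSERT INTO diagnosed VALUES (?, ?)",
                     [(1, 0), (2, 1), (4, 1), (3, 3)])
    return conn


def schools_over_threshold(conn, up_to_day: int, threshold: int) -> list[int]:
    """Schools whose number of pupils diagnosed so far exceeds `threshold`."""
    rows = conn.execute(
        """
        SELECT s.school_id
        FROM student AS s
        JOIN diagnosed AS d ON d.pid = s.pid
        WHERE d.day <= ?
        GROUP BY s.school_id
        HAVING COUNT(DISTINCT s.pid) > ?
        ORDER BY s.school_id
        """,
        (up_to_day, threshold),
    ).fetchall()
    return [school for (school,) in rows]


def pupils_to_intervene(conn, school_ids: list[int]) -> list[int]:
    """Every pupil enrolled in one of the affected schools."""
    if not school_ids:
        return []
    marks = ",".join("?" * len(school_ids))  # e.g. "?,?,?"
    rows = conn.execute(
        f"SELECT pid FROM student WHERE school_id IN ({marks}) ORDER BY pid",
        school_ids,
    ).fetchall()
    return [pid for (pid,) in rows]


if __name__ == "__main__":
    conn = build_demo()
    schools = schools_over_threshold(conn, up_to_day=1, threshold=1)
    print(schools, pupils_to_intervene(conn, schools))  # [10] [1, 2, 3]
```

Splitting the rule into "which schools" and "which pupils" mirrors how the text describes it: first detect affected schools, then intervene on everyone enrolled there.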
5. CASE STUDIES
In this section, we share our experiences of using DISimS to execute case studies, compared with the earlier simulation systems EpiFast and Indemics. After the 2009 H1N1 flu outbreak, we decided to implement a case study to compare the effectiveness of government intervention strategies and individual protection strategies. In contrast to previous studies we had run to examine the course of epidemic dynamics, the main objective of this study was to compare government interventions (top-down action) and individual protections (bottom-up action) in mitigating the propagation of the H1N1 virus. The government strategies included block-based intervention and school-based intervention.
The block intervention strategy specifies that when the fraction of people diagnosed with a disease in a census block exceeds a certain threshold, the entire block is quarantined or given medical treatment. The targeted individual protection strategies are more complicated and require detailed geographic and demographic information. At that time, our simulation engines, such as EpiSimdemics and EpiFast, did not have features to integrate such supplemental information and hence could not implement targeted individual interventions. The code of these high performance simulation engines would have had to be modified to support such interventions. Our simulation engine designers and the intervention experiment strategy designers would have had to work together to interpret the strategies precisely and translate them into code for the high performance engine. The entire development process, including requirements gathering, implementation and testing, was estimated to take several weeks, in contrast to an estimated experiment execution time of only one week. This approach was time-consuming, and it was difficult to report simulation results and conclusions in time.
Figure 4: ISIS-based analysis output of DISimS
To overcome this problem, we applied Indemics, a database-supported epidemic simulation framework, and used it to run the intervention experiment. In contrast to epidemic simulation engines like EpiFast, Indemics models the implementation of interventions with a data query algebra, and the interventions are computed entirely using the query language of the database management system. Experiment strategy designers only need to describe their scenarios in IQL and submit the simulation jobs to Indemics for execution. With Indemics, the experiment development process takes a few days, the time needed to map the interventions into IQL. Indemics incurs a marginal execution time overhead, but it requires no significant code development or testing. We adopted this solution to run the intervention studies; it greatly reduced the study period and saved significant human effort.
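To make the idea of computing an intervention with database queries concrete, the sketch below expresses the Block intervention rule from this section as SQL aggregate queries, run with Python's built-in sqlite3 module. It is a minimal illustration under stated assumptions, not IQL and not the Indemics schema: the `person` table, its columns `pid`, `block_id` and `diagnosed`, the helper names and the toy population are all hypothetical. In a day-by-day setting, queries like these could be re-run after each simulated day's health states are written to the database.

```python
import sqlite3

# Hypothetical schema (not the Indemics schema): one row per individual,
# with the census block they live in and whether they have been diagnosed
# (1) or not (0) as of the current simulation day.
SCHEMA = """
CREATE TABLE person (
    pid       INTEGER PRIMARY KEY,
    block_id  INTEGER NOT NULL,
    diagnosed INTEGER NOT NULL DEFAULT 0
)
"""

# Block intervention rule as one aggregate query: a census block qualifies
# when the fraction of its residents who are diagnosed exceeds the threshold.
BLOCKS_OVER_THRESHOLD = """
SELECT block_id
FROM person
GROUP BY block_id
HAVING AVG(diagnosed) > ?
"""

# Everyone living in a qualifying block is targeted for quarantine/treatment.
TARGETED_PEOPLE = """
SELECT pid
FROM person
WHERE block_id IN (
    SELECT block_id FROM person GROUP BY block_id HAVING AVG(diagnosed) > ?
)
ORDER BY pid
"""


def blocks_to_intervene(conn: sqlite3.Connection, threshold: float) -> list[int]:
    """Hypothetical helper: blocks whose diagnosed fraction exceeds threshold."""
    rows = conn.execute(BLOCKS_OVER_THRESHOLD + "ORDER BY block_id", (threshold,))
    return [block_id for (block_id,) in rows]


def people_to_intervene(conn: sqlite3.Connection, threshold: float) -> list[int]:
    """Hypothetical helper: ids of all residents of qualifying blocks."""
    rows = conn.execute(TARGETED_PEOPLE, (threshold,))
    return [pid for (pid,) in rows]


def make_demo_db() -> sqlite3.Connection:
    """Toy population: block 1 has 2 of 4 residents diagnosed, block 2 has 0 of 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO person (pid, block_id, diagnosed) VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 1, 0),
         (5, 2, 0), (6, 2, 0), (7, 2, 0)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(blocks_to_intervene(conn, 0.25))
    print(people_to_intervene(conn, 0.25))
```

With the toy population, block 1 has a diagnosed fraction of 0.5, so a threshold of 0.25 selects block 1 and its four residents, while block 2 (fraction 0.0) is left alone. Keeping the rule in a query means that changing the threshold or the targeting criterion is a change to data access rather than to the simulation engine.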
Although Indemics remarkably shortened the development time for implementing intervention experiments, it did not have a module to automatically set up experiments, monitor the state of an experiment, or manage experimental inputs and results. There was also no provision for reuse and sharing, for example by checking whether an appropriate Indemics intervention script had previously been written by another user. Moreover, when the interventions had to be simulated with different settings and parameters and repeated tens of times to reduce the effect of random factors, using Indemics became cumbersome. For example, the scripts to run factorial experiments varying multiple parameter values had to be prepared manually, which was very error-prone. The simulation inputs and outputs had to be carefully organized to avoid overwriting or misreading, and the experiment executors had to monitor the simulation jobs themselves. Such tasks required considerable manual effort. Reading and understanding raw simulation results was also difficult, since Indemics did not have statistical analysis or plotting modules.
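Preparing factorial experiment scripts by hand is the kind of bookkeeping that can be generated mechanically. The following minimal sketch (not the DISimS implementation) expands a full-factorial design with `itertools.product`, repeats each parameter combination for a fixed number of replicates with a distinct random seed, and gives every run its own output directory so that results cannot overwrite one another. The parameter names `threshold` and `compliance`, the replicate count and the `experiments/<intervention>/<parameters>/repNN` directory layout are hypothetical.

```python
import itertools
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Run:
    intervention: str
    params: tuple              # ((name, value), ...) in a fixed, sorted order
    replicate: int
    seed: int
    output_dir: PurePosixPath


def factorial_runs(intervention, levels, replicates, base_seed=0, root="experiments"):
    """Expand a full-factorial design into individual runs.

    levels maps each (hypothetical) parameter name to the list of values to
    test; every combination of values is repeated `replicates` times.
    """
    names = sorted(levels)  # fixed order keeps seeds and paths reproducible
    runs = []
    for combo in itertools.product(*(levels[n] for n in names)):
        params = tuple(zip(names, combo))
        tag = "_".join(f"{n}={v}" for n, v in params)
        for rep in range(replicates):
            runs.append(Run(
                intervention=intervention,
                params=params,
                replicate=rep,
                seed=base_seed + len(runs),  # distinct seed for every run
                output_dir=PurePosixPath(root, intervention, tag, f"rep{rep:02d}"),
            ))
    return runs


if __name__ == "__main__":
    design = {"threshold": [0.01, 0.05], "compliance": [0.5, 0.8]}
    runs = factorial_runs("block", design, replicates=25)
    print(len(runs))
    print(runs[0].output_dir)
```

Two levels of each of two parameters with 25 replicates yields 2 × 2 × 25 = 100 runs, each with a unique seed and directory; sorting the parameter names keeps the generated paths stable across invocations.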
From the experience of implementing the case studies for H1N1, we realized that the usability of the simulation systems had to be improved further. Hence we developed “DISimS”, with features such as a user interface, experiment data management, job monitoring and analysis, in addition to the attributes that were already provided by Indemics and EpiFast.
Figure 3: ISIS-based front-end system of DISimS with dynamic intervention selection
Employing DISimS for the experiments with school-based and block-based interventions considerably reduces the overall experiment set-up and management time and enhances the productivity of the users. The users only need to select the intervention scripts and execution parameters through a simple graphical web-based interface. The data files for the factorial experiment design are well organized and archived, and the simulation jobs are automatically scheduled and monitored. DISimS introduces a marginal execution overhead compared to Indemics, equal to the communication time between the web-based front end and the Indemics server middleware. Table 3 compares the total effort for the intervention experiments using EpiFast, Indemics and DISimS on the city of Miami. As the table shows, DISimS significantly reduces the total human effort for experiment design and analysis compared to the previous systems, while increasing the experiment execution time by only a few seconds, which is negligible. This comparison shows the value of DISimS for improving the productivity of epidemiologists and public health policy decision makers, who can now set up, manage and execute complex intervention case studies without much help from high performance computing developers.
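Automatic scheduling and monitoring of simulation jobs can be sketched with Python's standard `concurrent.futures` module: runs are submitted to a pool that bounds how many execute at once, and a status table records each run as queued, running, done or failed. This is an illustrative sketch under stated assumptions, not the DISimS scheduler, which hands jobs to the Indemics server middleware and the HPC back end; `run_simulation` is a hypothetical stub that deliberately fails on one run so the failure path is exercised.

```python
import concurrent.futures as cf
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


def run_simulation(run_id: int) -> str:
    """Hypothetical stand-in for one simulation job.

    A real system would submit the run to a cluster and wait for it; this
    stub fails on run 3 so the monitor has a failure to record.
    """
    if run_id == 3:
        raise RuntimeError("simulated job failure")
    return f"results/run{run_id:02d}"


def schedule_and_monitor(run_ids, max_parallel=2):
    """Run jobs with at most max_parallel in flight and track each job's state."""
    run_ids = list(run_ids)
    status = {rid: State.QUEUED for rid in run_ids}
    outputs = {}

    def job(rid):
        status[rid] = State.RUNNING  # recorded when a worker picks the job up
        return run_simulation(rid)

    with cf.ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(job, rid): rid for rid in run_ids}
        for fut in cf.as_completed(futures):  # handle jobs in completion order
            rid = futures[fut]
            try:
                outputs[rid] = fut.result()
                status[rid] = State.DONE
            except Exception as exc:  # record the failure, keep the batch going
                outputs[rid] = str(exc)
                status[rid] = State.FAILED
    return status, outputs


if __name__ == "__main__":
    status, outputs = schedule_and_monitor(range(5))
    for rid in sorted(status):
        print(rid, status[rid].value, outputs[rid])
```

Recording failures in the status table, rather than letting one exception stop the batch, lets the remaining replicates finish and leaves a clear list of runs to resubmit.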
Figure 5 shows epicurves derived as output from the DISimS system. They compare the effect of two strategies, School-based intervention and Block-based intervention, on the propagation of an epidemic over a 300-day period. The epicurves show the number of infected cases and diagnosed cases on each simulation day, from the 1st day to the 300th day. From the epicurves, the effectiveness of each mitigation strategy can be easily observed: in this plot, the School intervention strategy performs better than the Block intervention strategy, and the Block intervention is more effective at containing the epidemic than the base case without any interventions.
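The visual comparison of epicurves can be complemented by a few summary statistics computed from the same daily series. The sketch below, which is not described as a DISimS feature in this paper, reports the peak day, peak size and total count of each curve and ranks scenarios by total cases. It assumes each curve is a list of new cases per day; the scenario names and the five-day toy series are invented for illustration and are not the Miami results in Figure 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurveSummary:
    peak_day: int   # first day with the maximal daily count (days numbered from 1)
    peak_size: int  # maximal daily count
    total: int      # sum of daily counts (total cases if counts are new cases per day)


def summarize(daily_counts):
    """Summarize one epicurve given as a list of daily counts."""
    if not daily_counts:
        raise ValueError("empty epicurve")
    peak_size = max(daily_counts)
    peak_day = daily_counts.index(peak_size) + 1
    return CurveSummary(peak_day, peak_size, sum(daily_counts))


def rank_strategies(curves):
    """Order scenarios from fewest to most total cases, breaking ties by lower peak."""
    summaries = {name: summarize(c) for name, c in curves.items()}
    return sorted(summaries, key=lambda n: (summaries[n].total, summaries[n].peak_size))


# Toy daily new-case counts for illustration only (not the Miami results).
EXAMPLE = {
    "no intervention": [10, 40, 90, 60, 20],
    "block": [10, 30, 50, 40, 20],
    "school": [10, 20, 30, 25, 15],
}

if __name__ == "__main__":
    for name, curve in EXAMPLE.items():
        print(name, summarize(curve))
    print(rank_strategies(EXAMPLE))
```

Peak size and timing matter for hospital capacity, while the total reflects the overall burden, so reporting both avoids ranking strategies on a single feature of the curve.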
Figure 5: Epicurves for School and Block intervention case studies on Miami
(Size of cases versus Days, 0 to 300; diagnosed and infected cases without interventions, with School intervention and with Block intervention.)

6. DISCUSSION
In this paper, we discussed the architecture of a complex simulation system made up of components such as an HPC-based simulation system, a database management system and a front-end user interface. More and more applications today use high performance engines for computation, and databases and interfaces are part of most application systems. Hence the architecture described here can be readily extended to multiple domains that involve the integration of distinct software systems. We demonstrated the applicability of this architectural style by implementing an end-to-end epidemic simulation system called “DISimS”.
Simulation systems for transportation, biological systems, disaster planning, financial modeling and so on need features similar to those of an epidemic simulation system: core diffusion functionality combined with interactive and analysis features. Hence the architectural considerations described in this paper can serve as the basis for designing such large scale systems. One of the most interesting applications of the architecture described in this paper is the development of large scale gaming systems. Gaming systems not only need fast computations at run-time with changing parameters, but also very detailed graphical interfaces for the convenience of users. Performance is a key requirement for such gaming systems, along with a high resolution, user-friendly graphical interface to keep the attention of the users. The architecture described in this paper can inform the design and development of gaming and other such systems.
The design and implementation of DISimS were based on the features of existing component systems and technologies. For instance, the main reason that the Jini-based Simfrastructure and the Indemics server could be extended to EpiFast was the master-slave nature of the EpiFast algorithm. The master node that executes the EpiFast algorithm on a cluster can aggregate the intermediate results from the slave nodes and pass information to the Indemics server and Simfrastructure as needed. It can also decompose the information passed from the database and forward it to specific slave nodes. Hence, without the master-slave nature of the EpiFast algorithm, coordination between the component systems could have been an issue. In the case of a symmetric parallel computation without a master, a new mechanism would have to be devised to ensure synchronization. Nonetheless, the approach of technological assessment and remodeling described in this paper, which analyzes the feasibility of existing technologies for building the architecture of a complex system, helped us make an informed decision about the development of DISimS.
7. CONCLUSION
In this paper, we have given an architectural overview of how to design complex simulation systems that enhance the usability, flexibility and capability of software systems. We have shown a specific use case of the architecture by applying it to DISimS, in which three existing component systems, ISIS, EpiFast and a DBMS, were integrated efficiently. The DISimS system, with its web interface and analytical features, has proven to considerably enhance the productivity of its users. It is currently being used internally at the Network Dynamics and Simulation Science Laboratory, and we plan to make it available to a large population of epidemiologists and public health policy decision makers. We plan to add more interesting and useful features to DISimS in the near future.
8. ACKNOWLEDGEMENTS
We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by NSF Nets Grant CNS-0626964, NSF HSD Grant SES-0729441, NIH MIDAS project 2U01GM0706947, NSF PetaApps Grant OCI-0904844, DTRA RD Grant HDTRA1-0901-0017, DTRA CNIMS Grant HDTRA1-07-C0113, NSF NETS CNS-0831633, DHS 4112-31805, DOE DE-SC0003957, NSF REU Supplement CNS-0845700, US Naval Surface Warfare Center N00178-09-D-3017 DEL ORDER 13, NSF Netse CNS-1011769 and NSF SDCI OCI-1032677.

9. REFERENCES
[1] SPLASH: IBM Project. http://www.almaden.ibm.com/asr/projects/splash/.
[2] K. Atkins, C. L. Barrett, R. J. Beckman, et al. DTRA National Guard study capability demonstration. Technical Report 06-060, NDSSL, 2006.
[3] K. Atkins, C. L. Barrett, R. J. Beckman, et al. Simulated pandemic influenza outbreaks in Chicago: NIH DHHS study final report. Technical Report 06-023, NDSSL, 2006.
[4] K. Atkins, C. L. Barrett, R. J. Beckman, et al. An analysis of public health interventions at military bases during a pandemic influenza event. Technical Report 07-019, NDSSL, 2007.
[5] C. L. Barrett, K. R. Bisset, S. G. Eubank, X. Feng, and M. V. Marathe. EpiSimdemics: an efficient algorithm for simulating the spread of infectious disease over large realistic social networks. In Proc. ACM/IEEE Conference on Supercomputing, pages 290–294, 2008.
[6] N. E. Basta, D. L. Chao, M. E. Halloran, L. Matrajt, and I. M. Longini Jr. Strategies for pandemic and seasonal influenza vaccination of schoolchildren in the United States. American Journal of Epidemiology, 170, 2011.
[7] K. Bisset, J. Chen, X. Feng, Y. Ma, and M. Marathe. Indemics: an interactive data intensive framework for high performance epidemic simulation. In Proceedings of the 24th International Conference on Supercomputing (ICS), pages 233–242, 2010.
[8] K. R. Bisset, J. Chen, X. Feng, V. S. A. Kumar, and M. V. Marathe. EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. In Proc. the 23rd International Conference on Supercomputing, pages 430–439, 2009.
[9] K. M. Carley, D. B. Fridsma, E. Casman, A. Yahja, N. Altman, L.-C. Chen, B. Kaminsky, and D. Nave. BioWar: scalable agent-based model of bioattacks. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 36(2):252–265, 2006.
[10] N. Carriero and D. Gelernter. How to write parallel programs: A guide to the perplexed. ACM Comput. Surv., 21(3):323–357, 1989.
[11] S. Cauchemez, A. Bhattarai, T. L. Marchbanks, R. P. Fagan, S. Ostroff, N. M. Ferguson, D. Swerdlow, and the Pennsylvania H1N1 working group. Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza. Proceedings of the National Academy of Sciences, 2011.
[12] D. L. Chao, M. E. Halloran, V. Obenchain, and I. M. Longini Jr. FluTE, a publicly available stochastic influenza epidemic simulation model. PLoS Computational Biology, 6(1), 2010.
[13] F. Curbera, M. Duftler, R. Khalaf, W. Nagy, N. Mukhi, and S. Weerawarana. Unraveling the Web services web: an introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, 6(2):86–93, Mar./Apr. 2002.
[14] T. Erl. Service-Oriented Architecture: Concepts, Technology, and Design. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2005.
[15] R. T. Fielding. Architectural styles and the design of network-based software architectures. PhD thesis, University of California, Irvine, 2000. AAI9980887.
[16] H. V. Fineberg and M. E. Wilson. Epidemic science in real time. Science, 324:987, 2009.
[17] S. Jain and C. R. McLean. Integrated simulation and gaming architecture for incident management training. In Winter Simulation Conference, pages 904–913, 2005.
[18] L. Jalali, N. Venkatasubramanian, and S. Mehrotra. A reflective middleware architecture for simulation integration. In Proceedings of the 8th International Workshop on Adaptive and Reflective Middleware, ARM ’09, pages 3:1–3:6, New York, NY, USA, 2009. ACM.
[19] S. M. Mniszewski, S. Y. Del Valle, P. D. Stroud, J. M. Riese, and S. J. Sydoriak. EpiSims simulation of a multi-component strategy for pandemic influenza. In Proceedings of the 2008 Spring Simulation Multiconference, SpringSim ’08, pages 556–563, San Diego, CA, USA, 2008. Society for Computer Simulation International.
[20] C. Viboud, O. N. Bjornstad, D. L. Smith, L. Simonsen, M. A. Miller, and B. T. Grenfell. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science, 312(5772):447–451, 2006.
[21] J. Waldo, T. J. Arch, and T. Miura. The Jini architecture for network-centric computing, 1999.
[22] J. T. Wu, S. Riley, C. Fraser, and G. M. Leung. Reducing the impact of the next influenza pandemic using household-based public health interventions. PLoS Medicine, 3(9):361, 2006.
[23] Y. Yang, J. D. Sugimoto, M. E. Halloran, N. E. Basta, D. L. Chao, L. Matrajt, G. Potter, E. Kenah, and I. M. Longini. The transmissibility and control of pandemic influenza A (H1N1) virus. Science, 326(5953):729–733, 2009.