AAAI Technical Report SS-12-03
Game Theory for Security, Sustainability and Health

The Design of Computer Simulation Experiments of Complex Adaptive Social Systems for Risk Based Analysis of Intervention Strategies

Deborah V. Duong
Agent Based Learning Systems

to a treatment strategy. Data farming changes many parameters to see their effect, but does not sample those parameters in proportion to their occurrence in the real world. Furthermore, the parameters input to data farming experiments as though they were independent are often dependent upon each other in the real world, and although efforts are made to come up with plausible parameter combinations, there is a need for a general and systematic way to ensure that parameters are correlated in the same way that they are in the real world.

In particular, Courses of Action (COA) in simulations of the complex adaptive system of irregular warfare are often entered in scripted form, so that our international interventions are not dependent on enemy actions nor on the environment. COAs are often scripts because the simulation does not have a reactive enemy with the variety of reactions that real world enemies have, or the simulation is meant to double as a human-in-the-loop wargame adjudicator. However in reality and in accordance with Complex Adaptive System (CAS) theory, the international intervention should be seen as part of the environment itself rather than an exogenous constant, even for strategy testing. Seen in this light, the course of action to be tested is not a static set of actions but a doctrinal set of strategies of how to change actions in response to an environment.
Abstract

Computational social science, as with all complex adaptive systems sciences, involves a great amount of uncertainty on several fronts, including intrinsic arbitrariness such as that due to path dependence, disagreement on social theory and how to capture it in software, input data of different credibility that does not exactly match the requirements of software because it was gathered for another purpose, and inexactly matching translations between models that were designed for different purposes than the study at hand. This paper presents a method of formally tracking that uncertainty, keeping the data input parameters proportionate with logical and probabilistic constraints, and capturing proportionate dynamics of the output ordered by the decision points of policy change, for the purpose of risk-based analysis. Once ordered this way, the data can be compared to other data similarly expressed, whether that data is from simulation excursions or from the real world, for objective comparison and distance scoring at the level of dynamic patterns as opposed to single outcome validation. This method enables wargame adjudicators to be run out with data gleaned from the wargame, enables data to be repurposed for both training and testing sets, and facilitates objective validation scoring through soft matching. Artificial intelligence tools used in the method include probabilistic ontologies with crisp and Bayesian inference, game trees that are multiplayer, non-zero-sum, and decision point based rather than turn-based, and Markov processes to represent the dynamic data and align the models for objective comparison.

Tracking Uncertainty in Complex Systems

Current methods in the design of computer social simulation experiments do not track uncertainty in a way that adequately informs risk-based analysis.
One popular method of computer social simulation experimentation, data farming, describes an outcome space by way of parameter sensitivity testing, which is not the same as and does not achieve the goal of describing the likelihood of outcomes

In the Department of Defense, courses of action are often tested with a wargame, and this is true for irregular warfare courses of action that test the social environment as well. Nearly all of the major Department of Defense studies of irregular warfare have involved a wargame adjudicated by computer simulation models. However, a wargame is resource intensive and cannot be played the number of times required to give the commander an estimate of risk. To get at the computation of risk, the “wargame” should be “run out” multiple times, by running its adjudicators multiple times with automated players that act as though human-in-the-loop players had put in the moves, in a way that captures their reactions during the actual wargame.

Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved.

Running the adjudicators multiple times with data farming methodology is inadequate, not only because scripts are not sensitive to the enemy or the social environment, but also because the entire paradigm of multiple independent exogenous parameters is against the principles of co-evolution in complex adaptive systems: everything in a co-evolving system is dependent on everything else. Rather, the simulation should be a closed system in which the only thing that changes between runs is the random number seed. Anything else is the testing of another, different conceptual model. In other words, the simulation, as much as possible, should be a primordial soup of independent, accepted assumptions from which emerge the phenomena we are trying to model.
Designing simulations in this manner not only ensures the scientific soundness of the outcome (Duong 2010), but enables us to compare a single set of the assumptions, a single conceptual model, to real world data. turn requires making use of the vast amounts of data that are more uncertain than the data of physics based systems. Finally, current simulation methods do not deal with the problem of composition of models in a general way. Multiple models are needed for irregular warfare simulation, because the socio-cultural environment of irregular warfare is complex. Therefore data of different perspectives and resolutions must be traded between models that were made for different purposes. The translation between such models is another source of uncertainty. Even with the simplified models of conventional warfare, data translation between models is one of the biggest issues. Semantic interoperability, or how to ensure that the variables traded between models mean the same thing, is the central problem of model composition and impedes joint analysis. The problem is so significant that organizations and conferences are devoted to interoperability standards, that is, how to compose multiple models. However, these conferences seem to be more devoted to developing and enforcing a standard that all parties can follow as opposed to focusing on translation of non-standard data which is disparate for pragmatic reasons. As the emphasis in warfare analysis shifts to the social and political world, all the data of the internet, which will always be disparate, is useful for informing warfare analysis. Further, the complexity of model composition increases greatly with the social world. Using multiple models and composing them properly is important to model all the complex aspects of the social environment, as well as for testing policy for robustness against many plausible yet not agreed upon social theories by switching models in and out. 
The models and the data are a form of uncertainty, and tracking this uncertainty reduces to the problem of how to translate repurposed data. The crux of the problem of composition is that models and data to be repurposed for a particular study are in different perspectives and different resolutions (Davis and Anderson 2004), and to achieve an understanding of the likelihood of outcomes of courses of action, uncertainty must be tracked as data moves through models across different perspectives and resolutions. The more the simulation is a primordial soup of assumptions held still in code and parameters, including rules that change parameters of sub-models in proportion to how they change in the real world, the more we keep track of the uncertainty for as-accurate-as-possible risk based analysis. In order to describe outcomes proportionately, so that the outcome space preserves the uncertainty in correct proportion, parameter changes to sub-models should arise endogenously rather than as the result of data farming’s covering the space of all possible combinations of parameters orthogonally under an assumption of independence. Additionally, current simulation methods do not put to use available uncertain data for testing sets and training sets, and are not robust with respect to uncertain matches for variable trading between models. Typical simulation studies require exact data and list the lack of it as a limitation, with no formal method for repurposing, translating, and recording/taking into account the distance between inexactly matching data. One example is a military simulation of conventional warfare, which assumes similar maneuver abilities of symmetrical sides wash out and all that remains is force upon force, reducing warfare to physics. Politics and a center of gravity of popular support are not considered: rather the models are simplified and make use of small sets of highly reliable data. 
However, asymmetric warfare by definition requires that we drop the assumption that the sociocultural idiosyncrasies wash out, which in

Starting out with proportionate model input and tracking uncertainty from disparate data sources and across disparate models is essential to an output space that quantifies uncertainty for risk based analysis. We present a methodology for keeping track of uncertainty that arises from the random seeds of the models, from the use of uncertain data, and from the translation between models, so that the uncertainty may be expressed in a single picture of the risk that can be compared apples to apples with real world data.

Risk-Based Approach to Analysis

Our methodology for formally tracking uncertainty in social simulation studies includes:

a formal definition of the conceptual model under study, the ‘study model’, that defines and enforces the entities and relationships under scrutiny. The formal study model is implemented with a probabilistic ontology, and enforced with both crisp and Bayesian inference (Duong 2010).

a method of defining and detecting complex and dynamic simulation states, both logically and probabilistically, to include in rules of proportionate parameterization and translation of data. These states are strategically relevant, and may be analyzed for tipping points. Measures of Effectiveness and indicators are implemented with both crisp and Bayesian logic, and expressed dynamically in a Markov process for tipping point analysis (Bramson 2009). State transition samplings occur at decision points, which are more cognitively relevant to tipping points than evenly spaced time intervals.

a method of translating data both logically and probabilistically between the study model and the simulation models that implement it, and between the study model and the static data for testing and training.
The translation is within the parameter sets that are legal and accredited for the study, and tracks the uncertainty of probabilistic data match. Translation is made across multiple resolutions and perspectives. The translation is implemented through a hub and spoke design of probabilistic ontologies, with the study model at the hub, and simulation and data ontologies at the spokes. Between the hub and spoke are mediation ontologies that use Bayesian and logical inference to traverse multiple resolutions and perspectives. Probabilistic logic, with the use of ‘L-nodes’, keeps the Bayesian and logical inference consistent. The hub and spoke design also facilitates efficient “switching in and out” of simulation models for robustness testing of COAs (Duong et al. 2010).

a method of measuring probabilistic distance between dynamic data, from simulation output and from time series real world data, for data comparison and validation. Cross entropy measures of probabilistic distance are used to compare two Markov processes, to obtain an objective ‘validation score’ (Duong et al. 2010).

Experiment

In this study, the hub and spoke probabilistic ontologies, game trees, Markov processes and probabilistic comparison functions are used to analyze the 2010 US Army Training and Doctrine Analysis Center’s Tactical Wargame (TWG) run in White Sands, New Mexico, a major Department of Defense study of irregular warfare (Schott 2009). The 2010 version of the game used a scenario in Helmand province, Afghanistan. This analysis is meant to capture the reasons behind the decisions of the game, and then play the game out multiple times with the simulation models that adjudicated the game, in lieu of multiple expensive human-in-the-loop runs (Duong 2007).

a method of incorporating multiple, possibly conflicting data sources weighted by credibility in the translation, and of combining these sources.
For this study only a single source of data was used; however, this capability is a future direction of this research program (Duong 2011).

a systematic method of proportionate parameterization. For social simulations that have parameters that represent decisions, indicators that define states and goals are used in a multi-player, non-zero-sum game tree that branches upon decision points and strategies expressed in ontological rules. The game tree ensures that moves are entered into the model in a proportionate manner, in appropriate reaction to other stakeholders and the environment (Vakas et al. 2001).

The ‘Hub’: The Conceptual Model of the TWG 2010 Study

The conceptual model of a simulation study is a formal specification of what is under study, including categories of entities and the rules by which they relate to each other. It should include rules about the relations between input parameters and about significant relations between output parameters of interest, called indicators. These indicators may be either crisp or probabilistic. The formal conceptual

The strategies are derived from a combination of TWG 2010 player interviews and a mutual information analysis of the moves of the game. Pointwise mutual information (PMI) scores were obtained for different popular support indicators that the players had access to. PMI, a concept from information theory, tells how uniquely a sign, such as the wargame’s popular support metric (OAB score), is associated with another sign, such as an action. We used PMI to tell how players actually reacted to OAB scores.
PMI quantifies the discrepancy between the probability of co-occurrence of two outcomes given their joint distribution and the probability of their co-occurrence given only their individual distributions, applying the equation:

\[ \mathrm{pmi}(x; y) = \log \frac{p(x, y)}{p(x)\,p(y)} \]

model supports the goal of an analysis of outcome likelihood by defining relevant crisp and probabilistic indicator states to track the likelihood of, and also by setting rules for proportionate relationships between input parameters, so that uncertainty is proportionately preserved.

The conceptual model of the TWG is formally defined in the hub ontology. This ontology includes all the legal moves of the human-in-the-loop wargame, which are grouped by which side may make them, and by what other moves the sides consider of the same type. The ontology also describes a player, grouped by measures of effectiveness for how well it is doing in the game. As a game of irregular warfare, the player is classified by indicators of the attitudes of the population towards him, and by whether these indicators mean that the player has reached a decision point or goal state based on his own, his teammates’, and his opponents’ popularity indicators. Player strategies are defined in accord with the decision points, goals, branches and sequels of the military decision making process (MDMP). These strategies define the appropriate set of move mixes for a player to test when a decision point has been reached. The hub ontology inherits from a design of experiment ontology which includes concepts of the MDMP, separated out so that they may be used in other studies. The hub ontology categorizes particular ontological ‘individuals’ with particular states, including both moves and players. These individuals are kept separately in an ‘assertion box’ ontology that holds only individuals.

The wargame players’ PMI scores show that there was a high correlation between states of players’ own popular support indicators and changes in their personal move mixes.
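As an illustration only, the PMI calculation described above can be sketched in a few lines of Python. The move log, the indicator and action labels, and the function name `pmi_scores` are hypothetical and are not taken from the TWG 2010 data; the sketch simply estimates the joint and marginal probabilities from counts and takes the natural logarithm of their ratio.

```python
import math
from collections import Counter


def pmi_scores(observations):
    """Return PMI in nats for every observed (indicator_state, action) pair.

    observations: list of (indicator_state, action) tuples, one per move.
    """
    n = len(observations)
    joint = Counter(observations)
    indicator_counts = Counter(ind for ind, _ in observations)
    action_counts = Counter(act for _, act in observations)
    scores = {}
    for (ind, act), count in joint.items():
        p_xy = count / n                  # joint probability of the pair
        p_x = indicator_counts[ind] / n   # marginal probability of the indicator state
        p_y = action_counts[act] / n      # marginal probability of the action
        scores[(ind, act)] = math.log(p_xy / (p_x * p_y))
    return scores


# Hypothetical move log: (player's own popular support state, type of move chosen).
moves = [
    ("low_support", "non_kinetic"),
    ("low_support", "non_kinetic"),
    ("low_support", "kinetic"),
    ("high_support", "kinetic"),
    ("high_support", "kinetic"),
    ("high_support", "non_kinetic"),
]
scores = pmi_scores(moves)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

A positive score means the indicator state and the action co-occur more often than independence would predict, and a negative score means they co-occur less often. In this toy log, low support is positively associated with non-kinetic moves.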
In particular, it was found that the weaker the popular support levels for a coalition force or Afghan government player, the less inclined the player was to use kinetic action. On the other hand, a Taliban player was more inclined to use violence if he was gaining in popularity. However, looking at one’s own popular support is not in accordance with coalition force doctrine: ideally, coalition forces would only pay attention to the Afghan government’s popular support.

The strategies in the hub ontology are crisp rules that enforce the relations between simulation entities, which results in enforcement of parameter dependencies. However, we could have used Bayesian enforcement of rules as well. Although soft indicators were not implemented in the TWG 2010 study, they could have been. We could have implemented soft indicators by using the same method that was used to validate the study: a probabilistic distance function. Soft indicators are more interesting than hard indicators in that they measure patterns, which are particularly important for keeping track of emergent phenomena. Emergent phenomena in micro-level agent-based simulation result in characteristic patterns between attributes, patterns that could be detected with soft indicators which measure the distance from a gold-standard distribution. As we shall see in the next section, the existence of emergent phenomena could then be entered into macro-level simulations that need not know the details of how the emergence took place.

With measures of effectiveness and strategies in the ontology, we tested both the actual TWG 2010 strategy and the more doctrinal strategy. The measures of effectiveness in the ontology allow us to test both doctrine and player perception of indicators, asking whether the wargame outcomes would be more positive if players had easier access to popular support levels of other players.
The ‘Spokes’: The Simulation and Mediation Ontologies

While the hub ontology contains the conceptual model of the study, including rules and measures of effectiveness, the spoke ontologies contain the conceptual models of the simulations that implement the conceptual model of the study. The rules of the hub control the proportionality of the input to the simulations and detect the output indicators from the simulations that matter to the study, while the rules of the mediation ontologies serve to translate between models using both crisp and probabilistic rules. Moves in the hub ontology are translated into the moves of models.

rules, augmented with Bayesian rules to keep track of uncertain matches. However, often all that is needed is a general idea from another simulation, in which case a loose coupling is prescribed. For example, a simulation may need to know, under a set of conditions, what the general relation is between employment and gross domestic product (GDP). A lower level model can generate the networks according to the conditions, compute the relations, and send back just the relations and not the general networks. It can be run multiple times to ensure the relation is correct. In this study, a lower-level, higher-resolution model, Nexus, was run many times to send a distribution of key leader engagement outcomes back to the hub.

Due to resource limitations, we used different versions of the simulation models than were used in the TWG 2010 wargame adjudication. However, the ontology imparts the ability to translate both logically and probabilistically across models that have an imperfect match, so that the strategies of the actual wargame represented in the hub ontology may be easily played out in different models than were played in the actual wargame. This ability to switch simulation models in and out is a way of testing strategies for robustness against different possible social environments.
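The translation role of the hub and its mediation rules can be illustrated with a minimal Python sketch. It is not the probabilistic ontology implementation used in the study: plain dictionaries stand in for ontological rules, and the class names, model names, vocabularies, and probabilities are all hypothetical. Each spoke supplies one crisp mapping into hub terms and one probabilistic mapping out of hub terms, so that any two registered models can exchange data through the hub.

```python
import random
from collections import Counter


class Mediation:
    """Mediation rules for one spoke (all vocabularies here are hypothetical)."""

    def __init__(self, to_hub, from_hub):
        self.to_hub = to_hub      # crisp rule: spoke term -> hub term
        self.from_hub = from_hub  # probabilistic rule: hub term -> {spoke term: probability}


class Hub:
    """Study-model hub: each model is translated once, to and from the hub."""

    def __init__(self, seed=0):
        self.spokes = {}
        self.rng = random.Random(seed)

    def register(self, name, mediation):
        self.spokes[name] = mediation

    def translate(self, term, source, target, n_samples=1000):
        """Map a source-model term to the hub, then sample target-model terms.

        Returns the hub term and the sampled proportions of target terms,
        which preserve the uncertainty of an inexact match.
        """
        hub_term = self.spokes[source].to_hub[term]
        dist = self.spokes[target].from_hub[hub_term]
        names = list(dist)
        weights = [dist[name] for name in names]
        draws = Counter(self.rng.choices(names, weights=weights, k=n_samples))
        return hub_term, {name: draws[name] / n_samples for name in names}


hub = Hub(seed=1)
hub.register("high_res", Mediation(
    to_hub={"AK-47": "gun", "M4": "gun"},
    from_hub={"gun": {"AK-47": 0.7, "M4": 0.3}},
))
hub.register("low_res", Mediation(
    to_hub={"small_arm": "gun"},
    from_hub={"gun": {"small_arm": 1.0}},
))

up = hub.translate("AK-47", "high_res", "low_res")        # crisp, many-to-one
down = hub.translate("small_arm", "low_res", "high_res")  # probabilistic, one-to-many
print(up)
print(down)
```

Adding a further model under this design requires only one new `Mediation` entry, which is the efficiency argument made for the hub and spoke arrangement.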
Although switching in and out multiple models was not done for this study, the hub and spoke architecture allows disagreement amongst social scientists to be treated as just another form of uncertainty to factor into the final expression of outcome likelihood in the output Markov process. The hub and spoke design is efficient because new data need only be translated to the hub through a single mediation ontology, after which it can communicate with all other models in the study.

Bayesian inference implements loose coupling between multiresolutional models by holding the patterns from the models in conditional probability tables in the hub: patterns that define the relations between the attributes of individuals, but do not detail the particular networks of the individuals. When it is time for a model to be run, a macro-to-micro translation is done: the Bayesian networks of the hub generate individuals for the parameters of the individual lower level models, incorporating the right proportions for all forms of uncertainty. The lower level models can create their networks several times, and then compute the relations back to the conditional probability tables of the hub in a micro-to-macro translation. Higher level models likely need input only from the relations. These relations may include soft indicator states. For example, the relation between GDP and employment that has emerged from an individual networked model may indicate a state of recession. The higher level model may only need to know that a recession occurred without the details that the lower level model knows, and may even fill in other inferred attributes based on a tripped recession indicator.

Both crisp rules and Bayesian rules of the mediation ontologies perform a translation. Both kinds of inference are capable of translating across resolutions. For example, an ontology with the right rules for categorization can automatically categorize entities in a high resolution model to a low resolution model.
For example, it can categorize an AK-47 of a high resolution model to a gun in a low resolution model. Bayesian rules can handle more complex situations, for example, translating from the low resolution model to the high resolution. They can also preserve the uncertainty, in the form of multiple parameter sets in the right proportion, when the categorization is inexact, because they are based on different conceptual models. These proportionate parameter sets allow models to detect and respond in the right proportion, for the final output Markov process.

The mediation ontologies inherit from the multiresolutionBayesian ontology, which contains a macro individual class that holds the conditional probability distributions, and a micro individual class that has attributes generated by the distributions. The actual individuals are stored in the ‘abox’ ontology. The project separates the classes, or ‘tbox’, from the individuals, or ‘abox’. Normally, the tbox contains the definitions by which individuals of the abox are classified, but in the case of the probabilistic ontology, macro individuals belonging to the abox actually contain statistically definitive

The two types of rules facilitate the two main kinds of coupling between models, tight coupling and loose coupling. If for some reason one model needs to know all the details in another model, for example networked models that must be kept consistent, then models can be tightly coupled. All the individual details of one model can be examined and translated to another through precise

side’s strategy. Not only the players’ models, but the players’ models of the other players’ models are taken into account. The tree branches not at turns, but as decision points are reached. Then, not all possible moves, but the variety of moves that are in accordance with doctrine are tried, to apply general doctrine and general commander’s guidance to the specific situation.
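The idea of a tree that branches at decision points rather than at turns can be sketched as follows. This is a deliberately reduced, single-player toy and not the XBM implementation or the study's adjudicator models: the support scale, the threshold, the move names, their effects, and the look-ahead depth are all hypothetical. An indicator classifies the state, the model is run forward until the indicator changes (a decision point), and only the moves that the hypothetical doctrine allows for the current indicator state are tried, each from a saved copy of the state.

```python
import copy

THRESHOLD = 50  # hypothetical indicator threshold on a 0-100 popular support scale

# Hypothetical doctrine: indicator state -> move mixes the player may try.
DOCTRINAL_MOVES = {
    "losing_support": ["engage_leaders", "provide_aid"],
    "gaining_support": ["patrol", "provide_aid"],
}
# Toy adjudicator: change in support per tick for each move.
EFFECT = {"engage_leaders": 4, "provide_aid": 2, "patrol": -1}


def indicator(state):
    """Crisp indicator that classifies the current state."""
    return "losing_support" if state["support"] < THRESHOLD else "gaining_support"


def run_until_decision_point(state, move, max_ticks=20):
    """Advance the toy model until the indicator changes or max_ticks pass."""
    start = indicator(state)
    for _ in range(max_ticks):
        state["support"] = max(0, min(100, state["support"] + EFFECT[move]))
        state["tick"] += 1
        if indicator(state) != start:
            break  # a decision point has been reached
    return state


def best_move(state, depth=2):
    """Look ahead `depth` decision points; return (best support reached, first move)."""
    if depth == 0:
        return state["support"], None
    best_value, best_choice = float("-inf"), None
    for move in DOCTRINAL_MOVES[indicator(state)]:
        checkpoint = copy.deepcopy(state)  # stand-in for checkpoint-restart
        outcome = run_until_decision_point(checkpoint, move)
        value, _ = best_move(outcome, depth - 1)
        if value > best_value:
            best_value, best_choice = value, move
    return best_value, best_choice


start_state = {"support": 40, "tick": 0}
result = best_move(start_state)
print(result)  # (92, 'engage_leaders')
```

Because each branch runs from a copy of the state, the starting state is left untouched and sibling branches do not interfere with one another. A multi-player, non-zero-sum version would keep a separate measure of effectiveness and doctrine table per player.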
Only two moves ahead were examined for the composition of adjudicator models. Checkpoint-restart is implemented in the model to save the model state at the decision points for efficient game tree runs.

information of a more mutable form than the crisp definitional traditional ontology.

Automated Model Runs with Automated Players

Next is the implementation of the model runs, given the ability to translate probabilistically and semantically across models and the ability to recognize indicators such as decision points and goal states to be used by automated players to emulate human-in-the-loop players. Figure 1 shows the hub and spoke architecture of the probabilistic ontologies. The COA analysis service of the open source Extensible Behavioral Modeling Framework (XBM) was used to control execution between the models using the author’s Strategic Data Farming technique (Duong et al. 2010). Execution would flow back and forth from the two models to the ontology of the hub, which would keep track of the state of the simulation and the players’ measures of effectiveness, such that when the simulation reaches a decision point of a player, a new set of moves may be prescribed in accordance with the player’s strategy. Recall that player strategy is computed from pointwise mutual information scores for the co-occurrence of indicators and action strategies during the game, in conjunction with player interviews.

Automating the game serves to keep parameters proportionate, as opposed to the common technique of treating scripted moves as independent parameters to data farm. Rather, the moves of players are very dependent on the reactions of the populace and of the enemy. Strategic Data Farming captures and tests doctrine as well as appropriate measures of effectiveness (MOE) for wargames. In the TWG study, two different kinds of MOEs were tried.
First, the five-value popular support (OAB) atmospherics of the actual wargame were tried, and it was found that the level of aggregation was so high that the automated player had trouble differentiating between states. Next, a more scalar issue stance value was used as the MOE, which provided enough gradient for automated players to optimize their moves. This is instructive for the actual Tactical Wargame, because if automated players have difficulty optimizing their moves because the signs offer little change, then the same is true for human-in-the-loop players. Figure 2 shows the XBM game tree with an OAB measure of effectiveness that has little differentiation, and Figure 3 shows the XBM game tree with a scalar, “issue stance” measure of effectiveness. The actual wargame of many players was reduced to three sides, and the moves were reduced to only a dozen, which were the ninetieth percentile of the most frequent moves.

Figure 1. Hub and spoke architecture of probabilistic ontologies

Figure 2. Coarse Measures of Effectiveness, OABs, difficult to optimize

The runouts involved a game tree with multiple players and individual goal definitions. Players would look ahead, branching according to strategy when a decision point was reached (when an indicator fires). Then, the run of the model would represent a mental run-out for the player, in accordance with its strategy and its knowledge of the other

series data that can be translated into the same state variables can be expressed in similar Markov processes, including different excursions of the same study, and a probabilistic distance taken between the distributions of the datasets. We ran the issue stance game trees out ten times, and classified the states by levels of violence as well as popularity of the Afghan government (by majority vote), and then compared them to similar time series data from a quarterly poll in the Helmand province, the same area represented in the 2010 Tactical Wargame.
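A minimal sketch of this kind of comparison is given below, under several assumptions that are ours rather than the study's: states are simple labels, transition probabilities are estimated from counts with additive smoothing so that no probability is zero, and the distance is a Kullback-Leibler divergence in nats between matching rows of the two transition matrices, weighted by how often each state occurs in the first data set. The state names and sequences are invented, and the study's rescaling of the score to a 0 to 1 range is not reproduced because its form is not given here.

```python
import math
from collections import Counter


def transition_matrix(sequences, states, smoothing=1.0):
    """Estimate row-stochastic transition probabilities with additive smoothing."""
    counts = {s: Counter() for s in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values()) + smoothing * len(states)
        matrix[s] = {t: (counts[s][t] + smoothing) / total for t in states}
    return matrix


def state_weights(sequences, states):
    """Empirical frequency with which each state is the source of a transition."""
    visits = Counter(a for seq in sequences for a in seq[:-1])
    total = sum(visits.values())
    return {s: visits[s] / total for s in states}


def kl_divergence(p_matrix, q_matrix, weights):
    """Weighted KL divergence (nats) between the rows of two transition matrices."""
    return sum(
        weights[s] * sum(p * math.log(p / q_matrix[s][t]) for t, p in row.items())
        for s, row in p_matrix.items()
    )


# Hypothetical state sequences sampled at decision points (simulation)
# and at polling dates (real world).
states = ["calm", "violent"]
sim_runs = [
    ["calm", "calm", "violent", "calm"],
    ["calm", "violent", "violent", "calm"],
]
poll_series = [["calm", "calm", "calm", "violent", "calm"]]

p = transition_matrix(sim_runs, states)
q = transition_matrix(poll_series, states)
w = state_weights(sim_runs, states)
score = kl_divergence(p, q, w)
print(round(score, 4))
```

A score of zero means the two estimated processes have identical transition probabilities, and larger values indicate more divergent dynamics. Note that the divergence is not symmetric, so the order of the two data sets matters.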
Applying the KL divergence measure of probabilistic distance, the TWG result scored 0.21 on a scale of 0 to 1, where 0 is exactly the same Markov process, and 1 is the most different Markov process possible for the same state variables (see Figure 4). The KL divergence, from information theory, measures the extra “nats” of information (because we used natural logarithms) needed to change a result from one probability distribution into that of the other. It is a good measure because it tells us the same kind of thing that science strives to formalize: the chances that our hypothesized model is wrong, and that the test results actually came from a different model. It tells the likelihood that the “real world” runs could have come from the model, by telling us how far they are from the model. Probabilistic distance of Markov processes is a good way to validate simulations because it measures the entire dynamics of a simulation rather than single outcome validation. The world is too arbitrary for single outcome validation to work; rather, we can only expect our data to conform to the real world at the levels of pattern and correlation.

Figure 3. Scalar measures of effectiveness, issue stances, easier to optimize

Strategic Data Farming of a wargame is useful for analytical purposes for validated models, but before the models are validated it is useful in exposing bugs. The first thing that the game trees do during strategic data farming runs is expose ways that the players could “game the game” by trying unrealistic combinations of moves that happen to work in the game but not in real life. Our strategic data farming runs of TWG to test measures of effectiveness were useful in exposing model shortcomings.
For example, we compared doctrinal player moves, in which the coalition force players strategized using host nation atmospherics, to see if making this information more prominent to players would make a difference in the outcome, as opposed to players that only had access to their own atmospherics. It was found that there was no difference in the outcome of the game. It may be that the adjudicator models did not differentiate between the cases in the way that doctrine indicates, and so the models may need adjustment.

Result Validation with Markov Process

Proportionate runs of the models, such that the input parameters for moves make sense together (as in Strategic Data Farming), enable us to keep a proportionate output space that can be put in a Markov process. Had there been input or model data of different credibilities, we could have varied it across the range of its certainty to further preserve proportion. Other rules of proportion of input parameters could ensure they are correlated properly for model run out as well.

Figure 4. Comparison of probabilistic distances of Markov processes

The purpose of putting the results in a Markov process is twofold. First, it expresses the entire dynamic results of the simulation, and since data was taken at decision points, it expresses cognitive states that can be analyzed for their importance for sending the simulation into or away from states to be desired or avoided. Secondly, any set of time

Conclusion

We have demonstrated a method by which uncertainty of data match can be tracked across simulation runs, and by which measures of effectiveness may be monitored in these simulations and compared to data. We have demonstrated a method by which parameters input to the simulation may be held in proportion to their real world values, and by which the information gleaned from a wargame may be run out in adjudicators in multiple runs for risk based analysis.
In this method, logical and Bayesian inference are used to match data across multiple perspectives and resolutions, so that multiple models and data may be repurposed and combined. The uncertainty of data matching, sub-model accuracy, and intrinsic randomness of stochastic models are taken into account in the final compilation of a probabilistic description of the outcome space for risk-based analysis of the complex adaptive system of irregular warfare.

REFERENCES

Bramson, A. 2009. “Measures of Tipping Points, Robustness and Path Dependence.” AAAI Fall Symposium.

Davis, P., Anderson, R. 2004. “Improving the Composability of DoD Models and Simulations.” RAND Corporation, Santa Monica, CA.

Duong, D. 2011. “Voting Processes in Complex Adaptive Systems to Combine Perspectives of Disparate Social Simulations into a Coherent Picture.” AAAI Spring Symposium.

Duong, D. 2010. “Verification, Validation, and Accreditation (VV&A) of Social Simulations.” Spring Simulation Interoperability Workshop, Orlando.

Duong, D. 2007. “Adaptive Simulation: A Composable Agent Toolkit for War Game Adjudication.” Proceedings of the Agent 2007 Conference on Complex Interaction and Social Emergence, North et al. eds., Evanston, IL.

Duong, D., Makovoz, D., and Singer, H. 2010. “The Representation of Uncertainty for the Validation and Analysis of Social Simulations.” Fall SISO Conference, Orlando, September.

Schott, R. 2009. “Irregular Warfare: Building a Counterinsurgency Based Tactical-level Analytic Capability.” MORS Irregular Warfare Analysis Workshop, February.

Vakas, D. (Duong), Prince, J., Blacksten, R., and Burdick, C. 2001. “Commander Behavior and Course of Action Selection in JWARS.” Tenth Conference on Computer Generated Forces and Behavioral Representation.