VERIFICATION, VALIDATION, AND ACCREDITATION (VV&A) OF COMPLEX SOCIETAL MODELS & SIMULATIONS (M&S)

Robert Clemence, Jimmie McEver, David Signori, EBR
Dean Hartley, Hartley Consultants
Al Sciarretta, CNS Technologies
Stuart Starr, Barcroft Research Institute

ABSTRACT

The goal of this research program was to explore innovative approaches to performing verification, validation, and accreditation (VV&A) of composable political, military, economic, social, information, and infrastructure (PMESII) models. Consistent with that goal, the program focused on two objectives. First, it developed a "baseline" VV&A process that is well-suited to the development and evolution of DARPA's Conflict Modeling, Planning and Outcomes Experimentation (COMPOEX) program. Second, it developed highly accelerated VV&A processes for two conditions, compressed and hyper-compressed VV&A. To establish the context for these objectives, this paper briefly characterizes the nature of the problem. That is followed by a brief description of the "baseline" VV&A approach that the study team developed. The bulk of this paper focuses on innovative techniques for performing compressed and hyper-compressed VV&A. It describes the activities that must be performed during "baseline" VV&A to prepare the VV&A Team for compressed and hyper-compressed approaches. That leads to the formulation of several V&V techniques that are well-suited to these circumstances: experimental design, logic traceback, and model comparison. These methods are described and residual challenges are identified. The paper concludes by identifying residual issues that should be addressed in future research efforts.

A. NATURE OF THE PROBLEM

The VV&A of tools that model PMESII behaviors is very challenging under the best of circumstances. The study team began the research activity by performing an in-depth review of the VV&A literature. That review confirmed that the history of VV&A of PMESII models is limited and that the level of V&V has generally been inadequate. Furthermore, the review revealed that a PMESII model requires integrating multiple domains that are often based on competing theories of varying maturity.

The VV&A of DARPA's COMPOEX initiative poses additional challenges. COMPOEX is designed to be augmented and reconfigured rapidly to address new problems over a spectrum of changing environments. This requires methods and tools that permit rapid VV&A. A brief description of COMPOEX is provided in Appendix C.

Conventional VV&A techniques, while useful, are not adequate for rapid model reconfiguration. The study team's research revealed that "baseline" VV&A must be performed periodically, during and immediately after development. These techniques rely mainly on subject matter experts (SMEs), who require time to review models. Note, however, that these SMEs are not always available where and when they are needed.

The study team was challenged to develop highly accelerated VV&A processes for two conditions: compressed and hyper-compressed VV&A. In the compressed scenario, the events triggering the need for change allow several weeks for the model changes. In the hyper-compressed scenario, the time available is on the order of several hours. Note, however, that this available time must cover research, code changes, documentation of the changes, and VV&A. Thus the time available for VV&A itself is measured in days for the compressed situation and in tens of minutes for the hyper-compressed situation.
Thus, there is a need for a means of rapidly establishing credibility with the users of such models shortly after changing components or configuration. Developing V&V processes that are well suited to this problem will help ensure that COMPOEX technology transitions into a truly useful capability, and it will minimize the possibility that users unwittingly do harm. These factors require novel thinking and approaches to VV&A.

B. BASELINE VV&A

As a context for addressing the compressed and hyper-compressed V&V approaches, the study team developed a "baseline" VV&A process and assessment methodology for a PMESII model. This "baseline" VV&A process is based on an extensive literature review and on prior experience with PMESII modeling (e.g., DMSO's Flexible Asymmetric Simulation Technologies [FAST] Tool Box). Furthermore, the "baseline" VV&A process was subjected to extensive peer review, drawing on world-class VV&A experts.

In developing the "baseline" VV&A process, the study team built on the latest thinking in VV&A approaches. This included a process entrenched throughout the life cycle of the model, the adaptation of a VV&A maturity model developed by the Defense Modeling & Simulation Office (DMSO), and the employment of a risk assessment and mitigation process to enhance communication with the user of the model. Furthermore, the study team developed a spreadsheet model that provides a systematic way of organizing V&V tests and capturing their results. These results capture the degree of success achieved in each test. However, with hundreds of tests covering the many areas of the system, simple combinations of the results tend to obscure, rather than elucidate, problems. To enhance the transparency of those tests, the study team linked a set of "spider" diagrams to the spreadsheet (see Figure 1). The "spider" diagrams support visualization of multiple dimensions in a single chart and support both an overview and segmentation by individual model. Those diagrams give the V&V Team deep insight into the model's strengths and weaknesses as they emerge from the "baseline" VV&A process.

Figure 1. Illustrative "Spider" Diagrams (system validation metrics, model validation metrics, and inter-model connections, each rated 0-5 across the PMESII dimensions, user issues, and DIME)

C. COMPRESSED AND HYPER-COMPRESSED VV&A

C.1 Preparation

To set the stage for the discussion of compressed/hyper-compressed VV&A, it is useful to introduce a metaphor: the "two-minute drill" in football. All football teams anticipate a situation in which they must score with only two minutes in which to operate (typically at the end of a game). To prepare for this contingency, they perform a complex set of actions long before the game begins. First, they learn the conditions they must contend with (e.g., their adversary, including his strengths and weaknesses).
Consistent with that knowledge, they formulate a constrained "play book" that exploits the adversary's vulnerabilities. Second, they decide on the team that will implement the "play book". Typically, that team deviates from normal practice by not forming a huddle (wherein decisions are made and plays are selected) in order to take maximum advantage of the limited time. Furthermore, that team practices and refines the "two-minute drill" intensely so that it will be effective and efficient when it has to execute.

Although the "two-minute drill" is a useful conceptual model for planning compressed or hyper-compressed V&V, it is an imperfect analogue. An operational V&V task will occur with limited warning, and there will be significant ambiguity about the appropriate size and composition of the V&V Team.

In order to perform compressed and/or hyper-compressed VV&A, it is vital that critical steps be taken while performing "baseline" VV&A. Four steps are of particular importance. First, because of the complexity of the task, it is necessary to anticipate likely Areas of Responsibility (AoRs) and take anticipatory steps. These steps include, inter alia, acquiring minimum essential information, designating the members of the V&V Team, and recruiting and socializing the appropriate SMEs. Second, it is necessary to plan and implement a VV&A process that is consistent with anticipated resource constraints; this requires consideration of the needed people, processes, and tools. Third, it is vital to exercise and evaluate compressed and/or hyper-compressed VV&A processes periodically. Finally, it is important to develop and implement best practices based on the results of exercises and on experience with other programs. The "bottom line" is that preparation is vital if compressed and hyper-compressed VV&A are to be performed efficiently and effectively.

To deal with the "curses of complexity and time compression", several automated tools will be needed to support compressed and hyper-compressed V&V. First, as noted above, techniques are needed to identify a small set of impending AoRs of interest. Second, ontologies are needed to support the automated collection and conversion of needed data. Third, automated expert elicitation techniques would be valuable to generate a "SME-in-a-box" and to create needed models "just in time". Finally, techniques are needed to generate meta-model(s) of the multi-resolution model to support compressed and hyper-compressed VV&A. This latter issue is discussed below.

C.2 Approaches to Compressed and Hyper-Compressed VV&A

The authors developed several possible approaches to the problem of performing useful V&V within the compressed and hyper-compressed scenarios. The team grouped these approaches into three categories: exploratory design, logic tracing, and comparison (Figure 2). The initial approaches were refined, reduced, and combined.

Figure 2. Approaches to Compressed & Hyper-Compressed VV&A (exploratory design approaches: preset data and preset tests, exploratory space analyses, risk analyses; logic tracing approaches: "flight recorder", automatic tracing; comparison approaches: comparison with a generic model such as the ISSM, with a domain-specific model such as Senturion, or with SMEs)

Compressed and hyper-compressed situations occur during use of the model. Time and other resource compression are important factors.
Triggers include the introduction of a new tool, modifications to an old model, and a changed environment. A changed environment can be a new scenario with new data categories (for example, the real world has changed, as Iraq changed after the bombing of the Golden Mosque in Samarra). A changed environment can also be a different way of using the model.

The three test systems described below are all potentially valuable elements of developmental and periodic testing. However, they may be the only elements that can be used during the compressed or hyper-compressed timelines of triggered testing. The model comparison and embedded results-traceback systems are likely to be the only extra test-oriented elements used during ordinary model use (as opposed to during testing). The model comparison system supports calibrating the PMESII model to maximize confidence in using it to simulate the future, and the results-traceback system is available to examine any unusual results.

The experimental design system must be run during the uncompressed timelines of developmental and/or periodic testing to create the preset tests and results that are needed for later testing. The model comparison and embedded results-traceback systems do not have this absolute requirement; however, without such use, they are less likely to be available, ready, and supported by trained users during the critical times. Just as with the two-minute drill for the final two minutes of a football game, compressed and hyper-compressed validation techniques must be developed and practiced before those final critical moments.

For each of the selected approaches, we characterize the system design and system operation, and then provide additional comments.

C.2.1 Experimental Design System

At any stage of testing, there are always more tests that can be thought of than there is time to accomplish. Of particular concern is the concept of region(s) of validity. Because models, by definition, are abstractions, they cannot give exact answers to all possible questions. In general (and especially in the case of PMESII models), they cannot give exact answers to any question, but may give answers that are close enough to be useful. Sometimes the region in which the answers are "close enough" encompasses the entire question space; in many instances, however, the region is a subset of the entire space, and it is important to understand the limits of the region of validity (and whether it is a connected region or is composed of non-intersecting sub-regions).

System Design. Experimental design is a technique for producing a (relatively) small set of experiments (sets of data inputs) for exploring the space and discovering the shape of the region of validity. Even though the set of experiments is relatively small, for PMESII models it will still involve a large number of tests. However, the results of the tests may yield a still smaller set of preset tests that can be used later to determine whether changes in the model or environment (scenario or model use) have caused changes in the region of validity. During system design, the team should define a process for creating the experimental design, analyzing and storing its results, and later using those results.

System Operation. Figure 3 illustrates the process. In this figure, "Now" represents the time at which the question of changes arises.
If these changes occur during development, there may be time to run the experimental design to determine possible changes in the region of validity. At other times, however, the VV&A Team must rely on having run the experimental design in the past and having the preset inputs and expected outputs available to test for changes in validity.

Therefore, the system operation has two components, set-up and use. The set-up consists of exploratory analyses and risk analyses that are performed during the model's stable period. The time that is most likely to be conducive to the set-up is the period following a developmental cycle. Both the exploratory and risk analyses would provide excellent support to the VV&A process that should occur at that time. The purpose of the exploratory analyses is to characterize the total response surface of the model, permitting definition of the "interesting" parts of the surface. The risk analyses define the areas that have the most variability and appear most susceptible to validity problems. Together, these analyses support the definition of a filter for selecting a reduced set of tests.

Figure 3. Experimental Design

Unfortunately, the dimensionality of PMESII problems is too great for simple characterization. During the set-up, an experimental design must be created to support disciplined analyses. Depending on the nature of the model and the resources available, the experimental design may range from millions of tests down to tens of tests. All but the smallest designs are likely to require data farming on high-performance computers. The immediate purpose of these tests is to characterize the response surface, which includes evaluating the validity of the model at each part of the response surface. In cases where data farming is used, this will almost certainly mean that the data farming will merely identify areas that look questionable. Because of constrained resources, only a select few areas can be further tested with conventional V&V techniques.

After the experimental design tests have been run through the model and the results have been analyzed, the team must create a reduced set of tests and expected outputs for later use. The filter that is developed should consider the "interesting" parts of the response surface (perhaps where outliers are likely or where the surface bends more sharply than elsewhere) and the riskiest parts of the surface. While the preset tests should concentrate on areas where validity seems most likely to be challenged by any changes to the model, they should also include a few uncontroversial areas to anchor the results.

The plan is to run the preset tests following any change to the model and compare the outputs of the modified model with the preset outputs. Naturally, changes to the model should lead to changes in results; however, the output changes should be along expected lines. Excessive differences, or differences in unpredicted areas, must be investigated. Some of the investigation may be through SME evaluation, which should include rating on a risk scale, with the riskiest areas subject to the most rigorous review. Another approach follows from the characterization of the response surface: compare the characteristics of the new response surface with the old one, and subject areas of change to the most rigorous review. Naturally, this approach is only feasible in situations where the response surface was previously well-characterized.
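As an illustration, the preset-test comparison step could be automated along the following lines. This is a minimal Python sketch under stated assumptions: the run_model callable, the PresetCase fields, and the relative-tolerance scheme are hypothetical conveniences for this example, not part of COMPOEX.

# Sketch: re-run stored preset cases against a modified model and flag deviations.
# run_model and PresetCase are illustrative assumptions, not COMPOEX interfaces.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class PresetCase:
    case_id: str
    inputs: Dict[str, float]       # preset input vector produced by the experimental design
    expected: Dict[str, float]     # outputs recorded when the design was originally run
    tolerance: float               # allowed relative deviation before flagging
    risk: int                      # 1 (low) .. 5 (high), from the risk analyses

def rerun_preset_tests(run_model: Callable[[Dict[str, float]], Dict[str, float]],
                       cases: List[PresetCase]) -> List[dict]:
    """Run each preset case through the modified model and flag unexpected changes."""
    flagged = []
    for case in cases:
        # Pass the stored inputs through unchanged and compare only the outputs
        # that both model versions produce; missing outputs are flagged separately.
        outputs = run_model(case.inputs)
        for var, old_val in case.expected.items():
            if var not in outputs:
                flagged.append({"case": case.case_id, "variable": var,
                                "reason": "variable no longer produced", "risk": case.risk})
                continue
            new_val = outputs[var]
            denom = max(abs(old_val), 1e-9)
            if abs(new_val - old_val) / denom > case.tolerance:
                flagged.append({"case": case.case_id, "variable": var,
                                "old": old_val, "new": new_val, "risk": case.risk})
    # Riskiest deviations go to SMEs first; the remainder can be spot-checked.
    return sorted(flagged, key=lambda f: f["risk"], reverse=True)

The design choice here is simply to triage by the risk ratings produced during set-up, so that scarce SME time is spent on the deviations most likely to indicate a validity problem.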
The details of the use of the preset tests will depend on whether the situation is compressed or hyper-compressed. In compressed situations, the preset tests will be further filtered to stress the riskiest tools, interactions, and data. In hyper-compressed situations, the filter will select the most important inputs for the riskiest outputs and for the changed structures.

Although outliers produced by the preset tests in the modified PMESII model deserve extra attention, the fact that they are outliers does not prove that they are invalid or that the changed model is invalid. The changed model may require new data for variables that were not present in the preset data, or it may not use all of the preset data because old variables have been dropped. These changes present problems both in execution and in interpretation. The SME assessments will take time because the humans involved must work at human speeds. Methods to automate portions of this assessment can be major multipliers in increasing the speed and value of the validation process.

Comments. The exploratory design system will have to be tailored to the PMESII model or system. Naturally, the time it takes for a single model run (and the accompanying V&V analysis) will affect the overall system design. The types of inputs will also affect the system design. Standard inputs (e.g., situational state variables and planned actions) are relatively simple to design variations for. However, there are also "model definition" inputs (e.g., characteristics that differentiate one city from another or one leader from another) and inputs that define how one module interacts with another. These model definition inputs must also be considered in the experimental design. Both types of input have uncertain values and need to be varied and validated in testing.

C.2.2 Embedded Results-Traceback System

During the use of the PMESII model, and especially during testing, there will be occasions when the outputs seem wrong. They may in fact be wrong, or they may be shedding insight that can be used to correct misunderstandings of the real world. One useful technique for resolving the ambiguity is tracing the results back to their origins; however, doing this manually in a complex system can be difficult, if not impossible.

System Design. If the PMESII model permits the installation of intermediate data capture instruments, an automated system can be created to help determine the validity of suspicious results. Figure 4 illustrates such a system. If such a system is created, it will support the validation of results at the time they are discovered ("Now" in the figure). An embedded results-traceback system can be important in developmental and periodic testing and may be critical in triggered testing and model use evaluation.

Figure 4. Embedded Results-Traceback System

The embedded data capture instruments illustrated in the figure are designed to capture the transformation of data from inputs to outputs. Ideally, the value of each variable would be captured as it changes, along with the time of change. However, the sheer volume of data would make any practical use of such a data capture regime difficult, if not impossible. A more practical regime involves capturing selected variables that are likely to be important in understanding final output results, and only capturing the data at periodic (simulated) intervals to further reduce the volume of data.
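A minimal sketch of such a periodic data capture instrument, or "flight recorder," is shown below. The FlightRecorder class, its record() hook, and the use of a local SQLite file are illustrative assumptions; the only requirement taken from the text is that selected variables are written at a fixed simulated-time interval and can later be served up for traceback analysis.

# Sketch of a "flight recorder" that captures selected variables at a fixed
# simulated-time interval. The class and its hook names are hypothetical.
import sqlite3
from typing import Dict, Iterable

class FlightRecorder:
    """Stores watched variable values at periodic simulated-time capture points."""

    def __init__(self, db_path: str, watched: Iterable[str], interval: float):
        self.watched = set(watched)    # variables judged important for traceback
        self.interval = interval       # simulated-time spacing between captures
        self.next_capture = 0.0
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS trace "
                        "(sim_time REAL, variable TEXT, value REAL)")

    def record(self, sim_time: float, variables: Dict[str, float]) -> None:
        """Called once per time step; writes watched variables only at capture points."""
        if sim_time < self.next_capture:
            return
        rows = [(sim_time, name, variables[name])
                for name in self.watched if name in variables]
        self.db.executemany("INSERT INTO trace VALUES (?, ?, ?)", rows)
        self.db.commit()
        self.next_capture = sim_time + self.interval

    def trace(self, variable: str):
        """Serve up the captured history of one variable for traceback analysis."""
        return self.db.execute("SELECT sim_time, value FROM trace WHERE variable = ? "
                               "ORDER BY sim_time", (variable,)).fetchall()

An analysis program (or a human analyst) can then plot the returned traces and look for the "humps" and "valleys" discussed below.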
COMPOEX writes its significant variables (basically, those that are passed from one internal model to another) to a "backplane", or vector of variable values. It also uses a time-stepped architecture, yielding a natural period for data collection. Instrumenting other models may be more difficult, requiring the development of a "flight recorder" and the creation and insertion of code blocks in each module to capture and send variable values to the flight recorder. The second requirement for an embedded traceback system is a database to store and serve up the traceback data as needed. The third requirement is a program to support the analysis of the traceback data. This program and a simulated display are also illustrated in Figure 4.

Automatic causal tracing would be desirable; however, in PMESII situations, complexity is the rule. Cause and effect relationships are actually causal networks, with multiple causes of multiple effects. The tracing combinatorics require significant amounts of time and preclude much depth of tracing. Fortunately, the pattern recognition abilities of biological beings are available. Human beings can simply look at current and "past" traces of variable values and identify and trace back "humps" and "valleys". This provides a first-order estimate of the source and likely validity of possible anomalies in output results.

System Operation. The embedded results-traceback system naturally requires an up-front investment; however, no additional resource drain is absolutely required subsequently. This system should be operational at all times. It will be used whenever possible anomalies are spotted; however, it should also be exercised periodically to ensure rapid response during critical times. Its use in compressed and hyper-compressed situations to check the validity of modified versions of the model clearly depends on the time available. In compressed situations, all major results should be reviewed to investigate their causal networks, and several minor results should be similarly investigated as spot checks. In hyper-compressed situations, any major results that look questionable should be investigated. In either case, if nothing looks out of line, the user would assume that the validity level is no worse than it was before the changes.

Comments. Because of the architecture of COMPOEX, the creation of an embedded results-traceback system is relatively simple. In systems that require the building of a "flight recorder" and internal code blocks, this creation is more difficult: it requires extra time and effort, and it introduces new sources of error. In either case, organizing the traces in meaningful ways will be difficult.

C.2.3 Model Comparison System

The model comparison system produces validation evidence by comparing the outputs of the PMESII model to other reasonable sources. Three sources are considered: real world results, opinion from area experts, and a much simpler, fast-running model that uses different logic. The first source would be considered sufficient in itself except for the belief that real world results from a given starting point are not deterministic in the sense that we are confident that no other result could have occurred. It may be that the real world is truly stochastic, or that we simply cannot observe enough of it at sufficient levels of granularity to understand how the results are determined.
Thus, divergent results from a model may not be an indication of a lack of validity, but rather an indication of alternate results that could have occurred.

The opinions of true area experts might well be the best source; however, this source has its own problems. First, the experts might not be universally available. Second, determining the level of expertise of putative experts is problematic. Third, creating expert opinions on future results is a slow, human process. Fourth, for practical reasons, the experts should be required to make their predictions prior to the actual unfolding of events.

The last source, a simpler model, has its own pros and cons. The validity of that model is almost certainly inadequately known. Despite this indeterminate validity, a model that has some credibility and uses entirely different logic from the model in question can provide independent support for the PMESII model by producing similar results for similar starting situations. Alternatively, by disagreeing with the PMESII model, the simpler model can suggest a need for further investigation and provide some diagnostic information concerning the areas of maximal disagreement and maximal agreement. The model comparison system also provides a calibration mechanism, as explained below.

System Design. Figure 5 uses an icon for the Interim Semi-Static Stability Model (ISSM) to represent the simpler model. The ISSM is used because it is known to have the needed attributes for the task; however, there are other possibilities (e.g., Senturion from Sentia Group, Inc.).

Figure 5. Model Comparisons

The system for running the simpler model with the same data and with comparable outputs must be designed and set up prior to V&V testing or model use. For maximal ease of use, the data for the simpler model should be created automatically from the data needed to run the PMESII system in question. As shown in the figure, the simpler model will also need data feeds of any inputs that are exogenous to the PMESII system and that are introduced during the course of simulated time within the system (such as interventions and externally modified state variables). The simpler model should also produce outputs that are as closely comparable to the PMESII system outputs as possible.

System Operation. In Figure 5, "Now" represents the time when the PMESII system is to be tested or used. The events shown as being in the past, unlike those shown in the past in Figure 3, are not things that need to have been done by the testing team prior to "Now". The past in Figure 5 represents real world past events that should be simulated prior to attempting to simulate the future. The concept is to start the simulation at some time in the past, simultaneously keep track of events with another, simpler model, and compare the results of the PMESII simulation, the simpler model, SME-projected results, and the current situation. Rough agreement should be considered a successful validation event. If the PMESII simulation does not achieve comparable results, rerun the simulation with changed parameters. Continue adjusting the parameters until the simulation is calibrated to achieve comparable results, and only then proceed with the simulation of future events. If the parameters cannot be adjusted to achieve comparable results, this would be a serious non-validation event. The determination of the comparability of the results will need to be made by SMEs.
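The calibration loop just described can be pictured as follows. This is a minimal Python sketch under stated assumptions: run_pmesii, run_simple, propose, and agreement are hypothetical callables standing in for the PMESII system, the simpler comparison model, a parameter-adjustment heuristic, and a numeric agreement measure; none of these names come from COMPOEX or the ISSM.

# Sketch of the calibration loop: adjust parameters until the PMESII run over a
# past period roughly agrees with the simpler model's run over the same period.
from typing import Callable, Dict

def calibrate(run_pmesii: Callable[[Dict[str, float]], Dict[str, float]],
              run_simple: Callable[[], Dict[str, float]],
              params: Dict[str, float],
              propose: Callable[[Dict[str, float], float], Dict[str, float]],
              agreement: Callable[[Dict[str, float], Dict[str, float]], float],
              threshold: float = 0.8,
              max_iterations: int = 20) -> Dict[str, float]:
    """Return parameters that achieve rough agreement over the historical window."""
    reference = run_simple()                  # simpler model run over the past period
    best_params, best_score = dict(params), -1.0
    for _ in range(max_iterations):
        outputs = run_pmesii(params)          # PMESII run over the same past period
        score = agreement(outputs, reference) # e.g., 1 minus a normalized output difference
        if score > best_score:
            best_params, best_score = dict(params), score
        if score >= threshold:
            return params                     # calibrated: proceed to simulate the future
        params = propose(params, score)       # heuristic or SME-guided adjustment
    # Failure to reach rough agreement is the serious non-validation event noted above.
    raise RuntimeError(f"Calibration failed; best agreement {best_score:.2f} < {threshold}")

In practice the agreement score would only be an aid to, not a replacement for, the SME judgment of comparability.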
Some of the results will not be easily quantifiable (especially real world status), the significance of differences in quantified variables will not be clear, and the possibility of alternative futures will need to be considered.

Comments. A model comparison system can be important in developmental and periodic testing and may be critical in triggered testing. It may be the best means of creating confidence in the model during use, through the use of known calibration points.

C.3 Comments on the Novel Validation Approaches

The exploratory design approach may require extremes in computing power in situations where data farming methods are required. It should be adequate in compressed situations; however, it may be too slow for use in hyper-compressed situations. Its use during baseline VV&A will provide useful information, both for the baseline VV&A and for later compressed (and possibly hyper-compressed) situations.

The embedded results-traceback approach should be usable in compressed and hyper-compressed situations. Its use during baseline VV&A will provide useful information, both for the baseline VV&A and for later compressed and hyper-compressed situations. Further, the embedded nature of this approach means that it can be used at any time during the use of the PMESII system to investigate any questionable results.

The model comparison approach should be usable in compressed and hyper-compressed situations, although its use with projected SME results during hyper-compressed situations may not be possible because of time constraints. Its use during baseline VV&A will provide useful information, both for the baseline VV&A and for later compressed and hyper-compressed situations. Further, the embedded nature of this approach means that it can be used at all times during the use of the PMESII system to produce calibrated results. This approach is known to be feasible using the ISSM as the simpler model.

Using combinations of these approaches would be beneficial in all situations: baseline, compressed, and hyper-compressed. The approaches to the problem of validating a modified model are orthogonal and thus likely to yield the most information with the least effort.

D. SUMMARY

The study team achieved the following major accomplishments during the course of this project. First, it developed a peer-reviewed, "baseline" VV&A methodology for a PMESII multiple-resolution model. That methodology draws on state-of-the-art thinking and includes novel tools to implement it.

Second, the project served to clarify the linkage between the "baseline" VV&A methodology and the proposed compressed and hyper-compressed approaches to V&V (i.e., the "two-minute drill"). In particular, it stressed the importance of taking anticipatory steps during the "baseline" process, planning a compressed and hyper-compressed V&V process that is consistent with time and resource constraints, exercising and evaluating the compressed and hyper-compressed approaches systematically, and developing and implementing "best practices" to apply to the process. Within that context, the study team developed three approaches that are suitable for compressed and hyper-compressed VV&A: exploratory design, logic tracing, and comparison with high-level models. However, key residual challenges remain that should be addressed in follow-on research.

APPENDIX A: REFERENCES

A.1 PRESCRIPTIONS AND THEORETICAL DESCRIPTIONS OF VV&A (SORTED BY DATE)
"Verification of Computer Simulation Models." Management Science 14, 2 (Oct 1967): B92-B106. 2. McKenney, James L. "Critique of: 'Verification of Computer Simulation Models'." Management Science 14, 2 (Oct 1967): B102-B103. 3. Schrank, William E. and Charles C. Holt. "Critique of 'Verification of Computer Simulation Models'." Management Science 14, 2 (Oct 1967): B104-B106. 4. Kleijnen, Jack P. C. Statistical Techniques in Simulation Part I. New York: Marcel Dekker, Inc., 1974. 5. Kapper, Francis B. "Wargaming I: The Simulation of Crisis." Defense/81. Arlington, VA: American Forces Information Service, May 1981, 14-21. 6. Hoeber, Francis P. Military Applications of Modeling: Selected Case Studies, s. New York: Gordon and Breach Science Publisher, 1981. 7. Law, Averill M. and W. David Kelton. Simulation Modeling and Analysis. New York: McGraw-Hill Book Company, 1982. 8. Balci, Osman. "Credibility Assessment of Simulation Results." Proceedings of the 1986 Winter Simulation Conference. Ed. J Wilson J Henriksen S Roberts. 1986, 38-44. 9. Schmidt, J. W. "Introduction to Systems Analysis, Modeling and Simulation." Proceedings of the 1986 Winter Simulation Conference. Ed. J Wilson J Henriksen S Roberts. 1986, 5-16. 10. Sargent, Robert G. "Overview of Verification and Validation of Simulation Models, An." Proceedings of the 1987 Winter Simulation Conference. Ed. A Thesen; H Grant; W D Kelton. Atlanta, GA, 1987, 33-39. 11. Balci, Osman. "How to Assess the Acceptability and Credibility of Simulation Results." Proceedings of the 1989 Winter Simulation Conference. Ed. E MacNair, K Musselman, P Heidelberger. Washington, DC, 1989, 62-71. 12. Williams, Marion L., and James Sikora (1991), "SIMVAL Minisymposium-A Report," Phalanx, Vol. 24, No. 2, June. 13. Hodges, James S. (1991). Six (or So) Things You Can Do with a Bad Model. RAND, Santa Monica, CA. 14. Davis, Paul K. 1992. Generalizing Concepts and Methods of Verification, Validation, and Accreditation (VV&A) for Military Simulations. RAND, R-4249-ACQ. 15. Ritchie, Adelia E. (1992). Simulation Validation Workshop Proceedings (SIMVAL II). MORS, Alexandria, VA. 16. Knepell, Peter L. and Deborah C. Arangno (1993). Simulation Validation: A Confidence Assessment Methodology. IEEE Computer Society Press, Los Alamitos, CA. 17. Anderson, Robert H. and Paul Davis (1993). Proceedings of a Workshop on Methods and Tools for Verification, Validation, and Accreditation October 5-6, 1993. RAND, PM179-DMSO. 18. Hartley, Dean S., III. and Howard Whitley (1996). “Verification, Validation, and Certification (VV&C).” Simulation Data and Its Management Mini-Symposium (SIMDATAM ’95). Ed. C E Gettig Jr. and Julian I Palmore. MORS, Alexandria, VA, 1996. 19. Balci, Osman. "Verification, Validation and Accreditation of Simulation Models." 13 Proceedings of the 1997 Winter Simulation Conference. Ed. S Andradottir, K J Healy, D H Withers, and B L Nelson. Washington, DC, 1997, 135-141. 20. Hartley, Dean S., III. "Verification & Validation in Military Simulations." Proceedings of the 1997 Winter Simulation Conference. Ed. S Andradottir, K J Healy, D H Withers, and B L Nelson. Washington, DC, 1997, 925-932. 21. Verification, Validation, and Accreditation of Army Models and Simulations (1999). DA PAM 5-11, Washington, DC. 22. Verification, Validation, and Accreditation (VV&A) of Models and Simulations (1999). SECNAV Instruction 5200.40, Washington, DC. 23. Conwell, Candace L., et al. 
"Capability Maturity Models Support of Modeling and Simulation Verification, Validation, and Accreditation." Proceedings of the 2000 Winter Simulation Conference. Ed. J A Joines, R R Barton, K Kang, and P A Fishwick. Washington, DC, 2000, 819-828. 24. Balci, Osman. “Verification, validation and testing of models,” Encyclopedia of Operations Research and Management Science. Ed. Saul I. Gass and Carl M. Harris. Kluwer Academic Publishers, Boston, MA. 2001. 25. Balci, Osman, et al. "A Collaborative Evaluation Environment for Credibility Assessment of Modeling and Simulation Applications." Proceedings of the 2002 Winter Simulation Conference. Ed. E Yucesan, C-H Chen, J L Snowdon, and J M Charnes. Washington, DC, 2002, 214-220. 26. Youngblood, Simone (2004), “DoDI 5000.61 and the VV&A RPG”, DMSO, Washington, DC. 27. Sargent, Robert G. "Validation and Verification of Simulation Models." Proceedings of the 2004 Winter Simulation Conference. Ed. R G Ingalls, M D Rossetti, J S Smith, and BA Peters. Washington, DC, 2004. 28. Modeling and Simulation Verification, Validation, and Accreditation Implementation Handbook: Volume I VV&A Framework (2004). Department of the Navy, Washington, DC. 29. Simulation Verification, Validation and Accreditation Guide (2005). Australian Defence Simulation Office, Department of Defence, Canberra, Australia. 30. Youngblood, Simone (2005), “VV&A Standards Initiative”, DMSO, Washington, DC. 31. Management of Army Models and Simulations (2006). DA Regulation 5-11, Washington, DC. 32. Waltz, Ed (2007). “Integrated Battle Command Program: Model Validation.” BAE Systems. 33. Draft ATEC Reg 73-21 (Accreditation of M&S for Test and Evaluation). 34. DMSO VV&A website, http://vva.dmso.mil/. 35. “Agent Based Simulation (ABS) VV&A Workshop,” sponsored by Mike Bailey, USMC, held at Northrop Grumman, Fair Lakes, VA, 1-2 May 2007. A.2 PRACTICE OF VV&A (SORTED BY DATE) 36. Hartley, Dean S., III (1975), An Examination of a Distribution of TAC CONTENDER Solutions, NMCSSC, Washington, DC, TM 101-75. 37. Vector Research (1981), Verification Analysis of VECTOR-2 with the 1973 Arab-Israeli War and Analysis of Related Military Balance Issues, VRI-NGIC-1 FR 81-1. VRI., Ann Arbor, MI. 38. Hartley, Dean S., III, et al. (1989), Sensitivity Analysis of the Joint Theater Level 14 Simulation I, Martin Marietta Energy Systems, Inc., Oak Ridge, TN, K/DSRD-70. 39. Hartley, Dean S., III, John D. Quillinan, and Kara L. Kruse (1990), Phase I Verification and Validation of SIMNET-T, Martin Marietta Energy Systems, Inc., Oak Ridge, TN, K/DSRD-116. 40. Hartley, Dean S., III, Charles Radford, and Cathrine E. Snyder (1991), Evaluation of the Advanced Battle Simulation in WAREX 3-90, Martin Marietta Energy Systems, Inc., Oak Ridge, TN, K/DSRD-597. 41. Hartley, Dean S., III (1991), Confirming the Lanchestrian Linear-Logarithmic Model of Attrition, Martin Marietta Energy Systems, Inc., Oak Ridge, TN, K/DSRD-263/R2. 42. Hartley, Dean S., III, et al. (1994), An Independent Verification And Validation of The Future Theater Level Model Conceptual Model, Martin Marietta Energy Systems, Inc., Oak Ridge, TN, K/DSRD-1637. 43. Bauman, Walter (1995), Ardennes Campaign Simulation (ARCAS), CAA-SR-95-8. US Army Concepts Analysis Agency, Bethesda, MD. 44. Metz, Michael L. (2000), "Joint Warfare System (JWARS) Verification and Validation Lessons Learned." Proceedings of the 2000 Winter Simulation Conference. Ed. J A Joines, R R Barton, K Kang, and P A Fishwick. Washington, DC, 2000, 855-858. 45. Hartley, Dean S., III. 
46. Wait, Patience (2001). "Project Protection Comes at a Price." Washington Technology, 8/28/01. http://www.washingtontechnology.com/news/1_1/daily_news/17071-1.html.
47. Chaturvedi, Alok R. (2003). "SEAS-UV 2004: The Theoretical Foundation of a Comprehensive PMESII/DIME Agent-Based Synthetic Environment," Simulex, Inc.
48. Hartley, Dean S., III (2004). FAST for the Warfighter: Test Strategy & Plan (Revision 3). DRC, Vienna, VA.
49. Hartley, Dean S., III (2005). MOOTW FAST Prototype Toolbox: FY04 Validation Strategy & Plan. DRC, Vienna, VA.
50. Hartley, Dean S., III (2005). MOOTW FAST Prototype Toolbox: FY05 Validation Strategy & Plan. DRC, Vienna, VA.
51. Senko, Robert M. (2005). Flexible Asymmetric Simulation Technologies (FAST) for the Warfighter: FY05 Final Verification Test Report. DRC, Vienna, VA.
52. COAST GUARD: Changes to Deepwater Plan Appear Sound, and Program Management Has Improved, but Continued Monitoring Is Warranted (2006). GAO, Washington, DC.
53. McDonald, C. S. (2006). Verification and Validation Report for the Synthetic Environment for Analysis and Simulation (SEAS). The Johns Hopkins University Applied Physics Laboratory, Laurel, MD.
54. Sheldon, Bob (2006). "Memorandum for the Record: V&V Report for the Synthetic Environment for Analysis and Simulation (SEAS), 27 October 2006."
55. Davis, D. F. (2006). "Consolidated Report Consisting of Three Research and Development Project Summary Reports," Contract #W15P7T-06-T-P238. Peace Operations Policy Program, George Mason University, Arlington, VA.

A.3 PMESII CONSIDERATIONS

56. Hartley, Dean S., III (1996). Operations Other Than War: Requirements for Analysis Tools Research Report. Lockheed Martin Energy Systems, Inc., Oak Ridge, TN, K/DSRD-2098.
57. Hayes, Bradd C. and Jeffrey I. Sands (1997). Doing Windows: Non-Traditional Military Responses to Complex Emergencies. CCRP, Washington, DC.
58. Staniec, Cyrus and Dean S. Hartley III (1999), OOTW Analysis and Modeling Techniques (OOTWAMT) Workshop Proceedings, January 1997. MORS, Alexandria, VA.
59. Hartley, Dean S., III (2001). OOTW Toolbox Specification. Hartley Consulting, Oak Ridge, TN.
60. DMSO (2001). "Human Behavior Representation (HBR) Literature Review," RPG Reference Document, http://vva.dmso.mil/.
61. DMSO (2001). "Validation of Human Behavior Representations," RPG Reference Document, http://vva.dmso.mil/.
62. Allen, John G. (2006). "Modeling PMESII Systems." DARPA, Washington, DC.
63. Hartley, Dean S., III (2006). Operations Other Than War (MOOTW) Flexible Asymmetric Simulation Technologies (FAST) Prototype Toolbox: ISSM v4.00 Analysts' Guide. Dynamics Research Corporation, 2006.
64. Wagenhals, Lee W. and Alexander H. Levis (2006), "Dynamic Cultural Preparation of the Environment: Course of Action Analysis Using Timed Influence Nets", George Mason University, Fairfax, VA.
65. HQ DA (2006). "Systems Perspective of the Operational Environment," The Operations Process, FM 5-0.1.
66. Hartley, Dean S., III, Dave Holdsworth, and Christian Farrell (2006). OOTW FAST Prototype Toolbox: Analysis Process. DRC, Orlando, FL.

A.4 SIMULATION TECHNOLOGY

67. Starr, Stuart (2000). SIMTECH 2007 Mini-Symposium and Workshop. MORS, Alexandria, VA.
68. Davis, Paul and Robert H. Anderson (2003). Improving the Composability of Department of Defense Models and Simulations. RAND, Santa Monica, CA.
APPENDIX B. ABBREVIATIONS AND ACRONYMS

AoR - Area of Responsibility
COMPOEX - Conflict Modeling, Planning and Outcomes Experimentation
DARPA - Defense Advanced Research Projects Agency
DIME - Diplomatic, Informational, Military, Economic
DMSO - Defense Modeling and Simulation Office (now M&S CO)
FAST - Flexible Asymmetric Simulation Technologies
ISSM - Interim Semi-Static Stability Model
M&S - Modeling & Simulation
MoM - Measure of Merit
PMESII - Political, Military, Economic, Social, Information and Infrastructure
RoI - Return on Investment
SME - Subject Matter Expert
V&V - Verification & Validation
VV&A - Verification, Validation & Accreditation

Appendix C. Major Characteristics of the COMPOEX Multi-Resolution Model

DARPA's COMPOEX program is developing a decision aid to support leaders in designing and conducting future coalition-oriented, multi-agency intervention campaigns employing unified actions, or a whole-of-government approach to operations. It generates a distribution of "plausible outcomes" rather than precise predictions. It provides a comprehensive family of interacting models that span the relevant DIME/PMESII domain. The execution of COMPOEX:
- Automatically forces models to interact to suggest plausible activities and outcomes;
- Allows the user to modify or create models and to import models;
- Allows multi-sided analysis; and
- Allows the user to visualize the interactions (normally in the form of graphical data).

COMPOEX's components include:
- Conflict Space Tool: provides leaders and staff the ability to explore and map sources of instability, relationships, and centers of power in order to develop their theory of the conflict.
- Campaign Planning Tool: a framework to develop, visualize, and manage a comprehensive campaign plan in a complex environment.
- Family of Models: models instantiated for the current problem, together with additional models to represent the operational environment more accurately.
- Option Exploration Tool: enables a staff to explore multiple series of actions in different environments to see the range of possible outcomes in all environments.

Figure C-1 illustrates the structure of the COMPOEX Option Exploration Tool. Each PMESII area is represented by a set of models, divided by level of resolution. In general, the models communicate with each other by posting variable results to the backplane at the end of each time step and reading variable values at the beginning of the next time step. The models are tailored to the Area of Responsibility (AOR) and, perhaps, to the problem at hand.

Figure C-1. COMPOEX Structure (political-social, economic, rule-of-law, infrastructure, information, and military models arranged by level of resolution, from national to sector/area to district, connected through an interaction backplane)

From a V&V perspective, each model must be addressed, and the nature of the assumptions built into the communications links must be addressed as part of the V&V of the entire system. The fact that multiple simulation paradigms (systems dynamics models, SOAR technology, and possibly other agent-based modeling systems) are used within the system complicates the process. However, the most challenging aspect from a VV&A perspective is the proposed use of different sets of models, depending on the situation, with models being custom built or modified immediately prior to use.
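For concreteness, the read/post cycle of the backplane described above can be pictured with the following minimal sketch. The Model interface, the dictionary-valued backplane, and the stepping function are hypothetical illustrations of the time-stepped exchange, not COMPOEX code.

# Sketch of a time-stepped backplane exchange: every model reads the same
# snapshot at the start of a step; posted values become visible next step.
from typing import Dict, List, Protocol

class Model(Protocol):
    def step(self, sim_time: float, inputs: Dict[str, float]) -> Dict[str, float]:
        """Read shared variables at the start of the step; return values to post."""
        ...

def run_time_stepped(models: List[Model], backplane: Dict[str, float],
                     steps: int, dt: float = 1.0) -> Dict[str, float]:
    """Advance all models together over a fixed number of time steps."""
    sim_time = 0.0
    for _ in range(steps):
        snapshot = dict(backplane)          # all models read identical values
        posted: Dict[str, float] = {}
        for model in models:
            posted.update(model.step(sim_time, snapshot))
        backplane.update(posted)            # posted values visible at the next step
        sim_time += dt
    return backplane

From a V&V standpoint, the value of such a cycle is that it yields a natural, uniform point (the end of each time step) at which data capture instruments such as the "flight recorder" sketched earlier can observe every exchanged variable.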