Case for Support

Title: Automating Simulation Output Analysis (AutoSimOA): Selection of Warm-up, Replications and Run-Length

Part 1: Previous Research Track Record

Professor Stewart Robinson (Operational Research and Information Systems Group, Warwick Business School, Coventry) has spent nearly 20 years working in the field of computer simulation, initially as a consultant. His research focuses on the practice of simulation [Robinson, 2004], with particular projects investigating:
- The verification and validation of simulation models and the quality of simulation projects [Robinson and Pidd, 1998; Robinson, 1999; Robinson, 2002a].
- Modes of simulation practice: how simulation modellers approach simulation studies [Robinson, 2001; Robinson, 2002b].
- Modelling human decision-making with simulation: using simulation to elicit knowledge from decision-makers and using artificial intelligence to represent their decision-making strategies in a simulation [Edwards et al, 2004; Robinson et al, 2005]. This includes two projects funded through the EPSRC Innovative Manufacturing Initiative and Innovative Manufacturing Research Centres.

Professor Robinson is co-chair of the Operational Research Society Simulation Study Group and founder and co-chair of the biennial Operational Research Society Simulation Workshop. These have become a focus for the discrete-event simulation community in the UK and further afield. At Warwick he leads the simulation research group, which currently consists of 4 members of staff and 4 PhD students.

Professor Robinson has made specific contributions to the field of simulation output analysis:
1. Development of a heuristic technique for determining the run-length of a non-terminating steady-state simulation [Robinson, 1995].
2. Application of a heuristic method for detecting shifts in the mean of a time-series of simulation output. This enables a modeller to identify shifts in the steady-state of a system caused by changes in the input data or random shifts in the model's performance [Robinson et al, 2002].
3. Development of a method for detecting initialisation bias and selecting a warm-up period for a simulation, based on the principles of statistical process control [Robinson, 2002c].

In the past year, three masters projects have been supervised which performed preliminary investigations into the automation of the simulation warm-up, replications and run-length decisions respectively. These projects demonstrated the possibility of automating simulation output analysis, although they also showed the need for testing and further development of the output analysis methods used.

Professor Ruth Davies (Operational Research and Information Systems Group, Warwick Business School, Coventry) has worked with simulation for many years, particularly on applications in the health service. She has recently moved from the Management School at Southampton University and is head of the Operational Research and Information Systems group in Warwick Business School.

Discrete-Event Simulation and POST: Professor Davies developed a discrete-event simulation approach [Davies et al, 1993] providing a powerful and highly flexible simulation shell, enabling modelled patients to participate in simultaneous activities and queues and to be retrieved efficiently. It was subsequently called POST (patient-oriented simulation technique), which she used in a number of simulation projects funded by the Department of Health to help determine policy and set budgets. The projects included a projection of the resource requirements of end-stage renal failure [Roderick et al, 2004], the evaluation of potential screening programmes for diabetic retinopathy [Davies et al, 2002a], and the determination of the cost-effectiveness of screening for helicobacter pylori [Davies et al, 2002b].
The most recent funded project was a simulation of the treatment and prevention of heart disease [Cooper et al, 2002]. Currently, together with her PhD student, Suchi Patel, she is investigating the allocation, matching and survival of patients in liver transplantation.

Modelling Methods: Professor Davies has looked at methods of reducing simulation run times, including queue management and the use of common random numbers [Davies and Cooper, 2002]. She has published several papers evaluating modelling methods for the purpose of health services policy making, the most recent with Roderick and Raftery [Davies et al, 2003]. In October 2004 she was privileged to participate in a CHEBS (Centre for Health Economics and Bayesian Statistics) focus fortnight on "Patient simulation modelling in health care".

Dr Mark Elder FORS (CEO, SIMUL8 Corporation, Glasgow) has been working in the simulation field for more than 25 years. He was part of the pioneering team in the UK-based academic/industry collaboration that created the first visual interactive simulation methods [Fiddy et al, 1982]. In the 1990s Dr Elder spent some years teaching and researching simulation at Strathclyde University before returning to industry to found SIMUL8 Corporation. SIMUL8 now has bases in both the US and the UK and provides simulation technology to many of the world's largest companies. Its products range from low-level development tools up to highly specific, high-value simulation-based solutions. A key distinguishing feature of SIMUL8 Corporation's work is that all its products are aimed at non-statistically-trained users. These users are typically working to improve the processes in which they work. Their work is therefore important to the efficiency of UK PLC, but they often lack the statistical knowledge needed to ensure that it is valid.
For this reason SIMUL8 Corporation is very keen to work with Professor Robinson and Professor Davies on this project.

References

Cooper, K., Davies, R., Roderick, P. and Chase, D. (2002). The Development of a Simulation for the Treatment of Coronary Artery Disease. Health Care Management Science, 5, pp. 259-267.
Davies, R. and Cooper, K. (2002). Reducing the Computation Time of Discrete Event Simulation Models of Population Flows. Proceedings of the OR Society Simulation Workshop, 20-21 March 2002, The Operational Research Society, pp. 63-68.
Davies, R., Cooper, K., Roderick, P. and Raftery, J. (2003). The Evaluation of Disease Prevention and Treatment using Simulation Models. European Journal of Operational Research, 150 (1), pp. 53-66.
Davies, R., O'Keefe, R. and Davies, H. (1993). Simplifying the Modelling of Multiple Queuing, Multiple Activities and Interruptions, a Low Level Approach. ACM Transactions on Modeling and Computer Simulation, 3 (4), pp. 332-346.
Davies, R., Roderick, P., Brailsford, S.C. and Canning, C. (2002a). The Use of Simulation to Evaluate Screening Policies for Diabetic Retinopathy. Diabetic Medicine, 19 (9), pp. 763-771.
Davies, R., Roderick, P., Crabbe, D. and Raftery, J. (2002b). Economic Evaluation of Screening for Helicobacter Pylori for the Prevention of Peptic Ulcers and Gastric Cancers, using Simulation. Health Care Management Science, 5, pp. 249-258.
Edwards, J.S., Alifantis, A., Hurrion, R.D., Ladbrook, J., Robinson, S. and Waller, T. (2004). Using a Simulation Model for Knowledge Elicitation and Knowledge Management. Simulation Modelling Practice and Theory, 12 (7-8), pp. 527-540.
Fiddy, E., Bright, J.G. and Elder, M.D. (1982). Problem Solving by Pictures. Proceedings of the Institute of Mechanical Engineers, pp. 125-138.
Robinson, S. (1995). An Heuristic Technique for Selecting the Run-Length of Non-Terminating Steady-State Simulations. Simulation, 65 (3), pp. 170-179.
Robinson, S. (1999). Simulation Verification, Validation and Confidence: A Tutorial. Transactions of the Society for Computer Simulation International, 16 (2), pp. 63-69.
Robinson, S. (2001). Soft with a Hard Centre: Discrete-Event Simulation in Facilitation. Journal of the Operational Research Society, 52 (8), pp. 905-915.
Robinson, S. (2002a). General Concepts of Quality for Discrete-Event Simulation. European Journal of Operational Research, 138 (1), pp. 103-117.
Robinson, S. (2002b). Modes of Simulation Practice: Approaches to Business and Military Simulation. Simulation Practice and Theory, 10, pp. 513-523.
Robinson, S. (2002c). A Statistical Process Control Approach for Estimating the Warm-up Period. Proceedings of the 2002 Winter Simulation Conference (Yücesan, E., Chen, C-H., Snowden, S.L. and Charnes, J.M., eds.). IEEE, Piscataway, NJ, pp. 439-446.
Robinson, S. (2004). Simulation: The Practice of Model Development and Use. Wiley, Chichester, UK.
Robinson, S., Alifantis, T., Edwards, J.S., Ladbrook, J. and Waller, T. (2005). Knowledge Based Improvement: Simulation and Artificial Intelligence for Identifying and Improving Human Decision-Making in an Operations System. Journal of the Operational Research Society. Forthcoming.
Robinson, S., Brooks, R.J. and Lewis, C.D. (2002). Detecting Shifts in the Mean of a Simulation Output Process. Journal of the Operational Research Society, 53 (5), pp. 559-573.
Robinson, S. and Pidd, M. (1998). Provider and Customer Expectations of Successful Simulation Projects. Journal of the Operational Research Society, 49 (3), pp. 200-209.
Roderick, P., Davies, R., Jones, C., Feest, T., Smith, S. and Farrington, K. (2004). Simulation Model of Renal Replacement Therapy: Predicting Future Demand in England. Nephrology, Dialysis and Transplantation, 19, pp. 692-701.

Part 2: Description of the Research and its Context

Background

Discrete-event simulation has been used for commercial applications since the early 1960s [Tocher, 1963].
In the early years models were developed using programming languages, but during the 1960s specialist simulation software started to become available (e.g. GPSS [Schriber, 1974]). The late 1970s saw the introduction of commercial visual interactive simulation software [Fiddy et al, 1982], making simulation more accessible to the end customer through a visual and interactive interface. Visual interactive modelling systems [Pidd, 2004], first seen in the late 1980s, then placed simulation model development into the hands of non-experts by removing the need for a detailed knowledge of programming code. Today discrete-event simulation is in widespread use, applied in areas such as manufacturing design and control, service system management (e.g. call centres), business process design and management, and health applications. Organisations benefit from improved performance, cost reduction, reduced risk of investment and greater understanding of their operations.

While a welcome development, the prevalence of simulation software and its adoption by non-experts has almost certainly led to a significant problem with the use of the simulation models that are being developed. The appropriate analysis of simulation output requires specific skills in statistics that many non-experts do not possess. Decisions need to be made about the initial transient, the length of a simulation run, the number of independent replications to be performed and the selection of scenarios [Law and Kelton, 2000; Robinson, 2004]. Appropriate methods also need to be adopted for reporting, comparing and ranking results. The majority of simulation packages provide guidance only on the selection of scenarios, through simulation 'optimisers' [Law and McComas, 2002]. The other decisions are left to the user with little or no help from the software. As a result, it is likely that many simulation models are being used poorly.
Indeed, in a survey of simulation users, Hollocks [2001] provides evidence to support the view that simulations are not well used. The consequences are that incorrect conclusions might be drawn, at best causing organisations to forfeit the benefits that could be obtained and at worst leading to significant losses, with decisions being made on the basis of faulty information.

Alongside developments in simulation software and simulation practice, theoretical developments in the field of simulation output analysis have continued. Many of these developments are reported at the annual Winter Simulation Conference, which has a stream dedicated to the subject [e.g. Chick et al, 2003]. The focus of the work reported, however, is largely on theoretical development rather than practical application. For instance, a survey of research into the initial transient problem and methods for selecting a warm-up period found some 26 methods [Robinson, 2002]. None of the methods, with the possible exception of Welch's method [Welch, 1983], appears to be in common use. Three problems seem to inhibit the use of output analysis methods:
- Most methods have been subject to only limited testing, giving little certainty as to their generality and effectiveness
- Many of the methods require a detailed knowledge of statistics and so are difficult to use, especially for non-expert simulation users
- Simulation software does not generally provide implementations of the methods

One solution to these problems is to implement an automated output analysis procedure in the simulation software. This would overcome the need for statistical skills. However, before this is possible, more rigorous testing of the output analysis methods is required so that the most effective methods can be selected for implementation. Automation might involve full automation, giving the user the 'answer', or partial automation, providing guidance on interpretation to the user.
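To illustrate the kind of automation envisaged, Welch's method can be sketched as follows. This is a minimal sketch only: the published method asks the analyst to judge visually where a moving average of the output becomes smooth and flat, so the tolerance-based flatness rule in step 3 (and the `window` and `tol` parameters) are assumptions added here purely for illustration.

```python
# Sketch of Welch's moving-average approach to choosing a warm-up period.
# NOTE: the flatness rule below is an illustrative assumption; Welch's
# published procedure leaves the final judgement to visual inspection.

def welch_warmup(replications, window=5, tol=0.05):
    """replications: equal-length lists of output from independent runs."""
    m = len(replications[0])
    n = len(replications)
    # 1. Average across replications at each observation index.
    mean_series = [sum(rep[i] for rep in replications) / n for i in range(m)]
    # 2. Smooth with a centred moving average (window shrinks near the ends).
    smoothed = []
    for i in range(m):
        w = min(window, i, m - 1 - i)
        seg = mean_series[i - w:i + w + 1]
        smoothed.append(sum(seg) / len(seg))
    # 3. Illustrative flatness rule: the warm-up ends at the first index from
    #    which the smoothed series stays within +/- tol of its long-run level,
    #    estimated here from the second half of the run.
    level = sum(smoothed[m // 2:]) / (m - m // 2)
    for i in range(m):
        if all(abs(s - level) <= tol * abs(level) for s in smoothed[i:]):
            return i
    return m  # no flat region found: a longer run is needed

# Example: five runs that start low and settle near a steady state of 10.
reps = [[10 - 8 * 0.6 ** t + (0.1 if t % 2 else -0.1) for t in range(50)]
        for _ in range(5)]
warmup = welch_warmup(reps)
```

The point of the sketch is that automating a visual method forces an explicit smoothness/flatness criterion; choosing and validating such criteria is part of what the proposed research must do.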
To achieve this, candidate methods may need some refinement to ensure their effectiveness across a wide range of applications. In other cases, the need to automate a method will require some revisions to that method. For instance, Welch's warm-up period approach requires a smooth and flat line for a moving average; at present this judgement is left to the user's interpretation. Measures of smoothness and flatness could be added to the procedure in order to automate the approach.

Programme and Methodology

Objectives

The purpose of this research is to explore the potential to automate the analysis of simulation output. The work focuses on three specific areas: selecting a warm-up period, determining the number of replications and selecting an appropriate run-length. The aim is to improve the use of simulation models, particularly by non-experts, by ensuring that they obtain accurate estimates of model performance. It is envisaged that experts will also benefit from the work through rapid access to the selected output analysis methods. The specific objectives of the project are:
- To determine the most appropriate methods for automating simulation output analysis
- To determine the effectiveness of the analysis methods
- To revise the methods where necessary in order to improve their effectiveness and capacity for automation
- To propose a procedure for automated output analysis of warm-up, replications and run-length

As such, the research focuses on automating the analysis of output from a single scenario. Automating the analysis of the output from multiple scenarios is outside the scope of this project.

[Figure 1: Automated Simulation Output Analyser. The simulation model supplies output data to the analyser, which performs a warm-up analysis and then, depending on whether replications or a long run are to be used, a replications analysis or a run-length analysis. If a recommendation is possible it is reported; otherwise more output data are obtained from the model.]

Methodology

Figure 1 outlines the nature of the automated analysis procedure.
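In outline, the control loop of Figure 1 might be sketched as below. Every function body and constant in the sketch is a toy placeholder (an assumption for illustration only): deciding what should actually go inside each step is the object of the research.

```python
# Skeleton of the analyser loop in Figure 1. All rules and constants below
# are toy stand-ins, not the methods the project would actually select.
import statistics

def warmup_analysis(series):
    """Toy rule: warm-up ends at the first observation within 10% of the
    mean of the second half of the run; None if the run never settles."""
    level = statistics.mean(series[len(series) // 2:])
    for i, x in enumerate(series):
        if abs(x - level) <= 0.1 * abs(level):
            return i
    return None

def replications_analysis(series, warmup):
    return 5                      # placeholder for a confidence-interval rule

def run_length_analysis(series, warmup):
    return 10 * len(series)       # placeholder for, e.g., a batch-means rule

def analyser(get_output, use_replications, max_rounds=5):
    n = 100
    for _ in range(max_rounds):
        series = get_output(n)                  # output data from the model
        warmup = warmup_analysis(series)        # warm-up analysis
        if warmup is not None:                  # recommendation possible?
            if use_replications:                # replications or long run?
                return warmup, replications_analysis(series, warmup)
            return warmup, run_length_analysis(series, warmup)
        n *= 2                                  # obtain more output data
    raise RuntimeError("no recommendation possible")

def toy_model(n):
    """Stand-in for a model run: exponential approach to a level of 50."""
    return [50 - 45 * 0.9 ** t for t in range(n)]

recommendation = analyser(toy_model, use_replications=True)
```

The loop structure mirrors the figure: analysis is attempted, and if insufficient data are available the model is run again with more output requested.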
The 'analyser' consists of three main components: warm-up analysis, replications analysis and run-length analysis. The choice between the latter two depends on the nature of the model and whether the user requires a long run or multiple replications. A simulation model is developed using a commercial simulation software package such as SIMUL8. The analyser automatically runs the simulation model and takes output data from it. It then performs an analysis of the data with a view to recommending a suitable warm-up period, number of replications and run-length. If insufficient data are available to make a recommendation (e.g. the model has not reached a steady state), more model runs are performed until sufficient data are available.

The research project focuses on the analysis methods required for use within such an analyser. Many methods have been proposed for determining the warm-up period [Robinson, 2002]. Similarly, a range of methods exists for selecting the run-length of a simulation model. Most of these aim to provide a confidence interval estimate of a specified precision using, for instance, the batch-means approach (see Alexopoulos and Seila [1998] for a useful review), although some heuristic methods exist (e.g. Robinson [1995]). The one area where little investigation is required is the selection of the number of replications, where standard confidence interval methods can be applied. In this case, there may be some benefit in adopting variance reduction approaches, particularly antithetic variates [Law and Kelton, 2000].

The research project will consist of three stages. In the first stage, the various methods that have been proposed for warm-up and run-length selection will be tested. This will involve the use of artificial data sets such as those proposed by Cash et al [1992] and Goldsman et al [1994]. These have the advantage that quite different output data time-series, for which the correct results are known, can be generated.
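For illustration, one such artificial data set might be a stationary autoregressive process with a known steady-state mean and a deliberately injected, geometrically decaying initialisation bias. This is a sketch in the same spirit as the cited data sets, not a reproduction of the specific processes in those papers; all parameter values are arbitrary.

```python
# An artificial test series: a stationary AR(1) process with a known
# steady-state mean, plus an injected initialisation bias that decays
# geometrically. Because the truth is known by construction, a warm-up
# method's answer can be scored exactly. (Illustrative process only.)
import random

def biased_ar1(n, mean=100.0, phi=0.8, sigma=2.0, bias=-50.0, decay=0.9,
               seed=None):
    rng = random.Random(seed)
    series = []
    x = 0.0                      # AR(1) deviation from the steady-state mean
    for t in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(mean + x + bias * decay ** t)   # add the known bias
    return series

data = biased_ar1(1000, seed=42)
# Early observations are pulled well below the steady-state mean of 100;
# by construction the bias has all but vanished long before t = 500.
```

A warm-up method applied to `data` can then be judged against the known decay rate of the injected bias, which is exactly how the "results are known" property would be exploited in testing.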
Further to this, the methods will be tested on output data from real models provided by SIMUL8 Corporation and their users. The tests will determine the effectiveness of the methods with respect to criteria such as their accuracy, simplicity, ease of implementation, assumptions and generality of use, and estimation of parameters. Based on the findings of these tests, candidate methods for automation will be selected.

The second stage of the research will involve some development and revision of the candidate methods. It is envisaged that the need for revision will come from two sources:
- During testing, ideas for improving the methods will be identified. These improvements will be identified in relation to the criteria used for testing a method's effectiveness.
- Because some of the methods require user intervention, for instance inspection of a time-series, it is not possible to fully automate them. Methods for replacing the user intervention, for example measures of smoothness for a time-series, will be sought.

As part of this stage, users of the SIMUL8 software will test the proposed methods using their own models. This will help to evaluate the simplicity and generality in use of the methods.

In the final stage of the project an automated procedure will be proposed. It is intended that this will be a revision of the procedure outlined in figure 1, with specific methods identified for warm-up, replications and run-length. Prototype software that interfaces with SIMUL8 will be developed and tested in order to validate the automated procedure. Again, the procedure will be tested with users of the SIMUL8 software.

Justification of the Methodology

The methodology provides a basis for identifying output analysis methods that are suitable for automation and for improving those methods where necessary. It achieves this by:
- Providing a thorough investigation of current output analysis methods for warm-up, replications and run-length, something that does not at present exist.
- Using a set of criteria for investigating and comparing the methods.
- Identifying improvements, particularly in relation to automation, for selected methods.
- Proposing, developing and testing (with real simulation users) a prototype automated warm-up, replications and run-length procedure that works with a commercial simulation software package.

Timeliness and Novelty

In surveys of simulation users, both Hlupic [1999] and Hollocks [2001] identify better experimental support as much needed. Hollocks notes that 'simulation is increasingly in the hands of non-specialists, so the quality of experimentation is at higher risk'. Despite many advances in the simulation field, support for simulation users during experimentation remains largely underdeveloped. At the same time, many methods for warm-up, replications and run-length have been proposed. These need to be tested more fully to determine which have the greatest potential for helping simulation users. Once appropriate methods have been identified, the capability and speed of modern hardware and software make automated analysis a realistic possibility, even with a high demand on computing power.

The novelty of the work lies primarily in addressing the need to test warm-up, replications and run-length methods; the adaptation of methods where necessary; and the specification of an automated procedure that does not currently exist.

Programme of Work

An outline of the programme of work is provided in the appendix.
The key milestones in the project are as follows (timescale in months):
- Literature review of warm-up, replications and run-length methods*: 3
- Development of artificial data sets and collection of simulation models: 2
- Testing of warm-up methods: 6
- Testing of replications methods: 2
- Testing of run-length methods: 6
- Development, testing and revision of candidate methods: 6
- Develop and test automated procedure (including prototype software): 5
- Dissemination: 6
- Total: 36

* Largely complete at the time of writing the proposal

The testing, development and revision stages will be performed iteratively.

Responsibilities for the work:
- Project Manager (Dr Mark Elder): management of the project; provision of software (SIMUL8), training and support; provision of real simulation models; access to SIMUL8 users.
- Principal Investigator (Professor Stewart Robinson): management of the academic content of the project and the day-to-day work of the Research Assistant.
- Co-Investigator (Professor Ruth Davies): academic support to the project and the Research Assistant.
- Research Assistant: performing the project work.

The Project Manager and Research Assistant will meet on a regular basis (once a month) to discuss progress on the project and requirements for continued work. A full team meeting will take place four times a year.
Relevance to Beneficiaries

SIMUL8 Corporation:
- Generation of ideas to improve the experimental process for SIMUL8 users and consultants
- Some ideas will be incorporated into the SIMUL8 software as they are generated
- Potential development of commercial software to support the experimental process

Other beneficiaries:
- Simulation users: who need better support for experimentation
- Simulation academics: who will benefit from the results of the testing of the warm-up, replications and run-length methods, and from the development and revision of the methods
- Simulation software vendors: who will obtain ideas on how to implement experimental support within their software
- Business: which will benefit from the improved use of simulation models in decision-making, realising more fully the benefits of simulation

Dissemination and Exploitation

Conferences and Journals:
- National conferences, e.g. Operational Research Society Simulation Workshop
- International conferences, e.g. EURO, Winter Simulation Conference
- Journals, e.g. Journal of the Operational Research Society, European Journal of Operational Research, ACM Transactions on Modeling and Computer Simulation

Web Site: reporting on the progress of the research, the automated procedure and the conclusions of the work. Software routines developed as part of this work will be made available via this web site.

SIMUL8 Newsletter: an e-newsletter that is circulated to all SIMUL8 users.

Software Development: should the project be successful, it is envisaged that an analyser will be developed by SIMUL8 Corporation for their software. This work is not part of the project proposal.

References

Alexopoulos, C. and Seila, A.F. (1998). Output Data Analysis. Handbook of Simulation (Banks, J., ed.). Wiley, New York, pp. 225-272.
Cash, C.R., Nelson, B.L., Dippold, D.G., Long, J.M. and Pollard, W.P. (1992). Evaluation of Tests for Initial-Condition Bias. Proceedings of the 1992 Winter Simulation Conference (Swain, J.J., Goldsman, D., Crain, R.C. and Wilson, J.R., eds.). IEEE, Piscataway, NJ, pp. 577-585.
Chick, S., Sanchez, P.J., Ferrin, D. and Morrice, D.J. (2003). Proceedings of the 2003 Winter Simulation Conference. IEEE, Piscataway, NJ.
Fiddy, E., Bright, J.G. and Elder, M.D. (1982). Problem Solving by Pictures. Proceedings of the Institute of Mechanical Engineers, pp. 125-138.
Goldsman, D., Schruben, L.W. and Swain, J.J. (1994). Tests for Transient Means in Simulated Time Series. Naval Research Logistics, 41, pp. 171-187.
Hlupic, V. (1999). Discrete-Event Simulation Software: What the Users Want. Simulation, 73 (6), pp. 362-370.
Hollocks, B.W. (2001). Discrete-Event Simulation: An Inquiry into User Practice. Simulation Practice and Theory, 8, pp. 451-471.
Law, A.M. and Kelton, W.D. (2000). Simulation Modeling and Analysis, 3rd ed. McGraw-Hill, New York.
Law, A.M. and McComas, M.G. (2002). Simulation-Based Optimization. Proceedings of the 2002 Winter Simulation Conference (Yücesan, E., Chen, C-H., Snowden, S.L. and Charnes, J.M., eds.). IEEE, Piscataway, NJ, pp. 41-44.
Pidd, M. (2004). Computer Simulation in Management Science, 5th ed. Wiley, Chichester, UK.
Robinson, S. (1995). An Heuristic Technique for Selecting the Run-Length of Non-Terminating Steady-State Simulations. Simulation, 65 (3), pp. 170-179.
Robinson, S. (2002). A Statistical Process Control Approach for Estimating the Warm-up Period. Proceedings of the 2002 Winter Simulation Conference (Yücesan, E., Chen, C-H., Snowden, S.L. and Charnes, J.M., eds.). IEEE, Piscataway, NJ, pp. 439-446.
Robinson, S. (2004). Simulation: The Practice of Model Development and Use. Wiley, Chichester, UK.
Schriber, T. (1974). Simulation Using GPSS. Wiley, New York.
Tocher, K.D. (1963). The Art of Simulation. The English Universities Press, London.
Welch, P. (1983). The Statistical Analysis of Simulation Results. The Computer Performance Modeling Handbook (Lavenberg, S., ed.). Academic Press, New York, pp. 268-328.