Final PDF to printer CHAPTER 18 Simulation CHAPTER CONTENTS 18.1 What Is Simulation? 18.2 Monte Carlo Simulation 18.3 Random Number Generation 18.4 Excel Add-Ins 18.5 Dynamic Simulation CHAPTER LEARNING OBJECTIVES When you finish this chapter you should be able to LO 18-1 List characteristics of situations where simulation is appropriate. LO 18-2 Distinguish between stochastic and deterministic variables. LO 18-3 Explain how Monte Carlo simulation is used and why it is called static. LO 18-4 Explain how to generate random data by using a discrete or continuous CDF. LO 18-5 Use Excel to generate random data for several common distributions. LO 18-6 Describe functions and features of commercial modeling tools for Excel. LO 18-7 Explain the main reasons for using dynamic simulation and queuing models. 18-1 doa57594_web_ch18_001-020_onlinecontent.indd 18-1 10/09/17 03:49 PM Final PDF to printer ©Greg Pease/Getty Images 18.1 WHAT IS SIMULATION? A simulation is a computer model that attempts to imitate the behavior of a real system or activity. Models are simplifications that try to include the essentials while omitting unimportant details. We use simulation to help quantify relationships among variables that are too complex to analyze mathematically. We can test our understanding of the world by seeing whether our model leads to realistic predictions. If the simulation’s predictions differ from what really happens, we can refine the model in a systematic way until its predictions are in close enough agreement with reality. LO 18-1 List characteristics of situations where simulation is appropriate. Simulation Defined Simulation, from the Latin simulare, means to ‘‘fake’’ or to ‘‘replicate.’’ Today, computer simulation is used as a powerful tool to assess the impact of policies, to evaluate performance, to train professionals and more . . . without actually having to experiment with or perturb the real system. Source: From http://www.iro.umontreal.ca/~vazquez/SimSpiders/GenerRV/index.html System simulation is the mimicking of the operation of a real system in a computer, such as the day-to-day operation of a bank, or the value of a stock portfolio over a time period, or the running of an assembly line in a factory, or the staff assignment of a hospital or a security company. Instead of building extensive mathematical models by experts, simulation software has made it possible to model and analyze the operation of a real system by non-experts, who are managers but not programmers. Source: From http://home.ubalt.edu/ntsbarsh/simulation/sim.htm#rintroduction A Versatile Tool Simulation is a rehearsal. We rehearse a play or a speech. We take practice SAT exams. We go to football practice. We do so because we want to make our mistakes before the “real thing,” when the consequences of a major flub are consequential. Business and not-for-profit enterprises know that a walk-through is essential before a change is implemented. Simulation is planning. Super Bowl planning begins years in advance, picking the site, analyzing hotel capacity, envisioning transportation facilities and entertainment, and so on. Super Bowl site evaluation involves what-if analysis. Where will traffic bottlenecks develop? How long, on average, will it take for people to get from the hotels to the stadium? How long will 18-2 doa57594_web_ch18_001-020_onlinecontent.indd 18-2 10/09/17 03:49 PM Final PDF to printer 18-3 Applied Statistics in Business and Economics people have to wait for restaurant seating at peak times? 
When does planning become simulation, and vice versa? The boundary is not always clear. Simulation is a behavioral tool that helps decision makers focus on important aspects of a problem, instead of bickering about details, preferences, or personalities. In creating a simulation model, people are obliged to state their assumptions, name the variables that are important, and suggest hypothesized relationships among the variables. Simulation is not just a quantitative tool for operations research specialists, but rather a general device to help people think clearly. Applications Simulation models can be quite simple or very complex, depending on the purpose. A queuing model of customers at a single bank ATM requires only a simple Poisson model of arrivals and empirical estimates of the mean arrival rate by time of day. A queuing model of a grocery store with multiple checkout lanes is more complex. A model of Disney World queues during the busy season is very complex. We sometimes simulate events by using people, as in disaster simulations to test emergency personnel preparedness for terror attacks or disease outbreaks in major cities. Simulation studies have improved • Passenger flows at Vancouver International Airport. • Hospital surgery scheduling at Henry Ford Hospital. • Traffic flows in metropolitan Oakland County. • Waiting lines at Disney World. • Just-in-time scheduling in Toyota auto assembly plants. Besides these real activities, you are probably familiar with computer games that simulate car chases, Kung-Fu, and WWI aerial dogfights. Flight simulators can be as close to real flying as the budget will allow, ranging from a PC Cessna 172 up to the Boeing 787 used by the airlines to certify (yes, actually certify) their pilots. When Do We Simulate? There are many reasons, but simulation is especially attractive when real experiments are dangerous, costly, or impossible. Training a novice pilot in a flight simulator is safer and cheaper than using a real airplane. Of course, the simulation must adequately describe reality, or the simulation is worse than useless. In general, we might consider simulation when • • • • • The system is complex. Uncertainty exists in the variables. Real experiments are impossible or costly. The processes are repetitive. Stakeholders can’t agree on policy. Conversely, we are less inclined to simulate when the system is simple, variables are stable or nonstochastic, real experiments are cheap and nondisruptive, the event will only happen once, or stakeholders agree on policy. Sometimes simulation is followed up by “dry runs” with a real system. For example, the Denver International Airport was designed from the ground up, so nobody knew how its automated baggage handling system would perform. Engineering design showed that it would be successful. But during a rehearsal with actual bags prior to opening the new airport, bags were crushed and were routed incorrectly. The airport opening was delayed until the problems could be resolved. There are limits to any simulation’s ability to mimic the “real thing.” But a priori analysis through simulation modeling can reveal potential problem areas, sometimes without actual physical testing of the system. 
doa57594_web_ch18_001-020_onlinecontent.indd 18-3 10/09/17 03:49 PM Final PDF to printer Chapter 18 Simulation 18-4 Advantages of Simulation Unlike a deterministic model (in which the variables can’t vary), simulation lets key variables change in random but specified ways so that we can see what happens to the bottom-line decision variable(s) of interest. It helps us understand the range of possible outcomes and their probabilities and allows a sensitivity analysis showing which factors have the most influence on the outcome. Simulation is useful in business, government, and health care because it • • • • • • • Is less disruptive than real experiments. Forces us to state our assumptions clearly. Helps us visualize the implications of our assumptions. Reveals system interdependencies. Quantifies risk by showing probabilities of events. Helps us see a range of possible outcomes. Promotes constructive dialogue among stakeholders. A simulation project has several phases. In Phase I (design) we identify the problem, set objectives, design the model, and collect data. In Phase II (execution) we do empirical modeling, specify the variables, validate the model, execute the simulation, and prepare reports. In Phase III (communication) we explain the findings to decision makers. Thinking of it this way, you can see that simulation can help bring people together in a common way of thinking. Risk Assessment Risk assessment means thinking about a range of outcomes and their probabilities. People don’t always think this way. Automobile executives often want their marketing staff to provide a single, most likely sales volume forecast for a new vehicle. Accountants for an electric utility are asked to provide a cash flow forecast with a single, most likely prediction. You want to know what grade you will get in your statistics class. But variation is inevitable. A point estimate for a random variable is almost certain to be wrong. If you are a “B” student, you can’t be sure of a “B” on every exam. The “B” is only the average. In general, if X is normally distributed and you predict that the next item sampled will be equal to the mean, you are ignoring the distribution around the mean. Remember the Empirical Rule (68 percent within μ ± 1σ, 95 percent within μ ± 2σ, etc.)? It would assist a decision maker to know the 95 percent range of possible values for the decision variable, as well as the “most likely” value μ. That is the point of risk assessment. It is especially useful when the model is so complex that it is difficult to study mathematically. Components of a Simulation Model Simulation models can be classified in various ways, but they have some things in common. Table 18.1 summarizes the components of a simulation model in general terms. Simulation variables can be either deterministic (nonrandom or fixed) or stochastic (random) variables. If a variable is stochastic, we must hypothesize its distribution (normal, exponential, etc.). By allowing the stochastic variables to vary, we can study the behavior of the output variables that interest us to establish their ranges and likelihood of occurrence. We are also interested in the sensitivity of our output variables to variation in the stochastic input variables. There are two broad types of simulation models: static simulation (time isn’t explicit) and dynamic simulation (events occurring sequentially over time). Dynamic simulation requires specialized software, while simple static simulation can be done in Excel spreadsheets. 
Therefore, we will begin by discussing static simulation, using Excel functions. Then we’ll discuss commercial software that can facilitate static simulation, and finally take a brief look at dynamic simulation. doa57594_web_ch18_001-020_onlinecontent.indd 18-4 LO 18-2 Distinguish between stochastic and deterministic variables. 10/09/17 03:49 PM Final PDF to printer 18-5 Applied Statistics in Business and Economics TABLE 18.1 Component Components of a Simulation Model Explanation List of deterministic factors F1, F2, . . . , Fm These are quantities that are known or fixed, or whose behavior we choose not to model (i.e., exogenous). List of stochastic input variables V1, V2, . . . , Vk These are quantities whose value cannot be known with certainty and are assumed to vary randomly. List of output variables O1, O2, . . . , Op These are stochastic quantities that are important to a decision problem, but whose value depends on things in the model and whose distribution is not easily found. Assumed distribution for each stochastic input variable These are chosen from known statistical distributions, such as normal, Poisson, triangular, and so on. A model that specifies the rules or formulas that define the relationships among Fs and Vs Formulas may be accounting identities such as Profit = Revenue − Cost or behavioral hypotheses such as Car Sales = b0 + b1 (Income after Taxes) + b2 (Net Worth) + b3 (3-month T-Bill Rate). A simulation method that produces random data from the specified distributions and captures the results This is a programming language (such as VBA) although it may be embedded invisibly in a spreadsheet with built-in functions like Excel’s =RAND() or other add-ins. An interface that summarizes the model’s inputs, outputs, and simulation results LO 18-3 Explain how Monte Carlo simulation is used and why it is called static. Typically, spreadsheet tables or graphs to summarize the outcomes of the simulation. 18.2 MONTE CARLO SIMULATION Static simulation, in which time is not considered, uses the Monte Carlo method. The computer creates the values of the stochastic random variables. However, “random” does not mean “chaotic” because we specify the distribution (e.g., normal) and its parameters (e.g., μ and σ). Then we draw repeated samples from each distribution—often hundreds or thousands of iterations. Each sample yields one possible outcome for each stochastic variable. By studying the results, we can see the range of possibilities and how frequently each outcome occurs. For each output variable of interest, we usually look at percentiles (e.g., quartiles) as well as the mean, based on many samples. We usually make a histogram or a similar visual display of the results. We also do this for each stochastic input variable, to verify that the sampling is being done correctly (i.e., to make sure the desired distribution is being sampled). Which Distribution? You can use any distribution for a stochastic input variable. But in a static simulation (e.g., for financial modeling), some are used more than others. Table 18.2 shows four probability distributions that are of interest because they correspond to the way managers often think and can easily be simulated in Excel. Suppose that the price of aluminum is a stochastic input in your monthly cash flow forecasts for the next 12 months. You want to choose a model to represent the price of aluminum. The uniform model lets the price vary anywhere within the range a to b, with no central tendency. 
The normal model allows symmetric variation about a historical mean, if you know the historical standard deviation or use some form of the Empirical Rule such as σ = range/6. The triangular model allows you to state the range a to b but also allows a best guess c without forcing you to assume symmetry around the mean. The exponential model describes a variable that usually is very near zero but could have very high values.

TABLE 18.2 Some Useful Distributions
Normal N(μ, σ): The familiar bell-shaped curve. Symmetric, with a peak in the middle and gradually tapering tails. Pro: Familiar, well-known. Con: Extreme outcomes possible.
Triangular T(a, b, c): Has a central peak (mode) and clear end points (minimum, maximum). Pro: Easy to understand. Con: Harder to simulate.
Uniform U(a, b): Specify upper and lower limits. Every value is equally likely. Pro: Easy to understand. Con: Range may be too broad.
Exponential Expon(λ): Ideal for waiting time. Mode is zero. Large values are rare. Pro: For highly skewed data. Con: Extreme values possible.

EXAMPLE 18.1 Three-Product Revenue Forecast
The Axolotl Corporation sells three products (A, B, and C). Prices are set competitively and are assumed constant. The quantity demanded, however, varies widely from month to month. To prepare a revenue forecast, the firm sets up a simple simulation model of its input variables, as shown in Table 18.3. The output variable of interest is total revenue PAQA + PBQB + PCQC.

TABLE 18.3 Simulation Setup for Revenue Calculation
Variable (Type)                Product A                     Product B                           Product C
Price (Deterministic)          PA = 80                       PB = 150                            PC = 400
Quantity (Stochastic input)    Normal QA ~ N(50, 10),        Triangular QB ~ T(0, 5, 40),        Exponential QC ~ Expon(λ),
                               μ = 50, σ = 10                Min = 0, Max = 40, Mode = 5         λ = 2.5
Revenue (Stochastic output)    PAQA                          PBQB                                PCQC

You can see that variation in the quantity demanded would make it difficult to predict total revenue. You could predict its mean, based on the mean of each distribution, but what about its range? Simulation reveals things that are not obvious. The results of a static simulation using 100 Monte Carlo iterations are shown in Table 18.4 and summarized in Figure 18.1.

TABLE 18.4 Results of 100 Iterations of Revenue Simulation
Percentile       Product A    Product B    Product C    Total Revenue
Min              26           1            0            4,180
5%               34           4            0            4,745
25%              44           8            1            5,943
50%              50           14           2            7,000
75%              56           20           3            8,340
95%              64           32           6            10,022
Max              69           35           11           10,780
Sample Mean      49.85        15.11        2.30         7,335
Expected Mean    50           15           2.5          7,250

Note: For product A (normal), the mean demand is μA = 50. For product C (exponential), the mean demand is μC = 2.5. For product B (triangular), the mean demand is μB = (a + b + c)/3 = (0 + 40 + 5)/3 = 15. Assuming independence, the mean total revenue is PAμA + PBμB + PCμC = (80)(50) + (150)(15) + (400)(2.5) = 4,000 + 2,250 + 1,000 = 7,250, which compares well with the simulation mean. Although product A contributes the most to total revenue at the mean, this may not be the case in a particular simulation because demand can fluctuate.
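The workbook for this example is in LearningStats, but the same Monte Carlo logic can be sketched in a few lines of code. Here is a minimal illustration (Python with NumPy, our choice for a sketch rather than anything used by the text) that redraws the Table 18.3 inputs 100 times and summarizes total revenue. Demands are rounded to whole units and floored at zero, so the percentiles will differ somewhat from Table 18.4, just as any two sets of 100 iterations differ.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed so the run is reproducible
n = 100                          # number of Monte Carlo iterations

# Stochastic input variables (parameters from Table 18.3)
qa = rng.normal(50, 10, n)            # Product A: normal, mu = 50, sigma = 10
qb = rng.triangular(0, 5, 40, n)      # Product B: triangular, min = 0, mode = 5, max = 40
qc = rng.exponential(2.5, n)          # Product C: exponential, mean demand = 2.5

# Demands are unit counts, so round them and floor at zero
qa, qb, qc = [np.maximum(np.round(q), 0) for q in (qa, qb, qc)]

# Output variable: total revenue with deterministic prices
revenue = 80 * qa + 150 * qb + 400 * qc

for p in (0, 5, 25, 50, 75, 95, 100):
    print(f"{p:3d}th percentile of revenue: {np.percentile(revenue, p):8,.0f}")
print(f"mean revenue: {revenue.mean():,.0f}")   # should land near the expected 7,250
```

Pressing F9 in the spreadsheet version plays the same role as rerunning this script with a different seed.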
FIGURE 18.1 Histograms for 100 Iterations of Revenue Simulation (four panels: Demand—Product A, Demand—Product B, Demand—Product C, Total Revenue)

These results suggest that Axolotl's revenue could be as low as $4,180 or as high as $10,780. There is a 50 percent chance (between the 25th and 75th percentiles) of seeing revenue between $5,943 and $8,340. The median revenue seems to be below the mean, suggesting that total revenue is right-skewed. That is to be expected because both the triangular (product B) and exponential (product C) are right-skewed distributions (you can also see the skewed distribution of demand for products B and C by looking at their simulation results). The normal distribution (product A) is reflected in the simulation results, which are symmetric and lie well within the μ ± 3σ limits. This simulation could be repeated another 100 times by clicking a button. Simulation results will vary, but as long as the number of iterations is reasonably large, you will see considerable stability. The results shown here are typical of this static model. If you wish to play with this model further, it is available in the LearningStats downloads (see end-of-chapter McGraw-Hill Connect® resources). Color-coding is used in the spreadsheet and graphs to distinguish inputs from outputs.

18.3 RANDOM NUMBER GENERATION

LO 18-4 Explain how to generate random data by using a discrete or continuous CDF.

Basic Concept: Inverse CDF
Random numbers are at the heart of any simulation. So how do we generate random data? In general, if you know F(x), the cumulative distribution function (CDF) of your distribution, you generate a uniform U(0, 1) random number R and then find F⁻¹(R), where F⁻¹ is the inverse CDF. Essentially, what you have to do is set F(x) = R and then solve for x, as illustrated in Figures 18.2 and 18.3. However, this is sometimes easier said than done because finding F⁻¹ may be tricky, especially for discrete distributions, as shown in Figure 18.3. If you use a programming language, it is not difficult, and there are plenty of commercial packages that do it. But it is harder if you are a do-it-yourself person who wants to use only the functions available within Excel.

FIGURE 18.2 Random x from Continuous CDF
FIGURE 18.3 Random x from Discrete CDF

Random Data in Excel
Table 18.5 shows some Excel functions to create random data from a few of the more common distributions. After you use them in your own spreadsheet, you will begin to see how they work. Some of the end-of-chapter exercises ask you to use these functions, so look them over carefully.

TABLE 18.5 Creating Random Data in Excel
Uniform U(0, 1): =RAND(). Built-in Excel function.
Uniform U(a, b): =RANDBETWEEN($A$1, $A$2). $A$1 is the minimum and $A$2 is the maximum (or use cell names like Xmin and Xmax); returns integer values.
Normal N(0, 1): =NORM.S.INV(RAND()). Excel's inverse normal function.
Normal N(μ, σ): =NORM.INV(RAND(), $A$1, $A$2). $A$1 is the mean and $A$2 is the standard deviation (or use cell names like Mu and Sigma).
Exponential Expon(λ): =-LN(RAND())*$A$1. $A$1 is the mean waiting time 1/λ, where λ is the Poisson arrival rate (or use a cell name).
Triangular T(a, b, c): No single-cell formula, but it can be done in Excel with two cells (see LearningStats). Better to use @Risk, XLSim, or LearningStats.
Binomial B(n, π): =BINOM.INV($A$1, $A$2, RAND()). $A$1 is the number of trials and $A$2 is the probability of success.

Other Ways to Get Random Data

LO 18-5 Use Excel to generate random data for several common distributions.

Excel's Data Analysis > Random Number Generation will create random data for uniform, normal, Bernoulli, binomial, and Poisson distributions (see Figure 18.4). MegaStat (MegaStat > Random Numbers) makes uniform, normal, and exponential random data (see Figure 18.5). Minitab (Calc > Random Data) offers a very broad menu of distributions (see Figure 18.6). There are even websites that will give you guaranteed random numbers! For spreadsheet Monte Carlo simulation, it is best to use a specialized package such as @Risk, XLSim, or YASAI that offers many built-in functions to create random data and keep track of your simulation results (see Useful Websites and Related Reading at the end of this chapter).

FIGURE 18.4 Generating Random Data in Excel
FIGURE 18.5 Generating Random Data in MegaStat
FIGURE 18.6 Generating Random Data in Minitab

Bootstrap Method
In recent years, much attention has been paid to resampling to estimate unknown parameters, most notably the bootstrap method. It can be applied to just about any parameter. Although it requires specialized software, the bootstrap method is easy to explain. It rests on the principle that the sample reflects everything we know about the population. From a sample of n observations, we use Monte Carlo random integers to take repeated samples of n items with replacement from the sample and to calculate the statistic of interest for each sample. The average of these statistics is the bootstrap estimator. The standard deviation of these repeated estimates is the bootstrap standard error. The distribution of these repeated estimates is the bootstrap distribution (which generally is not normal). The bootstrap method of estimation (see LearningStats downloads for Unit 08 for more details and a spreadsheet simulation) avoids having to assume normality when constructing a confidence interval or finding percentiles. The accuracy of the bootstrap estimator increases with the number of resamples (e.g., we might resample the sample 10,000 times to get many possible variations on the sample information). The percentiles of the resulting distribution of sample estimates provide the bootstrap confidence interval. For example, a 90 percent confidence interval would be formed by the 5th and 95th percentiles. No assumption of normality is required. Before the advent of powerful computers, such an approach was unthinkable. When data are badly skewed, the bootstrap is an excellent choice. Resampling is not just for means. There are bootstrap estimators for most common statistics, as well as for those that are hard to study mathematically. Some statistical packages now offer bootstrap estimators. Resampling is in the mainstream of statistics, even though it is less familiar to most people in business (check the web for further information).
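The resampling recipe just described fits in a dozen lines of code. The sketch below is only an illustration (Python with NumPy, not the LearningStats workbook), and the ten data values are hypothetical; it computes the bootstrap estimate, the bootstrap standard error, and a 90 percent percentile confidence interval for a mean.

```python
import numpy as np

rng = np.random.default_rng(2)
sample = np.array([12, 15, 9, 22, 30, 11, 14, 41, 13, 17])   # hypothetical skewed sample, n = 10

B = 10_000                      # number of bootstrap resamples
boot_means = np.empty(B)
for b in range(B):
    # resample n items WITH replacement from the original sample
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_means[b] = resample.mean()

print("bootstrap estimate of the mean:", boot_means.mean())
print("bootstrap standard error:", boot_means.std(ddof=1))
print("90 percent bootstrap CI:", np.percentile(boot_means, [5, 95]))   # 5th and 95th percentiles
```

The same loop works for a median, a standard deviation, or any other statistic: replace resample.mean() with the statistic of interest.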
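Both the Table 18.5 formulas and the resampling above rest on the inverse-CDF idea that opened this section: draw a uniform R on (0, 1), then solve F(x) = R. A minimal sketch of that recipe for one continuous and one discrete case, using only Python's standard library (shown to make the mechanics concrete, not as part of the text's spreadsheet approach):

```python
import math
import random

def exponential_draw(mean):
    """Continuous case: F(x) = 1 - exp(-x/mean), so x = -mean*ln(1 - R)."""
    r = random.random()                 # uniform R on [0, 1)
    return -mean * math.log(1.0 - r)    # -mean*ln(R) works equally well, as in Excel's =-LN(RAND())*mean

def discrete_draw(values, probs):
    """Discrete case: step up the cumulative distribution until it first reaches R."""
    r = random.random()
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if r <= cumulative:
            return x
    return values[-1]                   # guard against rounding when the probabilities sum to 1

print(exponential_draw(0.40))           # exponential with mean waiting time 0.40 (as in Exercise 18.7)
print(discrete_draw([0, 50, 100, 200, 500, 1000],
                    [.40, .25, .15, .10, .05, .05]))   # the discrete distribution of Exercise 18.6
```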
doa57594_web_ch18_001-020_onlinecontent.indd 18-10 10/09/17 03:49 PM Final PDF to printer 18-11 Applied Statistics in Business and Economics 18.4 EXCEL ADD-INS LO 18-6 Describe functions and features of commercial modeling tools for Excel. We can generate our own random data within Excel. However, Excel isn’t optimized for statistics and doesn’t keep track of your results. Other vendors (e.g., @Risk) have created Excel AddIns offering more features. They not only calculate probabilities but also permit Monte Carlo simulation to draw repeated samples from a distribution. Illustration: Using @Risk BobsNetWorth Table 18.6 shows some examples of @Risk input functions that can be pasted directly into cells in an Excel spreadsheet. These functions are intuitive and easy to use. The input cell becomes active and will change each time you update the spreadsheet by pressing F9. We illustrate @ Risk simulation with Bob’s net worth. The spreadsheet is shown in Figure 18.7 (it is also in LearningStats downloads for Chapter 18). For those who do not have access to @Risk software (probably a majority), the downloads also contain “pure Excel” versions with reduced TABLE 18.6 Examples of @Risk Distributions @Risk Function Example Interpretation Normal =RiskNormal(47,2) Truncated normal =RiskTnormal(47,2,43,51) Triangular =RiskTriang(3,8,14) Normal with mean μ = 47 and standard deviation σ = 2. Normal with mean μ = 47 and standard deviation σ = 2. Lowest allowable value is 43 and highest allowable value is 51 (set at μ ± 2σ). Lowest value is 3, most likely value is 8, highest value is 14. FIGURE 18.7 Bob’s Stochastic Balance Sheet doa57594_web_ch18_001-020_onlinecontent.indd 18-11 10/09/17 03:49 PM Final PDF to printer Chapter 18 Simulation 18-12 capabilities. Where relevant, cell range names are used (e.g., Net Worth) instead of cell references (e.g., H9). Comments have been added to cells that specify stochastic inputs (purple highlight) or stochastic outputs (orange highlight). Output cells are bottom-line variables of interest while input cells are the drivers of the output(s). On a given day, Bob views his actual net worth as dependent on the market value of his assets. Some of his asset and liability values are deterministic (e.g., checking account, savings account, student loans) while the values of his car, beer can collection, and stocks (and hence his net worth) are stochastic. Table 18.7 shows the @Risk functions that describe each stochastic input (distribution, skewness, coefficient of variation). Asset @Risk Function Comments Mustang Beer can collection Garland stock Oxnard stock ScamCo stock =RiskTriang(8000,10000,15000) =RiskTriang(0,50,1000) =RiskNormalN(15.38,3.15) =RiskNormal(26.87,2.02) =RiskNormal(3.56,0.26) Triangular, right-skewed Triangular, very right-skewed Normal, symmetric, large CV Normal, symmetric, small CV Normal, symmetric, small CV TABLE 18.7 Distributions Used in Bob’s Stochastic Balance Sheet These functions tell us about Bob’s reasoning. For example, Bob thinks that if he finds the right buyer, his Mustang could be worth up to $15,000. He is quite sure he won’t get less than $8,000, and he figures that $10,000 is the most likely value. When you paste an @Risk input function for the desired statistical distribution in a cell, its contents become stochastic, so that every time the spreadsheet is updated a new value will appear. For example, the input cell for Oxnard containing the @Risk function =RiskNormal(26.87,2.02) is a random variable with μ = 26.87 and σ = 2.02. 
All @Risk distributions are available from Excel’s Insert > Function menu. An output cell is calculated as usual except that =RiskOutput()+ is added in front of the cell’s contents; for example, =RiskOutput + TotalAssets – TotalDebt, where TotalAssets and TotalDebt are defined elsewhere in the spreadsheet (of course, you can also use cell references like C12 and H7 instead of cell names). The @Risk toolbar appears on the regular Excel toolbar, as illustrated in Figure 18.8. FIGURE 18.8 Simulation Ribbon in @Risk The @Risk setup screens and typical settings are shown in Figure 18.9. You can get up to 10,000 Monte Carlo replications. @Risk keeps track of all simulated values of the input and output cells, and will let you see various displays of the simulation results. For each stochastic input cell, choose a distribution (see Figure 18.10). Then click the Start Simulation icon on the top menu bar. Various reports can be generated and placed either in a new workbook or in the active workbook. You will get a menu of graphs that are available. Distributions of simulated input and output variables can be displayed in tables (statistics, percentiles) or charts (histograms, cumulative distributions). You can reveal any desired percentile on the cumulative distribution, or use a tornado chart to reveal sensitivities of output variables to all the input variables. By default, the middle 90 percent of the outcomes are shown. In Figure 18.11, we see the distribution of net worth. The shape is symmetric but platykurtic, with mean $9,498. In the simulation, Bob’s net worth exceeded $10,000 about 40 percent of the time. You can drag the doa57594_web_ch18_001-020_onlinecontent.indd 18-12 10/09/17 03:49 PM Final PDF to printer 18-13 Applied Statistics in Business and Economics FIGURE 18.9 Typical Simulation Settings FIGURE 18.10 Distributions to Choose vertical sliders to show different percentiles. This is easy, but inexact. To select integer percentiles, use the arrows at the bottom of the histogram. The ascending cumulative distribution (Figure 18.12) reveals additional detail. For sensitivity analysis, click on the Tornado graph icon to see a list of factors that explain variation in net worth, listed in order of importance, as illustrated in Figure 18.13. Bars that face right are positively affecting the output variable, while bars that face left (if any) are affecting the output variable negatively. Sensitivities can range from −1.0 to +1.0, with values near zero indicating lack of importance. Here, the Mustang value and Oxnard and Garland stock prices are the most important input variables, while beer cans are less important and the ScamCo stock contributes little variation to net worth. Pros and Cons While these Excel Add-In packages are powerful, their cost may strain academic budgets. Fortunately, some textbooks offer a student version at modest extra cost. Nonetheless, student lab setup is likely to require a skilled site administrator, and some training is required to use these packages effectively. LearningStats downloads (see end-of-chapter McGraw-Hill Connect® resources) include exercises using @Risk, along with instructions, but there also are Excel-only versions for those who don’t have access to @Risk. 
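If you have neither @Risk nor the LearningStats workbook, the flavor of Bob's simulation can still be reproduced in code. The sketch below redraws the five stochastic inputs of Table 18.7 in Python with NumPy. The share counts and the net value of Bob's deterministic assets and liabilities are not given in the text, so the figures marked as placeholders are invented simply to make the script run; its output will not match Figures 18.11 through 18.13.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500                                    # iterations, as in the @Risk run

# Stochastic inputs (distributions from Table 18.7)
mustang   = rng.triangular(8_000, 10_000, 15_000, n)
beer_cans = rng.triangular(0, 50, 1_000, n)
garland   = rng.normal(15.38, 3.15, n)     # stock prices per share
oxnard    = rng.normal(26.87, 2.02, n)
scamco    = rng.normal(3.56, 0.26, n)

# Placeholders: share counts and the net of the deterministic items are NOT in the text
shares_garland, shares_oxnard, shares_scamco = 100, 100, 100
other_net = -3_000                         # deterministic assets minus liabilities (invented)

net_worth = (mustang + beer_cans + other_net
             + shares_garland * garland
             + shares_oxnard * oxnard
             + shares_scamco * scamco)

print("mean net worth:", round(net_worth.mean()))
print("P(net worth > 10,000):", (net_worth > 10_000).mean())
print("5th and 95th percentiles:", np.percentile(net_worth, [5, 95]).round())
```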
FIGURE 18.11 Histogram of 500 Iterations of Net Worth
FIGURE 18.12 Cumulative Distribution of 500 Iterations of Net Worth
FIGURE 18.13 Tornado Graph for Sensitivity Analysis of Five Inputs

18.5 DYNAMIC SIMULATION

LO 18-7 Explain the main reasons for using dynamic simulation and queuing models.

Discrete Event Simulation
In a dynamic simulation, input variables are defined at discrete points in time (such as every minute) or continuously (changing smoothly over time). The most common form of dynamic simulation is discrete event simulation, in which the system state is assessed by a clock at distinct points in time. If you already knew something about simulation before reading this chapter, you were probably thinking of dynamic simulation, which involves computer modeling of flows (e.g., airport passenger arrivals, automobile assembly line flow, hospital surgical suite scheduling). In discrete event simulation, we observe a "snapshot" of the system state at any given moment. The system activity may be represented visually, even using animation to help us visualize flows, queues, and bottlenecks. The emphasis in discrete event simulation is on measurements such as
• Arrival rates
• Service rates
• Length of queues
• Waiting time
• Capacity utilization
• System throughput
Although it is fairly easy to understand and extremely powerful, this kind of simulation requires specialized software, and therefore will not be discussed in detail here. But we can make a few general comments. Most universities offer courses on simulation, if you want to know more.

Queuing Theory
If customer arrivals per unit of time follow a Poisson distribution and service times follow an exponential distribution, some rather interesting theorems have been proven regarding the length of customer queues, mean waiting times, facility utilization, and so on. This is known as queuing theory. Queuing theory is a topic covered in courses such as operations management, simulation, or decision modeling. It flows from what you have learned already in statistics. The simplest situation is a single-server facility (such as a single ticket window) whose customers form a single, well-disciplined queue (first-come, first-served), where arrivals from an infinite source are Poisson distributed with mean λ (customer arrivals per unit of time) and service times are exponentially distributed with mean 1/μ (units of time per customer), so that the service rate is μ (customers served per unit of time). If we assume that λ < μ to prevent the buildup of an infinite queue, then the following may be demonstrated:

Expected wait time: λ/[μ(μ − λ)] units of time   (18.1)

Expected length of waiting line: λ²/[μ(μ − λ)] customers   (18.2)

Expected facility utilization: (λ/μ) × 100%   (18.3)

Simulating Queuing Models
Theorems such as these are quite useful in facility planning.
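Formulas 18.1–18.3 are easy to evaluate, and a short brute-force simulation can confirm them. The sketch below (Python with NumPy, purely for illustration; λ = 2 arrivals per minute and μ = 3 customers served per minute are assumed numbers) checks the expected wait for a single first-come, first-served server.

```python
import numpy as np

lam, mu = 2.0, 3.0      # assumed arrival and service rates (lambda < mu)

# Formulas 18.1-18.3
print("expected wait time:   ", lam / (mu * (mu - lam)))      # 0.667 units of time
print("expected queue length:", lam**2 / (mu * (mu - lam)))   # 1.333 customers
print("expected utilization: ", 100 * lam / mu, "%")          # 66.7 percent

# Check the expected wait by simulating many customers (single server, first-come first-served)
rng = np.random.default_rng(4)
n = 100_000
gaps    = rng.exponential(1 / lam, n)   # Poisson arrivals imply exponential interarrival gaps
service = rng.exponential(1 / mu, n)

wait = np.zeros(n)
for i in range(1, n):
    # wait of customer i = previous wait + previous service - gap, floored at zero
    wait[i] = max(0.0, wait[i - 1] + service[i - 1] - gaps[i])

print("simulated mean wait:  ", wait.mean())   # should be close to 0.667
```

With other arrival or service distributions the same loop still works, which is exactly where simulation takes over from formulas 18.1–18.3.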
Unfortunately, the situation can quickly become more complex, as shown in Figure 18.14. We could have multiple servers with a single, well-disciplined queue (as in most banks and post offices), or multiple servers with multiple queues (as in a grocery store checkout), or one queue with multiple serial servers (each with its own service rate) where you must complete one step in the process before going to the next (as in some hospital admissions or manufacturing processes). Further, many applications of queuing models do not meet the assumption of Poisson arrivals or exponential service times. In that case, the results shown in formulas 18.1–18.3 are no longer valid. This is where simulation modeling is most valuable because we can evaluate queuing systems with many different structures in a computer model that allows for the complexity needed.

FIGURE 18.14 Various Queuing Situations (queue patterns and service facilities for: single queue, single server; single queue, multiple servers; multiple queues, multiple servers; serial servers, single queue)

One popular simulation package is called Arena. With Arena, we can model both small and large business processes that contain interrelated steps and process feedback loops. Arena allows businesses involved in process improvement projects to test process changes in a realistic simulation before making costly, and perhaps permanent, changes to a real process. Arena can be integrated with Microsoft Visio, a process diagramming software tool, and is therefore frequently used in business courses on process design and improvement techniques.

CHAPTER SUMMARY
Simulation is used to study processes that involve stochastic (random) variables as well as deterministic (nonrandom) variables. Simulation is useful when real experiments are impossible or costly. Simulation is useful for planning, risk assessment, and what-if analysis. Simulation helps decision makers assess the likelihood of various possible outcomes and the effects of their decisions. Monte Carlo models are static because time is not explicit. They use computer-created values of the stochastic input variable(s) from specified distributions (e.g., normal) and their parameters (e.g., μ and σ). From many such samples, an empirical distribution is created for each output variable of interest. To generate a random data value for a given input variable, we generate a uniform random deviate U(0, 1) and then find the inverse CDF for the assumed distribution of the input variable. In a dynamic simulation, time is explicit. Its applications include models of arrivals, service times, and queues (e.g., in a grocery checkout lane). For simple queuing models, there are formulas for mean waiting times, queue length, and facility utilization. However, specialized (and costly) software is needed for detailed simulation of flows over time and the resulting empirical distributions, so dynamic simulation is not ordinarily studied in introductory statistics.

KEY TERMS
Arena, @Risk, bootstrap method, deterministic variables, dynamic simulation, input variable, inverse CDF, models, Monte Carlo method, output variable, queuing theory, risk assessment, simulation, static simulation, stochastic variables, Visio, what-if analysis, XLSim, YASAI

CHAPTER REVIEW
1.
Define (a) simulation, (b) deterministic variable, (c) stochastic variable, and (d) risk assessment. 2. Explain how simulation is (a) a planning tool and (b) a behavioral tool. 3. Name three applications of simulation. 4. When is simulation appropriate? When is it not appropriate? 5. (a) List five advantages of simulation. (b) What are the three stages of simulation modeling? 6. What are the two types of simulation? How are they different? 7. Explain the meaning of these components of a simulation model: (a) deterministic factors, (b) stochastic input variable, (c) output variable, (d) model, and (e) interface. 8. (a) Why does this chapter focus mainly on static simulation? (b) What is Monte Carlo simulation? Explain how it works. 9. Name three distributions that are useful in simulation and give their main characteristics. 10. To generate random numbers, list some distributions covered in this textbook that can be created in (a) Excel, (b) MegaStat, and (c) Minitab. 11. Explain the meaning of (a) dynamic simulation, (b) well-disciplined queue, and (c) infinite queue. 12. (a) List five variables that can be studied using a queuing model. (b) List two kinds of queuing models that are more complex than a single-server queue. 13. (a) Why do we need packages like @Risk? (b) What factors limit their use? CHAPTER EXERCISES 18.1 (a) Use Excel’s function =NORM.INV(RAND(),50,8) to create 100 random numbers from the normal distribution N(50, 8). Hint: Refer to Table 18.5. (b) Calculate the sample mean and standard deviation and then compare them with their theoretical values. (c) Is the range what you would expect from this normal distribution? Explain. (d) Make a histogram or similar display. Does the shape appear normal? 18.2 Create 100 random numbers with mean 0 and standard deviation 1 from the standard normal distribution N(0, 1). Use the built-in random number generators (not your own functions) from as many of these as you can: (a) Excel, (b) MegaStat, (c) Minitab. List pros and cons of each package’s capabilities and ease of use. 18.3 (a) Use Excel’s function =RAND() to create 100 uniform U(0, 1) random numbers. (b) Calculate the sample mean and standard deviation and compare them with their theoretical values (see Chapter 7). 18.4 (a) Use Excel’s Data Analysis > Random Number Generation to create 100 Poisson random numbers with mean λ = 2.5. (b) Calculate the sample mean and standard deviation and compare them with their theoretical values (see Chapter 6). 18.5 (a) Use Excel’s Data Analysis > Random Number Generation to create 100 binomial random numbers with n = 30 and π = .25. (b) Calculate the sample mean and standard deviation and compare them with their theoretical values (see Chapter 6). 18.6 (a) Use Excel’s Data Analysis > Random Number Generation to create 100 discrete random numbers with values x = 0, 50, 100, 200, 500, 1,000 whose respective probabilities are P(x) = .40, .25, .15, .10, .05, .05. (b) Calculate the sample mean with its theoretical value, using the definition of E(X) in Chapter 6. 18.7 (a) Use the method in Table 18.5 to create 100 exponential random numbers with mean waiting time 1/λ= 0.40. Discuss the characteristics of the resulting sample (minimum, maximum, mode, etc.). (b) Why did the simulation of product demand in this chapter (see Tables 18.3 and 18.4) round the exponential values to integers? 18.8 Use the freezer simulation in LearningStats Unit 18 McGraw-Hill Connect® downloads to observe temperature samples. (a) Press the F9 key 5 times. 
Did you get any sample means above the UCL doa57594_web_ch18_001-020_onlinecontent.indd 18-17 10/09/17 03:49 PM Final PDF to printer Chapter 18 Simulation 18-18 or below the LCL? Repeat. (b) How did Monte Carlo simulation help demonstrate control charts for a mean? Freezer 18.9 In LearningStats Unit 8 McGraw-Hill Connect® downloads, use the bootstrap simulation for a mean. (a) Select a normal population. Observe the histogram of the 20 sample items as you press F9 10 times. Are the samples consistent with the specified population shape? Press F9 10 more times, observing the confidence intervals. Are they similar? (b) Repeat the previous exercise, using a uniform population. (c) Repeat the previous exercise, using a skewed population. (d) Repeat the previous exercise, using a strongly skewed population. (e) How did simulation help demonstrate the bootstrap concept? Note: The mean is 50 for each distribution. Bootstrap 18.10 In LearningStats Unit 18 McGraw-Hill Connect® downloads, choose one of the three simulation projects. The three scenarios use only the normal N(μ, σ) and triangular T(a, b, c) distributions because they are flexible yet easy to understand. Each scenario involves a problem faced by a hypothetical character named Bob. If you have access to @Risk, use the @Risk version. Otherwise, use the Excel-only version. Use the 100-iteration worksheet to answer the questions posed below: a. Scenario 1: Bob’s Stochastic Balance Sheet. Questions: (i) How often does Bob’s expected net worth exceed $10,000? (ii) What is Bob’s expected net worth? (iii) On a given day, is there a 50 percent chance that Bob’s net worth exceeds $10,000? (iv) What are the 25 percent and 75 percent points of his daily net worth? The 5 percent and 95 percent points? (v) Verify that the stock prices have the desired means and standard deviations. (vi) Verify that the Mustang and beer can means are equal to (Min + Max + Mode)/3. b. Scenario 2: Bob’s Mail-Order Business. Questions: (i) What is Bob’s expected profit? (ii) What is Bob’s median profit? (iii) Estimate the 5 percent and 95 percent outcomes and interpret them. (iv) Estimate the quartiles of Bob’s profit and interpret them. (v) Why might Bob undertake this business venture? Why might he not? Explain. (vi) Compare the mean of each input variable with its expected value. The expected value of a triangular variable is (Min + Max + Mode)/3. (vii) If Bob had enough capital to mail twice as many flyers, would it change the outcome? c. Scenario 3: Bob’s Statistics Grades. Questions: (i) What do the parameters say about Bob’s selfevaluation? (ii) What is Bob’s expected grade? (iii) What is the chance that Bob’s overall grade will be below 70? What is his chance of exceeding 80? (iv) Estimate and interpret the quartile points for his overall grade. (v) Estimate and interpret the 5 percent and 95 percent points for his overall grade. (vi) From the histogram, what grade range is most likely? (vii) To check the simulation, inspect the mean and standard deviation of each input variable. Are they about what they are supposed to be? MONTE CARLO SIMULATION PROJECT 18.11 Objective: To demonstrate that an expected value E(X) is an average, and that there is variation around the average. Scenario: A life insurance company charges $1,500 for a $100,000 one-year term life insurance policy for a 60-year-old white male. If the insured lives, the company gains $1,500. 
If the insured dies, the company loses $98,500 (the $100,000 face value of the policy minus the $1,500 prepaid premium). The probability of the insured's death during the year is .012.
Instructions: (a) Calculate the company's expected payout ($100,000 with probability .012, $0 with probability .988). (b) Calculate the expected net profit by subtracting the expected payout from $1,500. (c) To perform a Monte Carlo simulation of net profit for 1,000 insurance policies, enter the Excel formula =IF(RAND()<0.012,-98500,1500) into cell A1 and then copy the formula into cells A1:A1000. (d) To get the simulated net profit, in cell C1 enter the formula =AVERAGE(A1:A1000). (e) Press F9 10 times, each time writing down the average in cell C1. (f) Was the average net profit close to the expected net profit from part (b)? (g) To count the number of times the company had to pay, enter =COUNTIF(A1:A1000,"=-98500") in cell C2. (h) Press F9 10 times and write down how many times the company had to pay.
Bottom Line Questions: How much variability is there in the number of claims paid and in the net profit for 1,000 policies? Why is the expected value an incomplete description of net profit? Why does an insurance company need to issue lots of insurance policies in order to have stable profits? Would 1,000 policies be enough?

Useful Websites
@Risk: www.palisade.com
Simio: www.simio.com/index.html
XLSim: http://xlsim.software.informer.com/
YASAI: www.yasai.rutgers.edu/

RELATED READING
Banks, Jerry; John Carlson; Barry L. Nelson; and David Nicol. Discrete Event System Simulation. 5th ed. Pearson, 2010.
Chernick, Michael R., and Robert A. LaBudde. An Introduction to Bootstrap Methods with Applications to R. Wiley, 2012.
Conway, Richard W., and John O. McClain. "The Conduct of an Effective Simulation Study." INFORMS Transactions on Education 3, no. 3 (May 2003), pp. 13–22.
Gentle, James E. Random Number Generation and Monte Carlo Methods. Springer-Verlag, 2003.
Kelton, W. David; Randall P. Sadowski; and Nancy B. Swets. Simulation with Arena. 6th ed. McGraw-Hill, 2015.
McLeish, Don L. Monte Carlo Simulation and Finance. Wiley, 2005.
Robert, Christian P., and George Casella. Monte Carlo Statistical Methods. 2nd ed. Springer-Verlag, 2004.
Rossetti, Manuel D. Simulation Modeling and Arena. 2nd ed. Wiley, 2015.
Rubinstein, Reuven Y., and Dirk P. Kroese. Simulation and the Monte Carlo Method. 3rd ed. Wiley, 2016.
Vose, David. Risk Analysis: A Quantitative Guide. 3rd ed. Wiley, 2007.

CHAPTER 18 More Learning Resources
You can access these LearningStats demonstrations through McGraw-Hill's Connect® to help you understand simulation.
Overview: Overview of Simulation
Using Excel: How to Create Random Data; Random Normal Data Explained
Examples: Axolotl's Three Products; Freezer Temperature Control Chart; Your Annual Fuel Cost; Bootstrap Simulation
Excel projects: Project Instructions; Bob's Balance Sheet (Excel); Bob's New Business (Excel); Bob's Statistics Grades (Excel)
@Risk projects: Bob's Balance Sheet (@Risk); Bob's New Business (@Risk); Bob's Statistics Grades (@Risk)