Comparison of Discrete Event Simulation and Risk-Analytic Models of the Fire Risk to a Manufacturing System

Michael E. G. Schmidt
DSES-6620 Fall 2002

Executive summary

The original project proposal suggested using a study done for GE Lighting, suitably anonymized, to determine whether a discrete event simulation (DES) would produce the same results as the risk-analytic (RA) approach that was originally used. Although in hindsight it should have been obvious that the two must produce similar if not identical results, the demonstration was surprisingly difficult because of certain characteristics of the DES method and limitations of the PROMODEL student version. The demonstration also uncovered an unexpectedly large conservatism in the RA approach and (although this was not pursued to its conclusion) the power of inventory in risk management.

This report describes:
- How and why the original model was simplified to fit the DES environment provided by PROMODEL student,
- The original RA model and the simplified RA model that was constructed to match the eventual DES model,
- The results obtained and the further adjustments to the DES model, and
- The conclusions drawn.

I. Introduction

A. Definition of objective; scope and requirements

The basic plan was to compare the "digital rat in a maze" (DES) approach to previously employed RA modeling with the Monte Carlo method. PROMODEL represents the former and @Risk in an Excel spreadsheet the latter.

Risk is the product of probability and consequence. The biggest problem with risk modeling is the lack of robust data, particularly for low probability/high consequence events. (For example, there have been only 2-1/2 so-called Maximum Foreseeable Loss (MFL) events in the history of the automobile manufacturing industry.)
For a professional exercise, consequence distributions are fairly accurately calculable even for serious events; probability distributions, however, would probably be elicited from subject matter experts, with or without a preliminary “project management” breakdown of the event of interest into more manageable (or predictable) sub-elements. For this project, I propose instead to use generic data or estimates. (I am, in fact, a subject matter expert, so this is not that much of a reach.) Time permitting, I will explore the sensitivity of the results to the input distribution shapes and means. I also suggest that the actual input data will not be critical because my interest is in a comparison of methods rather than the actual result.

The intended project would have employed data that were developed in a preliminary study of a consumer product manufacturing network consisting of a group of assembly plants supported by a network of component and subassembly facilities. Figure 1 is an approximate production flow chart for the manufacture of linear fluorescent lighting products in the early 1990s. The hazard under study was fire, and the original deliverable was a comparison of the expected risk distributions “as is” and with a risk management strategy implemented. Because the original study was a demonstration project, only the assembly plant in the upper left was modeled, along with its support facilities. “Risk Exceedance” curves (cumulative distribution plots) are shown in figure 2.

As the project progressed it quickly became apparent that even the single-assembly-plant model was far too complex to be supported. The scope of the project was therefore changed to building equivalent models and comparing them, rather than attempting to independently model a reality. (This simplified the project from a practical viewpoint because problems with handling proprietary information were thereby eliminated.)
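The risk exceedance curves of figure 2 can be illustrated with a minimal Python sketch. It assumes a set of simulated annual losses is already available; the lognormal sample below is purely hypothetical and is not the study's data, and the function name `exceedance_curve` is introduced here for illustration only. For each loss threshold, the sketch estimates the probability that a year's loss exceeds it; one minus that value is the corresponding cumulative probability.

```python
import random


def exceedance_curve(losses, thresholds):
    """Estimate P(loss > t) for each threshold t from a list of loss samples."""
    n = len(losses)
    return [sum(1 for x in losses if x > t) / n for t in thresholds]


# Hypothetical annual losses in dollars (illustrative only, not the study's data).
rng = random.Random(42)
losses = [rng.lognormvariate(16.0, 1.5) for _ in range(10_000)]

# Thresholds spanning the loss axis shown in figure 2.
thresholds = [1e6, 1e7, 1e8, 1e9]
curve = exceedance_curve(losses, thresholds)

for t, p in zip(thresholds, curve):
    print(f"P(loss > ${t:,.0f}) = {p:.3f}")
```

Plotting `curve` against `thresholds` on a logarithmic loss axis would give a curve of the same general kind as figure 2, once for the base case and once for the alternative case.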
B. Collection and analysis

Collection and analysis was intended to consist of the assembled data and expert-elicited estimates from the original project. The altered project scope used a subset of those data applied to two independent models. Because a subset of the data was used, no external reality was actually being modeled; realistic values were, however, employed for the most part.

C. Model building

The RA model was pre-existing; however, the problems with the DES model outlined below made it necessary to construct a simplified RA model. Results of exercising the model drove further modifications to the DES model, as described later.

D. Model verification and validation

The original intent was that the DES model would be verified and validated against the actual loss history that was used to validate the original study; however, the model simplification that was required made this impossible. Therefore, the only verification and validation is internal, i.e., showing equivalence between models.

E. Experimentation with the model

Progress through various simplified and otherwise refined RA and DES models is described. Again, the objective of separately modeling an independent reality had to be abandoned in favor of equivalently modeling a construct.

F. Reporting

The process was the product, as the following description demonstrates. Areas for further study are noted.

II. PROMODEL approach

A. Obvious limitations

1. Complexity of RA model with stochastic direct and business interruption effects and stochastic probabilities for each of three event categories.

The RA model recognized three “orders” of risk, known as the normal loss expectancy (NLE), probable maximum loss (PML) and maximum foreseeable loss (MFL). These range from high probability/low consequence to low probability/high consequence.
For fire hazards, these generally correspond to fire confined to its area of origin (generally by sprinklers) (NLE), the worst expected fire under “reasonably adverse” conditions (generally the most important sprinkler system “impaired”) (PML), and the worst expected fire with all protection out of service (MFL). Probability distributions existed for NLE, PML and MFL probabilities and consequences, with consequences separated between direct (property) damage and business interruption (lost time).

2. Five entity limit in PROMODEL; seven components plus finished product in RA model.

3. Event probabilities on the order of 10EE-5/yr.

The RA model is designed to simulate one year of experience per calculation of the spreadsheet. This allows thousands of years to be simulated very quickly. PROMODEL needs to operate day by day. This makes a run long enough to “ensure several occurrences of (even) rare events” (as recommended by the PROMODEL manual) a very impractical proposition.

B. "Discovered" limitations

1. Apparent limit of 5 downtimes in one model.

This limit does not seem to fit any of the limits in the warning screen, nor did it trigger any warning messages (e.g., as appear when a fifth entity is defined); nevertheless, more than 5 downtimes seemed to be ignored. (As noted later, even fewer downtimes could be overridden if the model was not carefully constructed.)

2. Apparent inability to have 2 primary routing rules for an entity.

This is a real limitation per the reference manual. A work-around may exist, but none was discovered. The minimal purposes of the project were accomplished by other means.

C. Model simplification

1. Grouped entities with single sources and similar downtimes as single entities.

2. Applied multiple downtimes to various Locations to simulate the existence of separate locations.
The only obvious way to simulate fires at facilities was to use downtimes corresponding to the frequency distributions and consequence distributions for the various classes of fire. The first attempt to duplicate the RA model grouped entities with duplicate probabilities into the same Plant locations and assigned to each location two downtimes with the different outage durations. Two downtimes were also assigned to the Assembly facility to simulate the two single-source entities that could not otherwise be modeled.

3. Considered only business interruption, i.e., lost days of production, as the consequence.

This is not a wholly invalid assumption; business interruption losses are generally an order of magnitude higher than direct damage losses for the same event.

4. Dropped all second and third order (PML and MFL) events.

Sensitivity studies on the original RA model revealed that these events contributed very little to the expected value of risk. An attempt was made to include them by increasing all the event frequencies by 2 orders of magnitude; unfortunately, the downtimes for the first order events were long enough that these events tended to overlap, which is logically impossible. Because of this and the next-noted difficulty, only first order consequences (NLE) were considered.

5. Found that more than five downtimes seemed to be ignored.

This ended the attempt to simulate more than five entities and more than the NLE event for each entity manufacturing plant. The final model defined as entities Product, PartA, PartB, PartC and PartD, the parts being sourced at Plant1 through Plant5 and assembled at Assembly.

6. Parts accumulating in the system apparently overrode downtime at Assembly.

One final problem occurred: downtime at the Assembly location had no effect because no obvious way existed to discard entities that were blocked.
Apparently, blocked entities at a Plant accumulated and flowed to Assembly during other Plant downtimes. (Lean manufacturing would tend not to produce other parts when any critical part in an assembly is not being produced.)

D. Initial results

1. Extended run results.

After working the obvious bugs out of the DES model and making most of the previously described simplifications, an initial run at 24,000 hr provided 937 units of output. This run time is approximately 2.75 yr, which seemed sufficient given that the longest downtime was about 1/3 yr. This is file t1.txt. To test for convergence, a run was made at 2,400,000 hr, or approximately 275 yr. This produced a significantly lower output of 89,133 units, equivalent to 891 units/24,000 hr. At this setting, however, the run required an inconvenient length of time. This run is t1a.txt.

2. Multiple run results.

Further experimentation revealed that ten reps at 240,000 hr ran significantly faster than the single 2.4 M hr run and produced an average output of 9,065 units, equivalent to 906 units/24,000 hr. This was judged close enough to convergence in an acceptable run time. This outage level is equivalent to approximately 34 days/yr.

III. RA approach and model

A. Brief description of principles and the original model

The inputs to the original RA model were described in section II.A. The inputs were fed through a data table into a Monte Carlo simulator called @Risk. The principle is that the uncertainty surrounding fire events is large, both in probability and in consequence. An engineering approach would suggest application of the worst case for each scenario. However, the worst case is extremely unlikely; therefore the Monte Carlo simulator randomly selects probability-weighted data sets and calculates the outcomes that each data set produces. As a result, the model provides more probable small fires and less probable large fires each “year”; some contribution from every class of event is nevertheless made every year.
The result is an expected value of loss, or risk, for each modeled year. The result needs to be interpreted carefully: a $10,000,000 loss is not managed in the same way as ten $1,000,000 losses, yet (neglecting the time value of money and other financial effects) both produce a $10,000,000 ten-year expected value. The RA model, unlike the DES model, will never “show” an actual MFL event, but the MFL event will show up in the expected values if the probability and consequences are large enough.

B. Simplified model (as required by PROMODEL problems)

Figures 3 and 4 are the inputs that were used in the simplified DES and RA models. Figure 5 is the simplified RA model, with the understanding that the model requires the @Risk engine to fit inputs to the distributions and calculate the outputs.

C. RA model results

Figure 6 shows the cumulative distributions for 10,000 “years.” Note that the base case outage level of 67 day/yr is tantalizingly close to exactly twice the 34 day/yr that the DES model was producing at this stage. This resulted in several fruitless hours of searching for a simple conversion or mathematical error.

IV. Analysis of difference and rework

DES showed significantly fewer lost days than RA. See figure 8, which is based upon t1a10.txt. Study of the DES model revealed that the downtimes had to appear in order of increasing frequency; otherwise the less frequent downtimes were "covered" by the more frequent downtimes, apparently through the previously noted accumulation of entities. This helped, but the model still produced fewer lost days by at least a factor of 2: in fact 19 day/yr, which is very seriously in the wrong direction. See figure 9, which is based upon t2a10.txt. It was then realized that the construction of the RA model did not allow simultaneous events at more than one facility but the DES model did.
It had previously been assumed that simultaneous events were rare enough to ignore; this exercise showed that this may be an overly conservative assumption, at least with the (admittedly rather high) event probabilities that were used in this simulation. To correct this condition, all the downtimes were assigned to the assembly facility and each downtime was assigned a different pre-emptive priority. This provided results, 51 day/yr, that very closely matched the RA results. See figure 10, which is based upon t3a10.txt.

It also seemed reasonable to wonder whether the 10,000 yr runs, being so efficiently executed, were an unwise approach because of the random number considerations that were discussed in this course. Therefore, ten 1000 yr replications were conducted. The cumulative distributions for the replications with the highest and lowest means are shown in figure 7.

V. Conclusions and comments

1. The assumption that simultaneous events are too improbable to consider needs to be revisited.

2. The reason for the extremely “tight” output produced by the RA needs to be determined. This may simply be a matter of the smoothing effect of the expected value calculation, or it could be an artifact of the efficiently produced long simulation periods, or not.

3. Both the RA model and any future DES work would benefit from more rigorous use of statistics. Heretofore, validation has been very simple because the historic data could be quite exactly fit by the RA model. Statistical validation in terms of confidence intervals might facilitate buy-in.

4. It would still be desirable to find a way to detect the “lumpiness” in an expected loss profile short of using a DES model. It would appear that analysis of the probabilities of various event classes might provide insight.

Figure 1
[Production flow chart. Delivery regions: Linear Fluorescent Delivery North America, South America, Asia, Europe. Assembly plants: Bucyrus, Oakville, Circleville, Brazil, Chile, Enfield, India 43, India 44, Hua Mei China, VAC, Monterey, Philippines, Indonesia. Component nodes: Tubing, Ext. Tubing, Wire, Lead Wire, Cathodes, Base/Pins, Coating, Tungsten Prods. Named source sites include Logan, Conneaut, Bridgeville, Acuna, Hajduboszormeny, Carolina, Ivanhoe, Zala and Circleville; other components are marked "Made Locally" or "Source Locally".]

Figure 2
Risk Exceedance Curves
[Plot. Vertical axis: Probability, 0.00 to 1.00. Horizontal axis: Loss, $1,000,000 to $1,000,000,000. Series: Alternative Case Risk, Base Case Risk.]

Figure 3

                                   Base DT NLE in Days     Alt DT NLE in Days
          baseNLE    altNLE        Min    Mode   Max       Min    Mode   Max
Plant1    0.2000     0.1000        8      10     12        8      10     12
Plant2    0.5000     0.2000        10     10     30        10     10     30
Plant3    0.5000     0.4000        78     97.5   117       78     97.5   117
Plant4    0.2500     0.1000        0      1      90        0      1      10

Figure 4

                 Annual Probability       Daily Probability                  "Frequency" (days)
                 Min     Mode    Max      Min        Mode       Max          Min     Mode    Max
Base   Plant1    0.1     0.2     0.3      0.000274   0.000548   0.000822     1217    1825    3650
Base   Plant2    0.25    0.5     0.75     0.000685   0.00137    0.002055     487     730     1460
Base   Plant3    0.25    0.5     0.75     0.000685   0.00137    0.002055     487     730     1460
Base   Plant4    0.125   0.25    0.375    0.000342   0.000685   0.001027     973     1460    2920
Alt    Plant1    0.05    0.1     0.15     0.000137   0.000274   0.000411     2433    3650    7300
Alt    Plant2    0.1     0.2     0.3      0.000274   0.000548   0.000822     1217    1825    3650
Alt    Plant3    0.2     0.4     0.6      0.000548   0.001096   0.001644     608     913     1825
Alt    Plant4    0.05    0.1     0.15     0.000137   0.000274   0.000411     2433    3650    7300

Figure 5

Probability
          Base                              Alt
Plant1    =RiskTriang(0.1,0.2,0.3)          =RiskTriang(0.05,0.1,0.15)
Plant2    =RiskTriang(0.25,0.5,0.75)        =RiskTriang(0.1,0.2,0.3)
Plant3    =RiskTriang(0.25,0.5,0.75)        =RiskTriang(0.2,0.4,0.6)
Plant4    =RiskTriang(0.125,0.25,0.375)     =RiskTriang(0.05,0.1,0.15)

NLE Days
          Base                              Alt
Plant1    =RiskTriang(8,10,12)              =RiskTriang(8,10,12)
Plant2    =RiskTriang(10,10,30)             =RiskTriang(10,10,30)
Plant3    =RiskTriang(78,97.5,117)          =RiskTriang(78,97.5,117)
Plant4    =RiskTriang(0,1,90)               =RiskTriang(0,1,10)

Risk
          Base                              Alt
Plant1    =Prob*Conseq                      =Prob*Conseq
Plant2    =Prob*Conseq                      =Prob*Conseq
Plant3    =Prob*Conseq                      =Prob*Conseq
Plant4    =Prob*Conseq                      =Prob*Conseq
Total     =SUM                              =SUM

Figure 6
Distribution for Total / Alt/O28
[Cumulative distribution plot. Vertical axis 0 to 1; horizontal axis 20 to 110. Annotations: O28: X <= 29.73 at 5%; O28: X <= 58.21 at 95%; O28: Mean=43.68603; N28: Mean=66.63783.]

Figure 7
[Cumulative distribution plot. Vertical axis 0 to 1; horizontal axis 30 to 120. Series: Series1, Series2.]

Figure 8

Rep    Variable    Value    DT Base    /yr    DT Alt    /yr
1)     out         9033     203        7      761       28
2)     out         9027     952        35     763       28
3)     out         9032     967        35     769       28
4)     out         8792     968        35     781       28
5)     out         9048     973        35     795       29
6)     out         8927     980        36     808       29
7)     out         9020     1002       36     845       31
8)     out         9797     1017       37     861       31
9)     out         8998     1073       39     908       33
10)    out         8983     1208       44     995       36
Mean                                   34               30

Figure 9

Rep    Variable    Value    DT Base    /yr
1)     out         9036     103        4
2)     out         8768     166        6
3)     out         9897     166        6
4)     out         9825     175        6
5)     out         9040     185        7
6)     out         9032     218        8
7)     out         9834     960        35
8)     out         9782     964        35
9)     out         9834     968        35
10)    out         9815     1232       45
Mean                                   19

Figure 10

Rep    Variable    Value    DT Base    /yr
1)     out         8727     1273       46
2)     out         8529     1303       47
3)     out         8592     1304       47
4)     out         8541     1341       49
5)     out         8659     1391       51
6)     out         8696     1408       51
7)     out         8697     1443       52
8)     out         8609     1459       53
9)     out         8557     1471       53
10)    out         8500     1500       55
Mean                                   51
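The simplified RA model of Figure 5 can also be sketched outside the spreadsheet. The Python sketch below is a stand-in for the @Risk calculation and rests on several assumptions: the standard library's `random.triangular` replaces `RiskTriang`, every triangular input is drawn independently, and the names `BASE`, `ALT`, `simulate` and `summarize` are illustrative only. Under those assumptions the expected total is the sum of mean probability times mean downtime, about 66.7 day/yr for the base case and 43.7 day/yr for the alternative, which is close to the means annotated on Figure 6.

```python
import random

# (min, mode, max) parameters from Figure 5: annual probability, then NLE days.
BASE = {
    "Plant1": ((0.1, 0.2, 0.3), (8, 10, 12)),
    "Plant2": ((0.25, 0.5, 0.75), (10, 10, 30)),
    "Plant3": ((0.25, 0.5, 0.75), (78, 97.5, 117)),
    "Plant4": ((0.125, 0.25, 0.375), (0, 1, 90)),
}
ALT = {
    "Plant1": ((0.05, 0.1, 0.15), (8, 10, 12)),
    "Plant2": ((0.1, 0.2, 0.3), (10, 10, 30)),
    "Plant3": ((0.2, 0.4, 0.6), (78, 97.5, 117)),
    "Plant4": ((0.05, 0.1, 0.15), (0, 1, 10)),
}


def tri(rng, lo, mode, hi):
    """Draw from a triangular distribution given (min, mode, max).

    random.triangular expects (low, high, mode), so the arguments are reordered.
    """
    return rng.triangular(lo, hi, mode)


def simulate_year(rng, case):
    """One spreadsheet recalculation: Total = sum over plants of Prob * Conseq."""
    return sum(tri(rng, *prob) * tri(rng, *days) for prob, days in case.values())


def simulate(case, years=10_000, seed=1):
    """Return the expected lost days for each of `years` simulated years."""
    rng = random.Random(seed)
    return [simulate_year(rng, case) for _ in range(years)]


def summarize(samples):
    """Mean plus empirical 5th and 95th percentiles of the samples."""
    s = sorted(samples)
    n = len(s)
    return {"mean": sum(s) / n, "p05": s[int(0.05 * n)], "p95": s[int(0.95 * n)]}


if __name__ == "__main__":
    for name, case in (("Base", BASE), ("Alt", ALT)):
        stats = summarize(simulate(case))
        print(name, {k: round(v, 2) for k, v in stats.items()})
```

Because each "year" is a single sum of products, 10,000 years take well under a second to compute, which is consistent with the report's remark that the RA approach simulates thousands of years very quickly. The sketch inherits the RA construction discussed in section IV: it never places two facilities out of service at the same time.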