Comparison of Discrete Event Simulation and Risk-Analytic Models of the Fire Risk to a Manufacturing System
Michael E. G. Schmidt
DSES-6620
Fall 2002
Executive summary
The original project proposal suggested using a study done for GE Lighting, suitably anonymized,
to determine whether a discrete event simulation (DES) would produce the same results as the
risk-analytic (RA) approach that was originally used.
Although in hindsight it should have been obvious that they must produce similar if not identical
results, the demonstration was surprisingly difficult because of certain characteristics of the DES
method and limitations of the PROMODEL student version. The demonstration also uncovered an
unexpectedly large conservatism in the RA approach and (although it was not pursued to its conclusion) the power of inventory in risk management.
This report describes:
• How and why the original model was simplified to fit the DES environment provided by PROMODEL student,
• The original RA model and the simplified RA model that was constructed to match the eventual DES model,
• The results obtained and the further adjustments to the DES model, and
• The conclusions drawn.
I. Introduction
A. Definition of objective; scope and requirements
The basic plan was to compare the "digital rat in a maze" (DES) approach to the previously employed RA modeling with the Monte Carlo method. PROMODEL represents the former and @Risk in an Excel spreadsheet the latter.
Risk is the product of probability and consequence. The biggest problem with risk modeling is the lack of robust data, particularly for low probability/high consequence events. (For example, there have been only 2-1/2 so-called Maximum Foreseeable Loss (MFL) events in the history of the automobile manufacturing industry.) For a professional exercise, consequence distributions are fairly accurately calculable even for serious events; probability distributions, however, would probably be elicited from subject matter experts, with or without a preliminary "project management" breakdown of the event of interest into more manageable (or predictable) sub-elements.
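The arithmetic behind this definition is worth making explicit, because it is exactly what lets very different events carry the same nominal risk. A minimal sketch in Python (the two events below are hypothetical illustrations, not values from any study):

```python
# Hypothetical illustration of risk = probability x consequence.
# Neither event is taken from the study data.
frequent_small = {"annual_probability": 0.5,   "consequence_days": 10}    # NLE-like
rare_large     = {"annual_probability": 0.005, "consequence_days": 1000}  # MFL-like

for name, event in (("frequent/small", frequent_small), ("rare/large", rare_large)):
    risk = event["annual_probability"] * event["consequence_days"]
    print(f"{name}: expected risk = {risk:.1f} lost days per year")
# Both lines print 5.0 lost days/yr, even though the two events would be
# managed (and insured) very differently.
```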
For this project, I propose instead to use generic data or estimates. (I am, in fact, a subject matter
expert, so this is not that much of a reach.) Time permitting, I will explore the sensitivity of the
results to the input distribution shapes and means. I also suggest that the actual input data will
not be critical because my interest is in a comparison of methods rather than the actual result.
The intended project would have employed data that were developed in a preliminary study of a
consumer product manufacturing network that consists of a group of assembly plants that are
supported by a network of component and subassembly facilities. Figure 1 is an approximate
production flow chart for manufacture of linear fluorescent lighting products at a time in the early
1990s. The hazard under study was fire, and the original deliverable was a comparison of the
expected risk distributions “as is” and with a risk management strategy implemented. Because
the original study was a demonstration project, only the assembly plant in the upper left was
modeled along with its support facilities. “Risk Exceedance” curves (cumulative distribution plots)
are shown in figure 2.
As the project progressed it quickly became apparent that even the single-assembly plant model
was far too complex to be supported. The scope of the project was therefore changed to building
equivalent models and comparing them, rather than attempting to independently model a reality.
(This simplified the project from a practical viewpoint because problems with proprietary
information handling were thereby eliminated.)
B. Collection and analysis
Collection and analysis was intended to consist of the assembled data and expert-elicited estimates from the original project. The altered project scope applied a subset of that data to two independent models. Because only a subset of the data was used, no external reality was actually being modeled; realistic values were, for the most part, employed nonetheless.
C. Model building
The RA model was pre-existing; however, the problems with the DES model outlined below made it necessary to construct a simplified RA model. Results of exercising the model drove further modifications to the DES model, as described later.
D. Model verification and validation
The original intent was that the DES model would be verified and validated against the actual loss history that was used to validate the original study; however, the model simplification that was required made this impossible. Therefore, the only verification and validation is internal, i.e., showing equivalence between the models.
E. Experimentation with the model
Progress through the various simplified and otherwise refined RA and DES models is described. Again, the objective of separately modeling an independent reality had to be abandoned in favor of equivalently modeling a construct.
F. Reporting
The process was the product as the following description demonstrates. Areas for further study
are noted.
II. PROMODEL approach
A. Obvious limitations
1. Complexity of the RA model, with stochastic direct and business interruption effects and stochastic probabilities for each of three event categories.
The RA model recognized three "orders" of risk, known as the normal loss expectancy (NLE), probable maximum loss (PML) and maximum foreseeable loss (MFL). These range from high probability/low consequence to low probability/high consequence. For fire hazards, they generally correspond to a fire confined to its area of origin (generally by sprinklers) (NLE), the worst expected fire under "reasonably adverse" conditions (generally with the most important sprinkler system "impaired") (PML), and the worst expected fire with all protection out of service (MFL). Distributions existed for the NLE, PML and MFL probabilities and consequences, with the consequences separated between direct (property) damage and business interruption (lost time).
2. Five entity limit in PROMODEL; seven components plus finished product in RA model.
3. Event probabilities on the order of 10^-5/yr.
The RA model is designed to simulate one year of experience per recalculation of the spreadsheet. This allows thousands of years to be simulated very quickly. PROMODEL needs to operate day by day, which made a run long enough to "ensure several occurrences of (even) rare events" (as recommended by the PROMODEL manual) a very impractical proposition, as the sketch below illustrates.
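The scale of the problem is easy to quantify using the same conversion that appears later in Figure 4 (daily probability = annual probability / 365, and "frequency", the mean number of days between events, = 1 / daily probability). A minimal sketch:

```python
# Conversion used in Figure 4: daily probability and mean recurrence interval.
def daily_probability(annual_p):
    return annual_p / 365.0

def mean_days_between_events(annual_p):
    return 1.0 / daily_probability(annual_p)

# A common plant fire (annual probability 0.2) recurs every ~1,825 simulated days,
# but an event on the order of 1e-5/yr recurs every ~36,500,000 simulated days.
# A day-by-day DES run long enough to contain several such events is impractical,
# while a spreadsheet that samples one year per recalculation covers the same
# span in a manageable number of iterations.
for annual_p in (0.2, 1e-5):
    print(f"annual p = {annual_p}: daily p = {daily_probability(annual_p):.2e}, "
          f"mean recurrence = {mean_days_between_events(annual_p):,.0f} days")
```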
B. "Discovered" limitations
1. Apparent limit of 5 downtimes in one model.
This limit does not seem to fit any of those listed in the warning screen, nor did exceeding it trigger any warning messages (e.g., those that appear when a fifth entity is defined); nevertheless, more than 5 downtimes seemed to be ignored. (As noted later, even fewer downtimes could be overridden if the model was not carefully constructed.)
2. Apparent inability to have 2 primary routing rules for an entity.
This is a real limitation per the reference manual. A work-around may exist, but none was discovered; the minimal purposes of the project were accomplished by other means.
C. Model simplification
1. Grouped entities with single sources and similar downtimes as single entities.
2. Applied multiple downtimes to various Locations to simulate the existence of separate locations.
The only obvious way to simulate fires at the facilities was to use downtimes corresponding to the frequency distributions and consequence distributions for the various classes of fire. The first attempt to duplicate the RA model grouped entities with duplicate probabilities into the same Plant locations and assigned two downtimes with different outage durations to each. Two downtimes were also assigned to the Assembly facility to simulate the two single-source entities that could not otherwise be modeled.
3. Considered only business interruption, i.e., lost days of production, as the consequence.
This is not a wholly invalid assumption; business interruption losses are generally an order of
magnitude higher than direct damage losses for the same event.
4. Dropped all second and third order (PML and MFL) events.
Sensitivity studies on the original RA model revealed that these events contributed very little
to the expected value of risk. An attempt was made to include them by increasing all the
event frequencies by 2 orders of magnitude; unfortunately, the downtimes for the first order
events were long enough that these events tended to overlap, which is logically impossible.
For this and the next-noted difficulty, only first order consequences (NLE) were considered.
5. Found that more than five downtimes seemed to be ignored.
This ended the attempt to simulate more than five entities and more than the NLE event for each manufacturing plant. The final model defined the entities Product, PartA, PartB, PartC and PartD, with the parts sourced at Plant1 through Plant5 and assembled at Assembly.
6. Parts accumulating in the system apparently overrode downtime at Assembly.
One final problem occurred: downtime at the Assembly location had no effect because no
obvious way existed to discard entities that were blocked. Apparently, blocked entities at a
Plant accumulated and flowed to Assembly during other Plant downtimes. (Lean
manufacturing would tend not to produce other parts when any critical part in an assembly is
not being produced.)
D. Initial results
1. Extended run results.
After working the obvious bugs out of the DES model and making most of the previously described simplifications, an initial run at 24,000 hr provided 937 units of output. This run time is approximately 2.75 yr, which seemed sufficient given that the longest downtime was about 1/3 yr. This is file t1.txt. To test for convergence, a run at 2,400,000 hr, or approximately 275 yr, was made. It produced a significantly lower output of 89,133 units, equivalent to 891 units/24,000 hr, but at this setting the run required an inconvenient length of time. This run is t1a.txt.
2. Multiple run results.
Further experimentation revealed that ten reps at 240,000 hr ran significantly faster than the single 2.4 million hr run and produced an average output of 9,065 units, equivalent to 906 units/24,000 hr. This was judged close enough to convergence in an acceptable run time. This outage level is equivalent to approximately 34 days/yr.
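Converting an output shortfall into lost days per year is simple arithmetic if the model's nominal rate is taken as one finished unit per 24-hr day. That rate is an inference from the reported numbers rather than a stated model parameter, so the sketch below should be read as a consistency check rather than a definition:

```python
HOURS_PER_DAY = 24.0
DAYS_PER_YEAR = 365.0

def lost_days_per_year(units_out, run_hours, units_per_day=1.0):
    """Convert an output shortfall into equivalent lost production days per year.

    units_per_day is the assumed nominal rate (inferred, not taken from the model files).
    """
    run_days = run_hours / HOURS_PER_DAY
    nominal_units = run_days * units_per_day
    lost_days = (nominal_units - units_out) / units_per_day
    return lost_days / (run_days / DAYS_PER_YEAR)

print(lost_days_per_year(937, 24_000))        # initial run:    ~23 days/yr
print(lost_days_per_year(89_133, 2_400_000))  # long run:       ~40 days/yr
print(lost_days_per_year(9_065, 240_000))     # 10-rep average: ~34 days/yr
```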
III. RA approach and model
A. Brief description of principles and the original model
The inputs to the original RA model were described in section II.A. They were fed through a
data table into a Monte Carlo simulator called @Risk. The principle is that the uncertainty
surrounding fire events is large, both in probability and consequence. An engineering approach
would suggest application of the worst case for each scenario. However, the worst case is
extremely unlikely; therefore the Monte Carlo simulator randomly selects probability-weighted
data sets and calculates the outcomes that each data set produces. Therefore, the model
provides more probable small fires and less probable large fires each “year,” however, some
contribution for every class of event is made every year. The result is an expected value of loss,
or risk, for each modeled year. The result needs to be interpreted carefully; a single $10,000,000 loss is not managed in the same way as ten $1,000,000 losses, yet (neglecting the time value of money and other financial effects) both produce the same $10,000,000 ten-year expected value. The RA model, unlike the DES model, will never "show" an actual MFL event, but the MFL event will show up in the expected values if the probability and consequences are large enough.
B. Simplified model (as required by PROMODEL problems)
Figures 3 and 4 show the inputs that were used in the simplified DES and RA models. Figure 5 shows the simplified RA model, with the understanding that the model requires the @Risk engine to fit inputs to the distributions and calculate the outputs.
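For readers without @Risk, the simplified RA model can be reproduced in a few lines of Python. The triangular parameters are those of Figures 3 through 5, and each iteration performs the equivalent of one spreadsheet recalculation: the per-plant =Prob*Conseq products and their sum. This is an independent sketch of the calculation, not the spreadsheet itself:

```python
import random

# Triangular (min, mode, max) parameters from Figures 3-5: (annual probability, NLE days).
BASE = {
    "Plant1": ((0.1, 0.2, 0.3),      (8, 10, 12)),
    "Plant2": ((0.25, 0.5, 0.75),    (10, 10, 30)),
    "Plant3": ((0.25, 0.5, 0.75),    (78, 97.5, 117)),
    "Plant4": ((0.125, 0.25, 0.375), (0, 1, 90)),
}
ALT = {
    "Plant1": ((0.05, 0.1, 0.15), (8, 10, 12)),
    "Plant2": ((0.1, 0.2, 0.3),   (10, 10, 30)),
    "Plant3": ((0.2, 0.4, 0.6),   (78, 97.5, 117)),
    "Plant4": ((0.05, 0.1, 0.15), (0, 1, 10)),
}

def tri(params):
    lo, mode, hi = params
    return random.triangular(lo, hi, mode)  # stdlib argument order is (low, high, mode)

def one_year(case):
    """One spreadsheet recalculation: sum of =Prob*Conseq over the four plants."""
    return sum(tri(prob) * tri(days) for prob, days in case.values())

years = 10_000
for name, case in (("Base", BASE), ("Alt", ALT)):
    totals = [one_year(case) for _ in range(years)]
    print(f"{name}: mean = {sum(totals) / years:.1f} lost days/yr")
# The Base mean lands near the 67 days/yr quoted in section III.C and the Alt mean
# near the roughly 44 days/yr shown in Figure 6.
```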
C. RA model results
Figure 6 shows the cumulative distributions for 10,000 "years." Note that the base case outage level of 67 days/yr is tantalizingly close to exactly twice the 34 days/yr that the DES model was producing at this stage. This resulted in several fruitless hours of searching for a simple conversion or mathematical error.
IV. Analysis of difference and rework
The DES model showed significantly fewer lost days than the RA model. See figure 8, which is based upon t1a10.txt. Study of the DES model revealed that the downtimes had to appear in order of increasing frequency; otherwise the less frequent downtimes were "covered" by the more frequent downtimes, apparently by the previously noted accumulation of entities. This helped, but still produced at least a factor of 2 fewer lost days, in fact 19 days/yr, which is movement in seriously the wrong direction. See figure 9, which is based upon t2a10.txt. It was then realized that the construction of the RA model did not allow simultaneous events at more than one facility, but the DES model did. It had previously been assumed that simultaneous events were rare enough to ignore; this exercise showed that this may be an overly conservative assumption, at least with the (admittedly rather high) event probabilities that were used in this simulation.
To correct this condition, all the downtimes were assigned to the assembly facility and each
downtime was assigned a different pre-emptive priority. This provided results, 51 days/yr, that very
closely matched the RA results. See figure 10, which is based upon t3a10.txt.
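The accounting difference can be illustrated independently of PROMODEL. When each plant's outages are drawn independently, the assembly operation sees the union of the outage days, so an overlapping day is only one lost production day; summing each plant's outage days, which is effectively what the RA model's =Prob*Conseq rows and the priority-serialized downtimes do, counts that day twice. The outage fractions below are deliberately exaggerated placeholders, not values from either model, chosen only to make the overlap visible:

```python
import random

# Placeholder daily outage fractions for two hypothetical plants (not model values).
DOWN_FRACTION = {"PlantA": 0.20, "PlantB": 0.20}
DAYS = 100_000

random.seed(1)
down = {plant: [random.random() < f for _ in range(DAYS)]
        for plant, f in DOWN_FRACTION.items()}

# RA-style accounting: each plant's outage days are summed, so a day on which
# both plants happen to be down is counted twice.
ra_style = sum(sum(days) for days in down.values())

# DES-style accounting: assembly is down whenever any plant is down, so an
# overlapping day is only one lost production day.
des_style = sum(any(day) for day in zip(*down.values()))

print(f"RA-style  lost-day fraction: {ra_style / DAYS:.3f}")   # about 0.40
print(f"DES-style lost-day fraction: {des_style / DAYS:.3f}")  # about 0.36
```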
It also seemed reasonable to wonder whether the 10,000-yr runs, being so efficiently executed, were an unwise approach because of the random number considerations that were discussed in this course. Therefore, ten 1,000-yr replications were conducted. The cumulative distributions for the replications with the highest and lowest means are shown in figure 7.
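The kind of per-replication analysis this implies is straightforward. As a sketch, the per-replication lost-day rates from the final DES runs (Figure 10) can be turned into a mean and a 95% confidence interval, which is the sort of statistical statement suggested again in the conclusions:

```python
import math

# Lost days/yr for each of the ten final DES replications (Figure 10), used here
# only to illustrate the confidence-interval calculation.
reps = [46, 47, 47, 49, 51, 51, 52, 53, 53, 55]

n = len(reps)
mean = sum(reps) / n
sample_sd = math.sqrt(sum((x - mean) ** 2 for x in reps) / (n - 1))
t_95 = 2.262  # two-sided 95% t-value for n - 1 = 9 degrees of freedom
half_width = t_95 * sample_sd / math.sqrt(n)

print(f"mean = {mean:.1f} days/yr, 95% CI = ({mean - half_width:.1f}, {mean + half_width:.1f})")
```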
V. Conclusions and comments
1. The assumption that simultaneous events are too improbable to consider needs to be
revisited.
2. The reason for the extremely "tight" output produced by the RA model needs to be determined. This may simply be a matter of the smoothing effect of the expected value calculation, or it could be an artifact of the efficiently produced long simulation periods, or something else entirely.
3. Both the RA model and any future DES work would benefit from more rigorous use of
statistics. Heretofore, validation has been very simple because the historic data could be
quite exactly fit by the RA model. Statistical validation in terms of confidence intervals might
facilitate buy-in.
4. It would still be desirable to find a way to detect the “lumpiness” in an expected loss profile
short of using a DES model. It would appear that analysis of the probabilities of various event
classes might provide insight.
Figure 1: Approximate production flow chart for linear fluorescent lamp manufacture. Assembly plants serving the North American, South American, Asian and European delivery regions (e.g., Bucyrus, Oakville, Circleville, Brazil, Chile, Enfield, India, Hua Mei China, VAC, Monterey, Philippines, Indonesia) are shown with their supporting tubing, exhaust tubing, wire, lead wire, cathode, coating and base/pin facilities, some of which are made or sourced locally.
Figure 2: Risk Exceedance Curves. Cumulative probability (0.00 to 1.00) versus loss ($1,000,000 to $1,000,000,000) for the Base Case Risk and the Alternative Case Risk.
Figure 3
          baseNLE   altNLE    Base DT NLE in Days        Alt DT NLE in Days
                              Min     Mode    Max        Min     Mode    Max
Plant1    0.2000    0.1000      8      10      12          8      10      12
Plant2    0.5000    0.2000     10      10      30         10      10      30
Plant3    0.5000    0.4000     78      97.5   117         78      97.5   117
Plant4    0.2500    0.1000      0       1      90          0       1      10
Figure 4
                 Annual Probability         Daily Probability                  "Frequency" (days)
                 Min     Mode    Max        Min        Mode       Max          Min     Mode    Max
Base   Plant1    0.1     0.2     0.3        0.000274   0.000548   0.000822     1217    1825    3650
       Plant2    0.25    0.5     0.75       0.000685   0.00137    0.002055      487     730    1460
       Plant3    0.25    0.5     0.75       0.000685   0.00137    0.002055      487     730    1460
       Plant4    0.125   0.25    0.375      0.000342   0.000685   0.001027      973    1460    2920
Alt    Plant1    0.05    0.1     0.15       0.000137   0.000274   0.000411     2433    3650    7300
       Plant2    0.1     0.2     0.3        0.000274   0.000548   0.000822     1217    1825    3650
       Plant3    0.2     0.4     0.6        0.000548   0.001096   0.001644      608     913    1825
       Plant4    0.05    0.1     0.15       0.000137   0.000274   0.000411     2433    3650    7300
Figure 5

Probability
           Base                             Alt
Plant1     =RiskTriang(0.1,0.2,0.3)         =RiskTriang(0.05,0.1,0.15)
Plant2     =RiskTriang(0.25,0.5,0.75)       =RiskTriang(0.1,0.2,0.3)
Plant3     =RiskTriang(0.25,0.5,0.75)       =RiskTriang(0.2,0.4,0.6)
Plant4     =RiskTriang(0.125,0.25,0.375)    =RiskTriang(0.05,0.1,0.15)

NLE Days
           Base                             Alt
Plant1     =RiskTriang(8,10,12)             =RiskTriang(8,10,12)
Plant2     =RiskTriang(10,10,30)            =RiskTriang(10,10,30)
Plant3     =RiskTriang(78,97.5,117)         =RiskTriang(78,97.5,117)
Plant4     =RiskTriang(0,1,90)              =RiskTriang(0,1,10)

Risk
           Base              Alt
Plant1     =Prob*Conseq      =Prob*Conseq
Plant2     =Prob*Conseq      =Prob*Conseq
Plant3     =Prob*Conseq      =Prob*Conseq
Plant4     =Prob*Conseq      =Prob*Conseq
Total      =SUM              =SUM
Figure 6: Cumulative distribution for total lost days (Alt case, cell O28) over 10,000 simulated "years": 5% of years at or below 29.73 days, 95% at or below 58.21 days, mean = 43.69 days/yr; Base case mean (cell N28) = 66.64 days/yr. Plotted as cumulative probability (0 to 1) versus lost days (20 to 110).
Figure 7: Cumulative distributions (Series1 and Series2) for the 1,000-yr RA replications with the highest and lowest means, plotted as cumulative probability (0 to 1) versus lost days (30 to 120).
Figure 8
Rep    Variable   Value   DT Base   /yr   DT Alt   /yr
1)     out        9033      203       7     761     28
2)     out        9027      952      35     763     28
3)     out        9032      967      35     769     28
4)     out        8792      968      35     781     28
5)     out        9048      973      35     795     29
6)     out        8927      980      36     808     29
7)     out        9020     1002      36     845     31
8)     out        9797     1017      37     861     31
9)     out        8998     1073      39     908     33
10)    out        8983     1208      44     995     36
Mean                                 34              30
Figure 9
Rep    Variable   Value   DT Base   /yr
1)     out        9036      103       4
2)     out        8768      166       6
3)     out        9897      166       6
4)     out        9825      175       6
5)     out        9040      185       7
6)     out        9032      218       8
7)     out        9834      960      35
8)     out        9782      964      35
9)     out        9834      968      35
10)    out        9815     1232      45
Mean                                 19
Figure 10
Rep    Variable   Value   DT Base   /yr
1)     out        8727     1273      46
2)     out        8529     1303      47
3)     out        8592     1304      47
4)     out        8541     1341      49
5)     out        8659     1391      51
6)     out        8696     1408      51
7)     out        8697     1443      52
8)     out        8609     1459      53
9)     out        8557     1471      53
10)    out        8500     1500      55
Mean                                 51