Figure 2: The brute force method of aggregate

advertisement
Aggregate modeling
A common problem in risk analysis is the modeling of the sum of a number of independent random
variables each following the same distribution. For example, determining:





The total insurance claim amount for a policy in a year across all policy holders;
The total sales (or profit) from a number of clients;
The total time or person-hours required to complete a number of identical tasks (like laying
railway sleepers, segments of pipeline, or installing TV systems);
The total amount of a chemical ingested by an individual as a result of consumer some
product; or
The total bed-days required in a year in a hospital ward
The distribution of the number of individuals being summed is often called the frequency
distribution, and the distribution of the variables being summed is called the severity distribution. A
common error in risk modeling is to simply multiply the frequency and severity distributions. For
example, if we believed that there may be Poisson(1250) new cancer cases in a year, and that a
random cancer patient will stay Lognormal(30,20) days in hospital, one might try to model this as:
Poisson(1250)*Lognormal(30,20)
The formula is incorrect in a Monte Carlo model because it will only generate scenarios where each
patient stays the same length of time. For example, sample values of 1300 and 35 for the Poisson
and Lognormal respectively give a total of 1300*35 = 45,500 bed-days, but this scenario assigns each
patient the same 35 days stay. In reality some will stay a shorter time, and some longer.
The correct method is as follows:
1. First sample from the frequency distribution (e.g. the Poisson(1250) distribution – let’s say it
generates a value of 1300)
2. Next take 1300 independent samples from the severity distribution (the Lognormal(30,20))
and add them up. This represents one possible scenario for the total bed-days required
3. Repeat steps 1 and 2 to generate the distribution of total bed-days required.
The effect of using the incorrect model is to exaggerate the spread of the distribution of the total as
shown in Figure 1. In this example, the incorrect model would leave managers unnecessarily
concerned that the ward size is grossly insufficient to meet demand.
Correct
Incorrect
Correct
Aggregate results
1
0.9
Cumulative probability
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
50000
100000
150000
Bed-days
Figure 1: The incorrect aggregate simulation produces a far greater right tail so that, for example, it estimates that one
should budget for nearly 70,000 bed-days to be 90% confident of staying within budget, whereas the correct value is
around 50,000 bed-days.
Implementing the correct method in a spreadsheet model could be quite onerous. One needs to
write a range of cells each holding a random generating function for the severity distribution, which
can be very large. Moreover, the number of these severity distributions required depends on the
value sampled from the frequency distribution. Figure 2 shows an example:
A
1
2
3
4
5
6
7
8
9
2004
2005
2006
B
Number of patients
Total bed-days (output)
Patient #
1
2
3
4
1999
2000
C
1285
37588.81601
Length of stay
4.274241864
63.01579133
13.6852622
20.09998397
0
0
D
E
F
Formulae table
C2
=VosePoisson(1250)
C3
=VoseOutput()+SUM(C6:C2005)
C6:C2005 =IF(B6>$C$2,0,VoseLognormal(30,20))
Figure 2: The brute force method of aggregate modeling
The approach of Figure 2 is very inflexible: for example, an increase in the Poisson mean value of
1250 to 2000 would require extending the table and rewriting the summation formula in cell C3.
Moreover, this approach can be very slow to simulate. The model above takes 201 seconds to
complete 10,000 iterations.
G
ModelRisk incorporates a variety of functions to make aggregate modeling much simpler to
implement, and greatly speed up simulation time. For example, the same model can be written in
just three cells, as shown in Figure 3:
A
1
2
3
4
5
6
7
8
9
10
B
C
Number of patients
Length of tay
Total bed-days (output)
D
1286
VoseLognormal(30,20)
39063.6832
Formulae table
=VosePoisson(1250)
=VoseLognormalObject(30,20)
=VoseOutput()+VoseAggregateMC(C2,C3)
B2
B3
B4
Figure 3: Reproducing the model of Figure 3 using the VoseAggregateMC function
This version of the model takes just 27 seconds to finish 10,000 iterations: a 7.4-fold increase in
simulation speed. The VoseAggregateMC function also recognizes any probability identities that
would speed up simulation: for example, if the severity distribution followed a Gamma distribution,
the simulation takes under 3 seconds because the function knows that the sum of identical Gamma
distributions is also a Gamma distribution. That means it will take under 3 seconds to run no matter
how large the frequency values are. More importantly, the model is easily changed by simply editing
the cells C2 and C3. Note the use of the function VoseLognormalObject, which defines a variable that
will be used many times in the model – Object functions are a critical advantage that ModelRisk
offers over any of its competitors, as will be demonstrated below.
An important aspect of any modelling software is to provide the user with some flexibility in how
they wish to build their models. ModelRisk allows the user to build the above model in any number
of different ways, some of which are shown in Figure 4:
A
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
B
Set-up 1
38153.84
C
D
E
F
G
H
I
Set-up 2
Set-up 3
Set-up 4
VosePoisson(1250)
VoseLognormal(30,20)
37724.76
VoseLognormal(30,20)
36000.83
Mean # patients
Mean days in ward
Stdev days in ward
38399.07
Formulae table
Set-up 1: Single formula
B4
=VoseOutput()+VoseAggregateMC(VosePoisson(1250),VoseLognormalObject(30,20))
Set-up 2: frequency and severity distributions defined as Objects for greater visibility
D4
=VosePoissonObject(1250)
D5
=VoseLognormalObject(30,20)
D6
=VoseOutput()+VoseAggregateMC(VoseSimulate(D4),D5)
Set-up 3: severity distribution only separated out
F4
=VoseLognormalObject(30,20)
F5
=VoseOutput()+VoseAggregateMC(VosePoisson(1250),F4)
Set-up 4: parameter values separated out
H7
=VoseOutput()+VoseAggregateMC(VosePoisson(J4),VoseLognormalObject(J5,J6))
Set-up 5: both distributions separated out and severity distribution fitted to data
L4
=VosePoissonObject(1250)
L5
=VoseLognormalFitObject(O3:O228)
L6
=VoseOutput()+VoseAggregateMC(VoseSimulate(L4),L5)
J
K
L
Set-up 5
1250
30
20
VosePoisson(1250)
VoseLognormal(28.758700,18.094306)
35357.26
M
Figure 4: Examples of different ways in which the aggregate model can be built with ModelRisk
It is also important to provide checks so the user can verify exactly what it is they are doing, and
where the problems lie in any mistakes they have made. For example, selecting on any one of the
cells in Figure 3 or 4 containing the AggregateMC function, and then clicking the ModelRisk ‘View
function’ icon will display the window shown in Figure 5:
Figure 5: visualizing the AggregateMC function in ModelRisk. The frequency and severity distributions are plotted above,
and a histogram of (in this case 1000) Monte Carlo generated samples of the aggregate distribution shown below. Statistics
for the sample (labeled ‘MC’) and the theoretical (labeled ‘exact’) moments (mean, variance, etc), which can be derived
from the properties of the frequency and severity distributions, are compared in the table as a numerical check that the
function is working well.
Other aggregate functions available in ModelRisk
Aggregate modeling is such an important component of risk analysis that ModelRisk provides a
whole range of aggregate functions:
VoseAggregateMC – as used in the examples above, this uses pure Monte Carlo sampling to sum
severity variables
VoseAggregateMultiMC – similar to VoseAggregateMC except the function will aggregate multiple
{frequency,severity} pairs, and allow correlation between frequency distributions. One might use
this, for example, to look at the bed-days needed across all wards of a hospital.
VoseExpression – this function allows the user to specify a severity variable of essentially any
required complexity. For example, one could describe a cost-sharing above a certain level of severity
or make the severity distribution different for men and women, or young and old.
VoseAggregateDeduct – which allows one to model the aggregation of insurance claims where the
policy has a deductible and/or a limit on the amount paid out in a single claim
Insurance aggregate functions – specialized functions for the insurance industry that utilize
advanced methods of aggregate claim calculations like fast Fourier transform, de Pril’s and Panjer
methods as well as multivariate aggregation methods. These are available in a separate Insurance
and Finance module.
ModelRisk competitors
Some competing Monte Carlo Excel add-ins have attempted to copy the simplest aggregate
functions in ModelRisk: namely VoseAggregateMC and VoseAggregateDeduct. Unfortunately, their
functions do not work correctly and contravene Excel’s convention on how functions should interact.
The reason that they don’t work correctly is that the competing products have not incorporated
Objects. Their tools only produce functions that sample from random variables, which means that
they have no means of differentiating between sampling from a distribution and defining a
distribution that is to be used in some algorithm (the Object concept).
The danger of attempting to get around the need for Objects is easily illustrated with a few
examples:
@RISK from Palisade Corporation
With @RISK version 5.0, Palisade Corporation introduced the RiskCompound function, which takes
four parameters:




Frequency distribution
Severity distribution
Deductible (optional)
Limit (optional)
The last two parameters would be used for insurance modeling, and reflect the analysis being
evaluated by ModelRisk’s VoseAggregateDeduct function. Thus, using this function to solve the
hospital problem above, one would write:
=RiskCompound(RiskPoisson(1250),RiskLognorm(30,20))
In a normal model, the RiskLognorm function is used to sample random values from a Lognormal
dist. However, within the RiskCompound function it needs to be interpreted differently: as an
Object, rather than a function generating values. The Excel convention for user-defined functions is
that the parameters within the function are evaluated first, but if that were done here we would get,
for example (remembering the common error example at the beginning of this paper):
=RiskCompound(1300,35)
which would logically then give the value 45,500. Thus, @RISK has to suppress the evaluation of the
severity variable, contravening Excel’s rules, with disastrous and unpredictable results, as shown in
the following examples (where RiskNormal in these examples could be replaced by any other @RISK
distribution function):








=RiskCompound(n, A1): where A1 contains “=-RiskNormal(,)” the minus sign is ignored
=RiskCompound(n, -RiskNormal(,)): “#VALUE!” is returned
=RiskCompound(n,k+RiskNormal(,)): “#VALUE!” is returned, no matter whether k is a cell
reference, an @RISK distribution or a fixed vale. However, place ‘=k+RiskNormal(,)’ in a
cell (eg A1) and =RiskCompound(n,A1) generates values
=RiskCompound(n,k): where k is a constant, now k is no longer ignored
=RiskCompound(n,k): where k is a RiskCompound function, the error “#NUM! is
generated
=RiskCompound(n,RiskNormal,)^2): where k is a RiskCompound function, the error
“#NUM! is generated
If n > 1,000,000 the RiskCompound function returns #VALUE!
If n is not an integer the RiskCompound(n, …) function rounds down to the nearest integer
value (so, for example, 1.999 is interpreted as 1) which systematically underestimates the
aggregate value with no warning. ModelRisk’s VoseAggregateMC function, by contrast,
returns the error message: “Error: N should be an integer value or discrete distribution
object”
None of these errors are possible with ModelRisk. Moreover, ModelRisk can handle the
combinations you are interested in. For example, the equivalent of:
=RiskCompound(n,RiskNormal,)^2)
(if it worked) in ModelRisk is:
=VoseAggregateMC(n,VoseExpression(“#1^2”,VoseNormalObject(,))
where the “#1^2” string tells the function what it must do with this variable and it does work.
You can also build more complex expressions. For example, there might be a 60% probability that
someone entering a shop spends Lognormal(20,5) dollars, and a 40% probability they spend nothing.
If Poisson(500) people enter the shop, the total revenue is given by:
=VoseAggregateMC(A1,VoseExpression("IF(#1=1,#2,0)",A2,A3))
with
A1:=VosePoisson(500)
A2: = VoseBernoulliObject(0.6)
A3: =VoseLognormalObject(20,5)
Crystal Ball from Oracle
Crystal Ball does not offer any aggregate functions.
FinRisk from Cranes Software
FinRisk includes a RandSum function to aggregate random variables, but the function requires that
one know and follow very specific rules. Unfortunately these rules are not apparent and when the
rules are not followed, errors are returned with no explanations, making it very hard to understand
why a certain combination is not working.
The RandSum function takes two parameters:


Severity distribution
Frequency distribution
A first rule that one should know about, and which is not obvious from the function’s interface or
from the help file, is that the Frequency distribution can be entered in the formula directly, but NOT
the Severity distribution. This means that the function to solve the hospital problem mentioned
earlier could not be entered as a formula without linking to an external cell:
=FinRandSum(A1,FinPoisson(1250)),
where A1 refers to the Severity distribution. Other examples which show the very specific rules one
has to follow include:




=FinRandSum(A1,n) where A1 contains “=-FinNormal(µ,σ)”: “#VALUE!” is returned
=FinRandSum(A1,n) where A1 contains “=k + FinNormal(µ,σ)”: “#VALUE!” is returned, no
matter whether k is a cell reference, another distribution or a fixed value
In the same vein that it is not allowed to enter the Severity distribution directly in the
RandSum function, it is also not allowed to enter a fixed value directly. Even when linking to
a fixed value for the Severity distribution, the function returns “#VALUE!”
=FinRandSum(k,n) where k is a FinRandSum function: “#VALUE!” is returned
Just like with the @RISK errors, none of the errors above are possible with ModelRisk. In addition to
this, ModelRisk does a far better job than FinRisk when it comes to handling large numbers for the
Frequency distribution. When setting n to 1,000,000 it takes FinRisk about 30 seconds to generate a
single random number from the RandSum function where the Severity distribution is just a
Normal(0,1). ModelRisk’s VoseAggregateMC(1000000,VoseNormal(0,1)) generates random numbers
instantaneously. As a final note, when n is not an integer, the RandSum(… ,n) function returns
“#FinError”, which is fine, but it doesn’t tell you why this is an error. ModelRisk’s VoseAggregateMC
function returns the error message: “Error: N should be an integer value or discrete distribution
object”.
RiskSolver from Frontline Systems, inc
RiskSolver does not offer any aggregate functions.
Download