Aggregate modeling

A common problem in risk analysis is modeling the sum of a number of independent random variables that each follow the same distribution. For example, determining:

- The total insurance claim amount for a policy in a year across all policy holders;
- The total sales (or profit) from a number of clients;
- The total time or person-hours required to complete a number of identical tasks (like laying railway sleepers, segments of pipeline, or installing TV systems);
- The total amount of a chemical ingested by an individual as a result of consuming some product; or
- The total bed-days required in a year in a hospital ward.

The distribution of the number of individuals being summed is often called the frequency distribution, and the distribution of the variables being summed is called the severity distribution.

A common error in risk modeling is to simply multiply the frequency and severity distributions. For example, if we believed that there may be Poisson(1250) new cancer cases in a year, and that a random cancer patient will stay Lognormal(30,20) days in hospital, one might try to model this as:

Poisson(1250)*Lognormal(30,20)

The formula is incorrect in a Monte Carlo model because it only generates scenarios in which every patient stays the same length of time. For example, sample values of 1300 and 35 for the Poisson and Lognormal respectively give a total of 1300*35 = 45,500 bed-days, but this scenario assigns each patient the same 35-day stay. In reality some will stay a shorter time, and some longer.

The correct method is as follows:

1. First sample from the frequency distribution (e.g. the Poisson(1250) distribution; let’s say it generates a value of 1300).
2. Next take 1300 independent samples from the severity distribution (the Lognormal(30,20)) and add them up. This represents one possible scenario for the total bed-days required.
3. Repeat steps 1 and 2 to generate the distribution of total bed-days required.
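The three steps above can be sketched outside a spreadsheet in a few lines of Python. This is only an illustrative sketch using the standard library, not ModelRisk's implementation. It assumes that Lognormal(30,20) is parameterized by the mean and standard deviation of the variable itself, and the helper names (lognormal_params, sample_poisson, simulate_totals) are hypothetical names introduced here. The incorrect product model is simulated alongside for comparison.

```python
import math
import random
import statistics


def lognormal_params(mean, stdev):
    """Convert the mean and stdev of a lognormal variable into the
    (mu, sigma) of its underlying normal distribution.
    Assumption: Lognormal(30,20) means mean=30, stdev=20."""
    sigma2 = math.log(1.0 + (stdev / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def sample_poisson(lam, rng):
    """Sample Poisson(lam) by counting unit-rate exponential arrivals
    that fall in the interval [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_totals(lam, mean, stdev, iterations, seed=0):
    """Return two lists of simulated totals: (correct, incorrect)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, stdev)
    correct, incorrect = [], []
    for _ in range(iterations):
        # Step 1: sample the number of patients from the frequency distribution.
        n = sample_poisson(lam, rng)
        # Step 2: sum n independent samples from the severity distribution.
        correct.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n)))
        # Incorrect model: one severity sample multiplied by the count.
        incorrect.append(n * rng.lognormvariate(mu, sigma))
    # Step 3 is the loop itself: each pass yields one scenario.
    return correct, incorrect


if __name__ == "__main__":
    correct, incorrect = simulate_totals(1250, 30, 20, iterations=1000)
    for label, totals in (("correct", correct), ("incorrect", incorrect)):
        p90 = statistics.quantiles(totals, n=10)[8]  # 90th percentile
        print(label, round(statistics.mean(totals)), round(statistics.stdev(totals)), round(p90))
```

Both models have the same expected total (1250 × 30 bed-days), so the comparison of interest is the spread: the product model gives every patient in a scenario the same length of stay, which inflates the standard deviation and the upper percentiles.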
The effect of using the incorrect model is to exaggerate the spread of the distribution of the total, as shown in Figure 1. In this example, the incorrect model would leave managers unnecessarily concerned that the ward size is grossly insufficient to meet demand.

[Figure 1 plot, titled "Aggregate results": cumulative probability (0 to 1) against bed-days (0 to 150,000) for the correct and incorrect models]

Figure 1: The incorrect aggregate simulation produces a far greater right tail so that, for example, it estimates that one should budget for nearly 70,000 bed-days to be 90% confident of staying within budget, whereas the correct value is around 50,000 bed-days.

Implementing the correct method in a spreadsheet model can be quite onerous. One needs a range of cells, which can be very large, each holding a random generating function for the severity distribution. Moreover, the number of these severity distributions required depends on the value sampled from the frequency distribution. Figure 2 shows an example:

    Row    B                          C
    2      Number of patients         1285
    3      Total bed-days (output)    37588.81601
    5      Patient #                  Length of stay
    6      1                          4.274241864
    7      2                          63.01579133
    8      3                          13.6852622
    9      4                          20.09998397
    ...
    2004   1999                       0
    2005   2000                       0

Formulae table
    C2         =VosePoisson(1250)
    C3         =VoseOutput()+SUM(C6:C2005)
    C6:C2005   =IF(B6>$C$2,0,VoseLognormal(30,20))

Figure 2: The brute force method of aggregate modeling

The approach of Figure 2 is very inflexible: for example, an increase in the Poisson mean value from 1250 to 2000 would require extending the table and rewriting the summation formula in cell C3. Moreover, this approach can be very slow to simulate. The model above takes 201 seconds to complete 10,000 iterations.

ModelRisk incorporates a variety of functions that make aggregate modeling much simpler to implement and greatly speed up simulation time.
For example, the same model can be written in just three cells, as shown in Figure 3:

    Number of patients         1286
    Length of stay             VoseLognormal(30,20)
    Total bed-days (output)    39063.6832

Formulae table
    C2   =VosePoisson(1250)
    C3   =VoseLognormalObject(30,20)
    C4   =VoseOutput()+VoseAggregateMC(C2,C3)

Figure 3: Reproducing the model of Figure 2 using the VoseAggregateMC function

This version of the model takes just 27 seconds to finish 10,000 iterations: a 7.4-fold increase in simulation speed. The VoseAggregateMC function also recognizes any probability identities that would speed up simulation. For example, if the severity distribution were a Gamma distribution, the simulation would take under 3 seconds, because the function knows that the sum of independent, identically distributed Gamma variables is also Gamma distributed. The run time therefore stays under 3 seconds no matter how large the frequency values are.

More importantly, the model is easily changed by simply editing the cells C2 and C3. Note the use of the function VoseLognormalObject, which defines a variable that will be used many times in the model. Object functions are a critical advantage that ModelRisk offers over any of its competitors, as will be demonstrated below.

An important aspect of any modeling software is to provide the user with some flexibility in how they wish to build their models.
ModelRisk allows the user to build the above model in any number of different ways, some of which are shown in Figure 4:

    Set-up 1 (column B)
        B4   38153.84

    Set-up 2 (column D)
        D4   VosePoisson(1250)
        D5   VoseLognormal(30,20)
        D6   37724.76

    Set-up 3 (column F)
        F4   VoseLognormal(30,20)
        F5   36000.83

    Set-up 4 (columns H to J)
        Mean # patients       J4   1250
        Mean days in ward     J5   30
        Stdev days in ward    J6   20
        H7   38399.07

    Set-up 5 (column L)
        L4   VosePoisson(1250)
        L5   VoseLognormal(28.758700,18.094306)
        L6   35357.26

Formulae table

    Set-up 1: single formula
        B4   =VoseOutput()+VoseAggregateMC(VosePoisson(1250),VoseLognormalObject(30,20))

    Set-up 2: frequency and severity distributions defined as Objects for greater visibility
        D4   =VosePoissonObject(1250)
        D5   =VoseLognormalObject(30,20)
        D6   =VoseOutput()+VoseAggregateMC(VoseSimulate(D4),D5)

    Set-up 3: severity distribution only separated out
        F4   =VoseLognormalObject(30,20)
        F5   =VoseOutput()+VoseAggregateMC(VosePoisson(1250),F4)

    Set-up 4: parameter values separated out
        H7   =VoseOutput()+VoseAggregateMC(VosePoisson(J4),VoseLognormalObject(J5,J6))

    Set-up 5: both distributions separated out and severity distribution fitted to data
        L4   =VosePoissonObject(1250)
        L5   =VoseLognormalFitObject(O3:O228)
        L6   =VoseOutput()+VoseAggregateMC(VoseSimulate(L4),L5)

Figure 4: Examples of different ways in which the aggregate model can be built with ModelRisk

It is also important to provide checks so that users can verify exactly what they are doing and locate any mistakes they have made. For example, selecting any one of the cells in Figure 3 or 4 containing the AggregateMC function, and then clicking the ModelRisk ‘View function’ icon, will display the window shown in Figure 5:

Figure 5: Visualizing the AggregateMC function in ModelRisk. The frequency and severity distributions are plotted above, and a histogram of (in this case 1000) Monte Carlo generated samples of the aggregate distribution is shown below.
Statistics for the sample (labeled ‘MC’) and the theoretical (labeled ‘exact’) moments (mean, variance, etc.), which can be derived from the properties of the frequency and severity distributions, are compared in the table as a numerical check that the function is working well.

Other aggregate functions available in ModelRisk

Aggregate modeling is such an important component of risk analysis that ModelRisk provides a whole range of aggregate functions:

- VoseAggregateMC: as used in the examples above, this uses pure Monte Carlo sampling to sum severity variables.
- VoseAggregateMultiMC: similar to VoseAggregateMC, except that the function aggregates multiple {frequency,severity} pairs and allows correlation between frequency distributions. One might use this, for example, to look at the bed-days needed across all wards of a hospital.
- VoseExpression: this function allows the user to specify a severity variable of essentially any required complexity. For example, one could describe cost-sharing above a certain level of severity, or make the severity distribution different for men and women, or young and old.
- VoseAggregateDeduct: this allows one to model the aggregation of insurance claims where the policy has a deductible and/or a limit on the amount paid out in a single claim.
- Insurance aggregate functions: specialized functions for the insurance industry that use advanced methods of aggregate claim calculation, such as the fast Fourier transform and the de Pril and Panjer methods, as well as multivariate aggregation methods. These are available in a separate Insurance and Finance module.

ModelRisk competitors

Some competing Monte Carlo Excel add-ins have attempted to copy the simplest aggregate functions in ModelRisk: namely VoseAggregateMC and VoseAggregateDeduct. Unfortunately, their functions do not work correctly and contravene Excel’s convention on how functions should interact.
The reason that they don’t work correctly is that the competing products have not incorporated Objects. Their tools only produce functions that sample from random variables, which means that they have no means of differentiating between sampling from a distribution and defining a distribution that is to be used in some algorithm (the Object concept). The danger of attempting to get around the need for Objects is easily illustrated with a few examples:

@RISK from Palisade Corporation

With @RISK version 5.0, Palisade Corporation introduced the RiskCompound function, which takes four parameters:

- Frequency distribution
- Severity distribution
- Deductible (optional)
- Limit (optional)

The last two parameters would be used for insurance modeling, and correspond to the analysis performed by ModelRisk’s VoseAggregateDeduct function. Thus, using this function to solve the hospital problem above, one would write:

=RiskCompound(RiskPoisson(1250),RiskLognorm(30,20))

In a normal model, the RiskLognorm function is used to sample random values from a Lognormal distribution. However, within the RiskCompound function it needs to be interpreted differently: as an Object, rather than as a function generating values. The Excel convention for user-defined functions is that the parameters within the function are evaluated first, but if that were done here we would get, for example (remembering the common error example at the beginning of this paper):

=RiskCompound(1300,35)

which would logically then give the value 45,500.
Thus, @RISK has to suppress the evaluation of the severity variable, contravening Excel’s rules, with disastrous and unpredictable results, as shown in the following examples (where RiskNormal could be replaced by any other @RISK distribution function):

- =RiskCompound(n,A1), where A1 contains "=-RiskNormal(µ,σ)": the minus sign is ignored.
- =RiskCompound(n,-RiskNormal(µ,σ)): "#VALUE!" is returned.
- =RiskCompound(n,k+RiskNormal(µ,σ)): "#VALUE!" is returned, no matter whether k is a cell reference, an @RISK distribution or a fixed value. However, place "=k+RiskNormal(µ,σ)" in a cell (e.g. A1) and =RiskCompound(n,A1) generates values.
- =RiskCompound(n,k), where k is a constant: now k is no longer ignored.
- =RiskCompound(n,k), where k is a RiskCompound function: the error "#NUM!" is generated.
- =RiskCompound(n,RiskNormal(µ,σ)^2): the error "#NUM!" is generated.
- If n > 1,000,000, the RiskCompound function returns "#VALUE!".
- If n is not an integer, the RiskCompound(n, …) function rounds down to the nearest integer value (so, for example, 1.999 is interpreted as 1), which systematically underestimates the aggregate value with no warning. ModelRisk’s VoseAggregateMC function, by contrast, returns the error message: “Error: N should be an integer value or discrete distribution object”.

None of these errors are possible with ModelRisk. Moreover, ModelRisk can handle the combinations you are interested in. For example, the equivalent of:

=RiskCompound(n,RiskNormal(µ,σ)^2)

(if it worked) in ModelRisk is:

=VoseAggregateMC(n,VoseExpression("#1^2",VoseNormalObject(µ,σ)))

where the "#1^2" string tells the function what it must do with this variable, and it does work.

You can also build more complex expressions. For example, there might be a 60% probability that someone entering a shop spends Lognormal(20,5) dollars, and a 40% probability they spend nothing.
If Poisson(500) people enter the shop, the total revenue is given by:

=VoseAggregateMC(A1,VoseExpression("IF(#1=1,#2,0)",A2,A3))

with:

    A1: =VosePoisson(500)
    A2: =VoseBernoulliObject(0.6)
    A3: =VoseLognormalObject(20,5)

Crystal Ball from Oracle

Crystal Ball does not offer any aggregate functions.

FinRisk from Cranes Software

FinRisk includes a RandSum function to aggregate random variables, but the function requires that one know and follow very specific rules. Unfortunately these rules are not apparent, and when they are not followed, errors are returned with no explanation, making it very hard to understand why a certain combination is not working.

The RandSum function takes two parameters:

- Severity distribution
- Frequency distribution

A first rule that one should know about, and which is not obvious from the function’s interface or from the help file, is that the Frequency distribution can be entered in the formula directly, but NOT the Severity distribution. This means that the hospital problem mentioned earlier cannot be solved with a single formula; it requires a link to an external cell:

=FinRandSum(A1,FinPoisson(1250))

where A1 refers to the Severity distribution.

Other examples that show the very specific rules one has to follow include:

- =FinRandSum(A1,n), where A1 contains "=-FinNormal(µ,σ)": "#VALUE!" is returned.
- =FinRandSum(A1,n), where A1 contains "=k + FinNormal(µ,σ)": "#VALUE!" is returned, no matter whether k is a cell reference, another distribution or a fixed value.
- Just as the Severity distribution cannot be entered directly in the RandSum function, neither can a fixed value. Even when linking to a cell holding a fixed value for the Severity distribution, the function returns "#VALUE!".
- =FinRandSum(k,n), where k is a FinRandSum function: "#VALUE!" is returned.

Just like with the @RISK errors, none of the errors above are possible with ModelRisk.
In addition, ModelRisk does a far better job than FinRisk of handling large values of the Frequency distribution. With n set to 1,000,000, FinRisk takes about 30 seconds to generate a single random number from the RandSum function when the Severity distribution is just a Normal(0,1). ModelRisk’s VoseAggregateMC(1000000,VoseNormalObject(0,1)) generates random numbers instantaneously.

As a final note, when n is not an integer, the RandSum(… ,n) function returns "#FinError", which is fine, but it doesn’t tell you why this is an error. ModelRisk’s VoseAggregateMC function returns the error message: “Error: N should be an integer value or discrete distribution object”.

RiskSolver from Frontline Systems, Inc.

RiskSolver does not offer any aggregate functions.