COTOR Challenge Round 3

Estimate $500k x $500k layer
Solution by Steve Fiete
Given data and notation:
70 claims per year for each of seven years. The process that generates claims is the
same each year, except that the expected value increases by a constant inflation factor
each year. Enumerate the years starting with year = 0 for the first year.
If x is a random claim in year 7, then the challenge is to estimate the expected value of
min(500,000,max(0,x-500,000)) using a 95% confidence interval.
Also calculate a 95% confidence interval for the sample mean of 70 claims in year 7.
This will be denoted as xbar.
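For concreteness, the layer payoff in the challenge can be sketched in a few lines (the function name is illustrative; attachment and limit are both 500,000):

```python
def layer_loss(x, attachment=500_000, limit=500_000):
    """Loss to the 500k xs 500k layer: min(limit, max(0, x - attachment))."""
    return min(float(limit), max(0.0, x - attachment))
```

A 400,000 claim contributes nothing, a 750,000 claim contributes 250,000, and any claim above 1,000,000 is capped at 500,000.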
Analysis
The basic approach is to fit loss distributions to first dollar unlimited claims for each
year, and estimate the annual inflation rate. We try a variety of distributions, then use the
one with the best fit to make inferences about losses in the 500k x 500k layer. While this
is a common method, two aspects of this analysis are not standard industry practice:
1. Estimating the inflation rate simultaneously with the loss distribution parameters
for all 7 years using all 490 claims.
2. Evaluating goodness of fit of the 7 distributions (1 for each year) by rolling the
expected and actual severity distributions into a single p-p plot.
Typical methods of estimating severity trends involve estimating the average severity in
each year (or quarter, or month), smoothing those averages, then fitting a trend. Since we
are assuming a uniform trend over time we can simply incorporate the trend estimation
into the estimation procedure for the other loss distribution parameters. This way all of
the claims in all years influence the estimate of the scale parameter in the first year, the
trend parameter, and any other loss distribution parameters. The advantage is that the
maximum amount of information is used to estimate every parameter.
To evaluate confidence intervals for both the true cost of the layer and the actual mean
cost of the layer, we use simulation. Most of the calculations are done in SAS; the parts done
in Excel are included with this document.
The claims trend rate is denoted as r. For simplicity, since each distribution considered
has 2 parameters, we will use theta as the scale parameter and alpha as the second
parameter. Theta varies by year so that the expected value will increase by the trend rate.
For year = k, theta is denoted as theta(k). The following distributions were evaluated:
Gamma:
Pdf: f(x)=(x/theta)^alpha*exp(-x/theta)/(x*Gamma(alpha))
theta(k+1)=(1+r)*theta(k)
Lognormal: x~LN(theta,alpha)
E(log(x))=theta
theta(k+1)=theta(k)+log(1+r)
Weibull:
Pdf: f(x)=alpha*(x/theta)^alpha * exp(-(x/theta)^alpha)/x
theta(k+1)=(1+r)*theta(k)
Inverse Weibull:
Pdf: f(x)=alpha*(theta/x)^alpha * exp(-(theta/x)^alpha)/x
theta(k+1)=(1+r)*theta(k)
Pareto:
Pdf: f(x)=alpha*theta^alpha * (x+theta)^(-alpha-1)
theta(k+1)=(1+r)*theta(k)
The pdf parameterizations were taken from Loss Models by Klugman, Panjer, and
Willmot.
For each distribution theta(0), alpha, and r are estimated using maximum likelihood
estimation. The next step is to evaluate goodness of fit.
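The joint estimation step can be sketched for the Pareto case (the other distributions are analogous). The optimizer, the log-parameterization, and the function names here are my choices for illustration, not from the original SAS code:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, claims, years):
    """Negative log-likelihood over all 490 claims at once, with
    theta(k) = theta0 * (1 + r)^k shared across the 7 years."""
    log_theta0, log_alpha, r = params
    if r <= -0.99:                      # keep (1 + r) positive
        return np.inf
    theta0, alpha = np.exp(log_theta0), np.exp(log_alpha)
    theta_k = theta0 * (1.0 + r) ** years   # scale for each claim's year
    # Pareto log-density: log(alpha) + alpha*log(theta) - (alpha+1)*log(x + theta)
    ll = np.log(alpha) + alpha * np.log(theta_k) - (alpha + 1.0) * np.log(claims + theta_k)
    return -ll.sum()

def fit_pareto_with_trend(claims, years):
    """MLE for (theta0, alpha, r); logs keep theta and alpha positive."""
    x0 = [np.log(np.mean(claims)), 0.0, 0.05]
    res = minimize(neg_loglik, x0, args=(claims, years),
                   method="Nelder-Mead", options={"maxiter": 5000, "maxfev": 5000})
    log_theta0, log_alpha, r = res.x
    return np.exp(log_theta0), np.exp(log_alpha), r
```

Because a single likelihood covers all seven years, every claim informs theta(0), alpha, and r simultaneously, which is the point made above.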
The Excel workbook “pp plots.xls” shows p-p plots for each model comparing actual
versus fitted loss distributions. The workbook also shows the parameter estimates for
each distribution, the first dollar expected value of severity in year 0, and the expected
value of loss in the 500k x 500k layer in year 7. For each model there are 7 different
distributions – 1 for each year. These can be combined into a single p-p plot. The Excel
workbook “pp build.xls” shows how to combine data from 7 different gamma
distributions into a single size-of-loss table. This table can then be made into a p-p plot.
They are not shown, but the same approach was used to build the p-p plots for the other
distributions.
P-p plots were used because they allow us to examine goodness of fit across the entire
range of possible outcomes; a single summary number, such as the log-likelihood, does
not.
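One way to roll the 7 fitted distributions into a single plot is to push each claim through its own year's fitted CDF and compare the sorted values to uniform plotting positions. This is a sketch of that idea (the workbook builds a combined size-of-loss table instead, a closely related construction); the function names are illustrative:

```python
import numpy as np

def pareto_cdf(x, theta, alpha):
    """Pareto CDF: F(x) = 1 - (theta / (x + theta))^alpha."""
    return 1.0 - (theta / (x + theta)) ** alpha

def pooled_pp_points(claims, years, theta0, alpha, r):
    """Transform each claim by its own year's fitted CDF, sort, and pair
    with empirical plotting positions i/(n+1). If the model fits, the
    points lie near the 45-degree line."""
    theta_k = theta0 * (1.0 + r) ** years
    fitted = np.sort(pareto_cdf(claims, theta_k, alpha))
    n = fitted.size
    empirical = np.arange(1, n + 1) / (n + 1.0)
    return empirical, fitted
```

Plotting `fitted` against `empirical` gives the single pooled p-p plot for all 7 years at once.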
The Pareto distribution has the best fit. We will assume claims within each year are
generated by a Pareto distribution. The next step is to estimate a distribution of possible
parameter estimates.
Using the estimates from the original data we simulate 70 claims per year for each of the
7 years. With the simulated data we estimate the pareto and inflation parameters again.
Using these parameter estimates we calculate the expected value of the 500k x 500k
layer. Finally, we simulate 70 claims for year 7 and calculate the sample mean of the
500k x 500k layer. This process is iterated 100 times. After it is done we have 100
expected layer values and 100 sample layer means. The 5th and 95th percentiles of these
samples provide our confidence intervals.
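The tail end of one simulation trial can be sketched as follows. The layer expectation uses the Pareto limited expected value from Loss Models, E[X ∧ d] = theta/(alpha-1) * (1 - (theta/(d+theta))^(alpha-1)); the refit of (theta0, alpha, r) on the simulated years 0-6 claims is as described above and is omitted here:

```python
import numpy as np

def simulate_pareto(n, theta, alpha, rng):
    """Inverse-transform sampling: F^{-1}(u) = theta * ((1-u)^(-1/alpha) - 1)."""
    u = rng.random(n)
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def expected_layer(theta, alpha, att=500_000, lim=500_000):
    """E[min(lim, max(0, X - att))] = E[X ∧ (att+lim)] - E[X ∧ att]
    for a Pareto with alpha != 1, via the limited expected value."""
    def lev(d):
        return theta / (alpha - 1.0) * (1.0 - (theta / (d + theta)) ** (alpha - 1.0))
    return lev(att + lim) - lev(att)

def sample_layer_mean(theta, alpha, n=70, rng=None, att=500_000, lim=500_000):
    """Sample mean of the layer over n simulated year-7 claims."""
    x = simulate_pareto(n, theta, alpha, rng or np.random.default_rng())
    return np.minimum(lim, np.maximum(0.0, x - att)).mean()
```

With only 70 claims per trial and a roughly 3% chance that any one claim pierces the attachment, the sample layer mean is highly variable, which is why its interval is so much wider than the interval for the true expected value.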
The Excel workbook “simuations.xls” shows the results of 100 trials of simulating 70
claims per year in years 0 through 6 and estimating r, alpha, and theta(0). Using these
parameters the expected value of loss in the 500k x 500k layer is calculated. The 100
trials are sorted by this expected value. The 5th and 95th values determine the confidence
interval for the true mean in the layer.
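Extracting the bounds from the sorted trials is straightforward; np.percentile here is a stand-in for the workbook's sorted-column lookup:

```python
import numpy as np

def interval_from_trials(values, lo=5, hi=95):
    """Bounds from the 5th and 95th percentiles of the simulation trials
    (for 100 sorted trials this is essentially the 5th and 95th values)."""
    return tuple(np.percentile(np.asarray(values, dtype=float), [lo, hi]))
```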
With each simulation trial we also simulate 70 claims for year 7. Using this simulated
data we can calculate an actual sample mean for the layer. The entire simulation and
calculation is shown in “year 7 simulation.xls”. Note that alpha and theta are different for
each of the 100 trials because in each trial they were estimated from claims simulated for
years 0-6.
Results
The 95% confidence interval for the expected value in the layer has a lower bound of
6,984, and an upper bound of 17,669. The point estimate from the original data set is
12,738. The 95% confidence interval for the actual mean of 70 claims has a lower bound
of 0, and an upper bound of 29,384.
The trend rate estimated in the pareto model is 18.5%. If we apply this annual trend rate
to each of the 490 claims to bring them to the year 7 level, then calculate the mean of
each set of 70 claims we get the following:
7,924
7,143
10,385
12,745
16,985
906
13,013
All 7 sample means lie within the 95% confidence interval for xbar. This last observation
is really just a sniff test to make sure we did not simulate our way into a clearly
unreasonable conclusion.
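The trending step used in the sniff test can be sketched as follows (18.5% is the annual trend from the fitted Pareto model; the function name is illustrative):

```python
def trend_to_year7(x, year, r=0.185):
    """Restate a claim from accident year `year` (0-6) at the year-7
    cost level by applying the fitted annual trend."""
    return x * (1.0 + r) ** (7 - year)
```

A year-0 claim is multiplied by 1.185^7, roughly 3.28, while a year-6 claim is multiplied by 1.185.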