Estimating the Predictive Distribution for Loss Reserve Models
Glenn Meyers
ISO Innovative Analytics
CAS Annual Meeting
November 14, 2007
S&P Report, November 2003
Insurance Actuaries – A Crisis in Credibility
“Actuaries are signing off on reserves
that turn out to be wildly inaccurate.”
Background to Methodology - 1
• Zehnwirth/Mack
– Loss reserve estimates via regression
– y = a∙x + e
• GLM – E[Y] = f(a∙x)
– Allows choice of f and the distribution of Y
– Choices are restricted in order to speed up calculations
• Clark – Direct maximum likelihood
– Assumes Y has an Overdispersed Poisson
distribution
Background to Methodology - 2
• Heckman/Meyers
– Used Fourier transforms to calculate
aggregate loss distributions in terms of
frequency and severity distributions.
• Hayne
– Applied Heckman/Meyers to calculate
distributions of ultimate outcomes, given an
estimate of mean losses
High Level View of Paper
• Combine backgrounds 1 and 2 above
– Use aggregate loss distributions defined in
terms of Fourier transforms to (1) estimate
losses and (2) get distributions of ultimate
outcomes.
• Uses “other information” from ISO data and
from other insurers.
– Implemented with Bayes theorem
Objectives of Paper
• Develop a methodology for predicting the
distribution of outcomes for a loss reserve model.
• The methodology will draw on the combined
experience of other “similar” insurers.
– Use Bayes’ Theorem to identify “similar” insurers.
• Illustrate the methodology on Schedule P data
• Test the predictions of the methodology on
several insurers with data from later Schedule P
reports.
• Compare results with reported reserves.
A Quick Description of the Methodology
• Expected loss is predicted by chain
ladder/Cape Cod type formula
• The distribution of the actual loss around
the expected loss is given by a collective
risk (i.e. frequency/severity) model.
A Quick Description of the Methodology
• The first step in the methodology is to get the
maximum likelihood estimates of the model
parameters for several large insurers.
• For an insurer’s data
– Find the likelihood (probability of the data) given
the parameters of each model in the first step.
– Use Bayes’ Theorem to find the posterior
probability of each model in the first step given
the insurer’s data.
A Quick Description of the Methodology
• The predictive loss model is a mixture of
each of the models from the first step,
weighted by its posterior probability.
• From the predictive loss model, one can
calculate ranges or statistics of interest
such as the standard deviation or various
percentiles of the predicted outcomes.
The Data
• Commercial Auto Paid Losses from 1995
Schedule P (from AM Best)
– Long enough tail to be interesting, yet we
expect minimal development after 10 years.
• Selected 250 Insurance Groups
– Exposure in all 10 years
– Believable payment patterns
– Set negative incremental losses equal to zero.
16 insurer groups
account for one half of
the premium volume
Look at Incremental Development Factors
• Accident year 1986
• Proportion of loss paid in the “Lag”
development year
• Divided the 250 Insurers into four industry
segments, each accounting for about 1/4
of the total premium.
• Plot the payment paths
Incremental Development Factors - 1986
Incremental development
factors appear to be
relatively stable for the 40
insurers that represent
about 3/4 of the premium.
They are highly unstable
for the 210 insurers that
represent about 1/4 of the
premium.
The variability appears to
increase as size
decreases
Do Incremental Development
Factors Differ by Size of Insurer?
• Form loss triangles as the sum of the loss
triangles for all insurers in each of the four
industry segments defined above.
• Plot the payment paths
(Payment-path plots for Segments 1, 2, 3, and 4.)
There is no consistent
pattern in aggregate
loss payment factors
for the four industry
segments.
Expected Loss Model
E Paid LossAY ,Lag  Premium AY  ELR  Dev Lag
• Paid Loss is the incremental paid loss in the AY and Lag
• ELR is the Expected Loss Ratio
• ELR and DevLag are unknown parameters
– Can be estimated by maximum likelihood
– Can be assigned posterior probabilities for Bayesian analysis
• Similar to “Cape Cod” method in that the expected loss
ratio is estimated rather than determined externally.
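As an illustration of this formula, here is a minimal R sketch; the premium, ELR, and development factors below are made-up values, not estimates from the paper:

# Hypothetical illustration of E[Paid Loss_{AY,Lag}] = Premium_AY * ELR * Dev_Lag
premium <- rep(1000, 10)    # premium by accident year (assumed values)
elr <- 0.70                 # expected loss ratio (assumed)
dev <- c(0.20, 0.25, 0.18, 0.12, 0.08, 0.06, 0.04, 0.03, 0.02, 0.02)   # assumed factors, sum to 1
expected_paid <- outer(premium * elr, dev)   # 10 x 10 matrix: rows = AY, columns = Lag
dimnames(expected_paid) <- list(AY = 1:10, Lag = 1:10)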
Distribution of Actual Loss
around the Expected Loss
• Compound Negative Binomial Distribution (CNB)
– Conditional on Expected Loss – CNB(x | E[Paid Loss])
– Claim count is negative binomial
– Claim severity distribution determined externally
• The claim severity distributions were derived from data
reported to ISO. Policy Limit = $1,000,000
– Vary by settlement lag. Later lags are more severe.
• Claim count has a negative binomial distribution with
λ = E[Paid Loss]/E[Claim Severity] and contagion parameter c = 0.01
• See Meyers - 2007 “The Common Shock Model for
Correlated Insurance Losses” for background on this
model.
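To make the frequency/severity structure concrete, here is a small Monte Carlo sketch in R; the lognormal severity and all parameter values are stand-ins, not the ISO severity curves used in the paper:

set.seed(1)
c_param <- 0.01                      # contagion parameter from the slide
exp_paid <- 500                      # assumed E[Paid Loss] for one (AY, Lag) cell
mean_sev <- 10                       # assumed E[Claim Severity]
lambda <- exp_paid / mean_sev        # expected claim count
# Negative binomial claim count: mean lambda, variance lambda + c * lambda^2
counts <- rnbinom(10000, size = 1 / c_param, mu = lambda)
# Stand-in lognormal severity with mean mean_sev (the paper uses ISO severity curves by lag)
paid <- sapply(counts, function(n) sum(rlnorm(n, meanlog = log(mean_sev) - 0.5, sdlog = 1)))
c(mean = mean(paid), cv = sd(paid) / mean(paid))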
Claim Severity Distributions
(Plot of claim severity distributions by settlement lag: Lag 1 through Lag 4 and Lags 5-10.)

Calculating CNB(x_{AY,Lag} | E[Paid Loss_{AY,Lag}])

q_{AY,Lag} = Φ⁻¹[ (1 − c · λ_{AY,Lag} · (Φ(p_Lag) − 1))^(−1/c) ]

where
• Z_Lag = claim severity random variable
• p_Lag = discretized (interval length h) vector for Z_Lag
• Φ(p_Lag) = Fast Fourier Transform (FFT) of p_Lag
• λ_{AY,Lag} = E[Paid Loss_{AY,Lag}] / E[Z_Lag]
• Φ⁻¹ = inverse FFT

CNB(x_{AY,Lag} | E[Paid Loss_{AY,Lag}]) is then the (Round(x_{AY,Lag}/h) + 1)-th component of q_{AY,Lag}.
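A minimal R sketch of this FFT calculation, assuming the discretized severity vector for the lag is already available; the gamma severity below is a placeholder, not an ISO curve:

# Sketch: CNB density vector for one (AY, Lag) cell via FFT, following the formula above
cnb_density <- function(exp_paid, p_lag, h, c_param = 0.01) {
  mean_sev <- sum((0:(length(p_lag) - 1)) * h * p_lag)   # E[Z_Lag] from the discretized vector
  lambda <- exp_paid / mean_sev                           # expected claim count
  phi_p <- fft(p_lag)                                     # FFT of the severity vector
  phi_q <- (1 - c_param * lambda * (phi_p - 1))^(-1 / c_param)
  Re(fft(phi_q, inverse = TRUE)) / length(p_lag)          # inverse FFT gives the density vector q
}
h <- 1
p_lag <- dgamma((0:(2^12 - 1)) * h, shape = 2, scale = 5) * h   # placeholder severity
p_lag <- p_lag / sum(p_lag)
q <- cnb_density(exp_paid = 500, p_lag = p_lag, h = h)
q[round(300 / h) + 1]    # CNB(300 | E[Paid Loss] = 500)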
Likelihood Function for a Given Insurer's Losses {x_{AY,Lag}}

Likelihood({x_{AY,Lag}}) = ∏_{AY=1}^{10} ∏_{Lag=1}^{11−AY} CNB(x_{AY,Lag} | E[Paid Loss_{AY,Lag}])

where E[Paid Loss_{AY,Lag}] = Premium_AY · ELR · Dev_Lag
Maximum Likelihood Estimates
• Estimate ELR and DevLag simultaneously by
maximum likelihood
• Constraints on DevLag
– Dev1 ≤ Dev2
– Devi ≥ Devi+1 for i = 2,3,…,7
– Dev8 = Dev9 = Dev10
• Use R’s optim function to maximize likelihood
– Read appendix of paper before you try this
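One way to set this up with R's optim is to reparameterize so that the constraints on DevLag hold by construction; the reparameterization below is my own sketch, not necessarily the one described in the paper's appendix:

# Sketch: maximum likelihood with optim, building Dev_1..Dev_10 from unconstrained parameters
neg_loglik <- function(par, premium, paid, p_lags, h) {
  elr <- exp(par[1])                          # ELR > 0
  steps <- exp(par[2:9])                      # 8 positive step parameters
  dev <- c(steps[1], steps[1] + steps[2],     # Dev_1 <= Dev_2
           (steps[1] + steps[2]) * cumprod(1 / (1 + steps[3:8])))   # Dev_2 >= Dev_3 >= ... >= Dev_8
  dev <- c(dev, dev[8], dev[8])               # Dev_8 = Dev_9 = Dev_10
  dev <- dev / sum(dev)                       # normalize (assumes the factors sum to 1)
  -log_likelihood(elr, dev, premium, paid, p_lags, h)
}
fit <- optim(rep(0, 9), neg_loglik, premium = premium, paid = paid, p_lags = p_lags, h = h,
             method = "Nelder-Mead", control = list(maxit = 5000))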
Maximum Likelihood Estimates of
Incremental Development Factors
Loss development factors reflect the constraints on the MLEs described on the previous slide.
Contrast this with the
observed 1986 loss
development factors on
the next slide
Incremental Development Factors - 1986
(Repeat of Earlier Slide)
Loss payment factors
appear to be relatively
stable for the 40 insurers
that represent about 3/4
of the premium.
They are highly unstable
for the 210 insurers that
represent about 1/4 of the
premium.
The variability appears to
increase as size
decreases
Maximum Likelihood Estimates of
Expected Loss Ratios
Estimates of the ELRs are
more volatile for the
smaller insurers.
Testing the Compound Negative
Binomial (CNB) Assumption
• Calculate the percentiles of each observation
given E[Paid Loss].
– 55 observations for each insurer
• If CNB is right, the calculated percentiles should
be uniformly distributed.
• Test with PP Plot
– Sort the calculated percentiles in increasing order
– Form the vector (1:n)/(n+1), where n is the number of percentiles
– The plot of these two vectors against each other should lie on the diagonal line.
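A minimal R sketch of this PP-plot construction, where percentiles is assumed to be the vector of calculated percentiles (one per observation):

# Sketch: PP plot of calculated percentiles against the uniform distribution
pp_plot <- function(percentiles) {
  n <- length(percentiles)
  plot((1:n) / (n + 1), sort(percentiles), xlim = c(0, 1), ylim = c(0, 1),
       xlab = "Uniform Probability", ylab = "Predicted Probability")
  abline(0, 1)   # diagonal reference line
}
pp_plot(runif(2200))   # uniform "percentiles" should hug the diagonal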
Interpreting PP Plots
Take 1000 lognormally distributed random variables with μ = 0 and σ = 2 as “data.”
If a whole bunch of predicted percentiles are at the ends, the predicted tail is too light.
If a whole bunch of predicted percentiles are in the middle, the predicted tail is too heavy.
If in general the predicted percentiles are low, the predicted mean is too high.
Testing the CNB Assumptions
Insurer Ranks 1-40 (Large Insurers)
This sample has 55×40 or
2200 observations.
According to the
Kolmogorov-Smirnov test, the
D statistic for a sample of
2200 uniform random
numbers should be within
± 0.026 of the 45º line 95%
of the time.
Actual D statistic = 0.042.
As the plot shows, the
predicted percentiles are
slightly outside the 95%
band. We are close.
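For reference, a quick R sketch of the Kolmogorov-Smirnov D statistic for calculated percentiles against the uniform distribution; the 1.36/sqrt(n) rule of thumb is the usual large-sample 95% critical value:

# Sketch: KS D statistic, computed by hand and with base R's ks.test
ks_d <- function(percentiles) {
  n <- length(percentiles)
  s <- sort(percentiles)
  max(pmax(abs(s - (0:(n - 1)) / n), abs(s - (1:n) / n)))   # sup |F_n - F|
}
u <- runif(2200)
ks_d(u)
ks.test(u, "punif")$statistic
1.36 / sqrt(2200)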
Testing the CNB Assumptions
Insurer Ranks 1-40 (Large Insurers)
Breaking down the prior
plot by settlement lag
shows that there could
be some improvement
by settlement lag.
But in general, not bad!
(PP plots by settlement lag.)
Testing the CNB Assumptions
Insurer Ranks 41-250 (Smaller Insurers)
This is bad!
(PP plots by settlement lag.)
Using Bayes’ Theorem
• Let W = {ELR, Dev_Lag; Lag = 1, 2, …, 10} be a set of models for the data.
– A model may consist of different “models” or of different parameters for the same “model.”
• For each model in W, calculate the likelihood of the data being analyzed:
Pr{data | model}
Using Bayes’ Theorem
• Then using Bayes’ Theorem, calculate the
posterior probability of each parameter set
given the data.
Posterior model | data 

Pr data | model  Prior model
Selecting Prior Probabilities
• For {Dev_Lag}, select the payment paths from the
maximum likelihood estimates of the 40 largest
insurers, each with equal probability.
• For ELR, first look at the distribution of maximum
likelihood estimates of the ELR from the 40
largest insurers and visually “smooth out” the
distribution. See the slide on ELR prior below.
• Note that {Dev_Lag} and ELR are assumed to be independent.
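A sketch of how such a prior set could be assembled in R; dev_paths (a list holding the 40 MLE payment paths) is an assumed input, and the ELR support and probabilities below are placeholders for the visually smoothed prior:

# Sketch: the model set as all (ELR, payment path) combinations with independent priors
elr_values <- seq(0.40, 1.00, by = 0.05)                        # placeholder ELR support
elr_prior <- rep(1 / length(elr_values), length(elr_values))    # placeholder probabilities
path_prior <- rep(1 / length(dev_paths), length(dev_paths))     # equal weight on the 40 MLE paths
omega <- expand.grid(elr = seq_along(elr_values), path = seq_along(dev_paths))
omega$prior <- elr_prior[omega$elr] * path_prior[omega$path]    # ELR and path assumed independent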
Prior Distribution of
Loss Payment Paths
Prior loss payment paths
come from the loss
development paths of the
insurers ranked 1-40, with
equal probability
Posterior loss payment
path is a mixture of prior
loss development paths.
Prior Distribution of
Expected Loss Ratios
The prior distribution
of expected loss ratios
was chosen by visual
inspection.
Predicting Future Loss Payments
Using Bayes’ Theorem
• For each model, estimate the statistic of choice,
S, for future loss payments.
• Examples of S
– Expected value of future loss payments
– Second moment of future loss payments
– The probability density of a future loss payment of x
– The cumulative probability, or percentile, of a future loss payment of x
• These examples can apply to single (AY, Lag) cells, or to any combination of cells such as a given Lag or accident year.
Predicting Future Loss Payments Using Bayes' Theorem for Sums over Sets of {AY,Lag}
• If we assume losses are independent by AY and Lag:

q = Φ⁻¹[ ∏_AY ∏_Lag (1 − c · λ_{AY,Lag} · (Φ(p_Lag) − 1))^(−1/c) ]

• Actually use the negative multinomial distribution
– Assumes correlation of frequency between lags in the same accident year:

q = Φ⁻¹[ ∏_AY E( exp( α · ∑_Lag λ_{AY,Lag} · (Φ(p_Lag) − 1) ) ) ]

where the expectation is over the common frequency shock α shared by the lags within an accident year.
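A sketch of the independent-cells version of this calculation in R, multiplying the per-cell transforms before inverting; it reuses the hypothetical objects from the earlier sketches and ignores the negative multinomial refinement:

# Sketch: distribution of the sum over future cells, assuming independence across cells
# cells: data frame with one row per future cell, e.g. data.frame(lag = c(2, 3), exp_paid = c(400, 250))
# All vectors in p_lags are assumed to share the same length and discretization interval h
sum_density <- function(cells, p_lags, h, c_param = 0.01) {
  phi_total <- 1
  for (i in seq_len(nrow(cells))) {
    p_lag <- p_lags[[cells$lag[i]]]
    mean_sev <- sum((0:(length(p_lag) - 1)) * h * p_lag)
    lambda <- cells$exp_paid[i] / mean_sev
    phi_total <- phi_total * (1 - c_param * lambda * (fft(p_lag) - 1))^(-1 / c_param)
  }
  Re(fft(phi_total, inverse = TRUE)) / length(p_lags[[1]])
}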



Predicting Future Loss Payments
Using Bayes’ Theorem
• Calculate the Statistic S for each model.
• Then the posterior estimate of S is the
model estimate of S weighted by the
posterior probability of each model
Posterior Estimate of S = ∑_{i=1}^{n} E[S | model_i] · Posterior{model_i | data}
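In R this mixture is a one-liner, assuming s_by_model holds E[S | model_i] for each model and post holds the posterior probabilities from the earlier sketch:

# Posterior estimate of S: posterior-probability-weighted average over the models
s_posterior <- sum(s_by_model * post)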
Sample Calculations
for Selected Insurers
• Coefficient of Variation of predictive
distribution of unpaid losses.
• Plot the probability density of the predictive
distribution of unpaid losses.
Predictive Distribution
Insurer Rank 7
Predictive Mean = $401,951 K
CV of Total Reserve = 6.9%
Predictive Distribution
Insurer Rank 97
Predictive Mean = $40,277 K
CV of Total Reserve = 12.6%
CV of Unpaid Losses
Validating the Model on Fresh Data
• Examined data from 2001 Annual Statements
– Both 1995 and 2001 statements contained losses
paid for accident years 1992-1995.
– Often statements did not agree in overlapping years
because of changes in corporate structure. We got
agreement in earned premium for 109 of the 250
insurers.
• Calculated the predicted percentiles for the
amount paid 1997-2001
• Evaluate predictions with pp plots.
PP Plots on Validation Data
KS 95% critical values = ±13.03%
Feedback
• If you have paid data, you must also have the
posted reserves. How do your predictions
match up with reported reserves?
– In other words, is S&P right?
• Your results are conditional on the data
reported in Schedule P. Shouldn’t an actuary
with access to detailed company data (e.g.
case reserves) be able to get more accurate
estimates?
Response – Expand the Original
Scope of the Paper
• Could persuade more people to look at the
technical details.
• Warning – Do not over-generalize the results beyond commercial auto in the 1995-2001 timeframe.
Predictive and Reported Reserves
• For the validation sample, the predictive mean (in
aggregate) is closer to the 2001 retrospective reserve.
• Possible conservatism in reserves. OK?
• “%” means % reported over the predictive mean.
• Retrospective = reported less paid prior to end of 1995.
Predictive Percentiles of Reported Reserves
• Conservatism is not evenly
spread out.
• Conservatism appears to be
independent of insurer size
• Except for the evidence of
conservatism, the reserves
are spread out in a way
similar to losses.
• Were the reserves equal to
ultimate losses?
Reported Reserves More Accurate?
• Divide the validation sample into two groups and look at subsequent development.
1. Reported Reserve < Predictive Mean
2. Reported Reserve > Predictive Mean
• Expected result if the Reported Reserve is accurate:
– Reported Reserve = Retrospective Reserve for each group
• Expected result if the Predictive Mean is accurate:
– Predictive Mean ≈ Retrospective Reserve for each group
– There are still some outstanding losses in the retrospective reserve.
Subsequent Reserve Changes
(Plots of subsequent reserve changes for Group 1 and Group 2.)
• Group 1
– 50-50 up/down
– Ups are bigger
• Group 2
– More downs than ups
• Results are independent of insurer size
Subsequent Reserve Changes

Reported Reserve @ 1995      < Predictive Mean (000)   > Predictive Mean (000)
Number of Insurers           66                        43
Total Predictive Mean        926,134                   872,660
1995 Reserve @ 1995          803,175                   1,173,124
1995 Reserve @ 2001          856,393                   985,711
• The CNB formula identified two groups where:
– Group 1 tends to under-reserve
– Group 2 tends to over-reserve
• Incomplete agreement at the group level
– Some in each group get it right
• Discussion??
Main Points of Paper
• How do we evaluate a stochastic loss reserve formula?
– Test predictions of future loss payments
– Test on several insurers
– Main Focus
• Are there any formulas that can pass these tests?
– The Bayesian CNB does pretty well on Commercial Auto Schedule P data.
– Uses information from many insurers
– Are there other formulas? This paper sets a bar for others
to raise.
Subsequent Developments
• Paper completed in April 2006
• Additional critique
• Describe recent developments
• Describe ongoing research
PP Plots on Validation Data
Clive Keatinge’s Observation
• Does the leveling of
plots at the end
indicate that the
predicted tails are too
light?
• The plot is still within
the KS bounds and
thus is not statistically
significant.
• The leveling looks
rather systematic.
Alternative to the KS: the Anderson-Darling Test

A² = −n + n · ∑_{j=0}^{k} (1 − F_n(y_j))² · [ ln(1 − F*(y_j)) − ln(1 − F*(y_{j+1})) ]
          + n · ∑_{j=0}^{k} F_n(y_j)² · [ ln F*(y_{j+1}) − ln F*(y_j) ]
• AD is more sensitive to tails.
• Critical values are 1.933, 2.492, and 3.857 for the 10%, 5%, and 1% levels, respectively.
• Value for validation sample is 2.966
• Not outrageously bad, but Clive has a point.
• Explanation – Did not reflect all sources of uncertainty??
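A rough R sketch of an Anderson-Darling check on the calculated percentiles; it uses the standard individual-data form of A² for a fully specified uniform(0,1) distribution rather than a direct transcription of the grouped formula above:

# Sketch: Anderson-Darling statistic for testing that calculated percentiles are uniform(0,1)
ad_stat <- function(percentiles) {
  u <- sort(percentiles)
  n <- length(u)
  i <- 1:n
  -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u))))
}
ad_stat(runif(2200))   # for uniform data this is usually well below the 5% critical value of 2.492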
Is Bayesian Methodology Necessary?
• “Thinking Outside the Triangle”
– Paper in June 2007 ASTIN Colloquium
• Works with simulated data on a similar
model
• Compares Bayesian with maximum
likelihood predictive distributions
Maximum Likelihood Fitting Methodology
PP Plots for Combined Fits
(Plot of predicted probability vs. uniform probability.)
• The PP plot reveals the S-shape that characterizes overfitting.
• The tails are too light.
Bayesian Fitting Methodology
PP Plots for Combined Fits
(Plot of predicted probability vs. uniform probability.)
• Nailed the tails.
IN THIS EXAMPLE
• Maximum Likelihood method understates
the true variability
• I call this “overfitting” i.e. the model fits the
data rather than the population
• Nine parameters fit to 55 points
• SPECULATION – Overfitting will occur in all maximum likelihood methods and in moment-based methods
– i.e., GLM and Mack
Expository Paper in Preparation
• Focus on the Bayesian method described
in this paper
• Uses Gibbs sampler to simulate posterior
distribution of the results
• Complete algorithm coded in R
• Hope to increase population of actuaries
who:
– Understand what the method means
– Can actually use the method