Loss reserving with GLMs: a case study
Greg Taylor
Taylor Fry Consulting Actuaries
Melbourne University
University of New South Wales
Gráinne McGuire
Taylor Fry Consulting Actuaries
Casualty Actuarial Society, Spring Meeting
Colorado Springs CO, May 16-19 2004
Purpose
• Examine loss reserving in relation to a particular data set
  • How credible are chain ladder reserves?
  • Are there any identifiable inconsistencies between the data and the assumptions underlying the chain ladder model?
  • If so, do they really matter? Or are we just making an academic mountain out of a molehill?
  • Can the chain ladder model be conveniently adjusted to eliminate any such inconsistencies?
  • If not, what shall we do?
• Lessons learnt from this specific data set intended to be of wider applicability
The data set
• Auto Bodily Injury insurance
  • Compulsory
  • No coverage of property damage
• Claims data relates to Scheme of insurance for one state of Australia
  • Pooled data for the entire state
• Scheme of insurance is state regulated but privately underwritten
• Access to common law
  • But some restriction on payment of plaintiff costs in the case of smaller claims
• Premium rates partially regulated
The data set (continued)
• Centralised data base for Scheme
• Current at 30 September 2003
• About 60,000 claims
• Individual claim records
  • Claim header file
    • Date of injury, date of notification, injury type, injury severity, etc.
  • Transaction file
    • Paid losses (corrected for wage inflation)
  • Case estimate file
Starting point for analysis
• Chain ladder
• This paper is not a vendetta against the chain ladder
• However, it is taken as the point of departure because of its
  • Simplicity
  • Wide usage
Chain ladder
• First, basic test of chain ladder validity
• Fundamental premise of chain ladder is constancy of expected age-to-age factors from one accident period to another
• This data set fails the test comprehensively

[Chart: Payments in respect of settled claims – age-to-age factors (log scale, 1.00 to 100.00) by development quarter (ratios 1:0 to 10:9), plotted for averaging periods of the last 1, 2, 3 and 4 years and all years]
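The premise can be checked numerically: if expected age-to-age factors really are constant, factors averaged over all years and over the most recent diagonals should broadly agree. A minimal sketch with numpy, on a small hypothetical cumulative triangle (the names and figures are illustrative, not the Scheme's data):

```python
import numpy as np

# Hypothetical 5x5 cumulative paid-loss triangle (rows = accident periods,
# columns = development periods); NaN marks unobserved cells.
tri = np.array([
    [100, 180, 220, 240, 250],
    [110, 200, 245, 268, np.nan],
    [120, 230, 280, np.nan, np.nan],
    [130, 260, np.nan, np.nan, np.nan],
    [140, np.nan, np.nan, np.nan, np.nan],
], dtype=float)

def age_to_age(tri, j, last_n=None):
    """Volume-weighted age-to-age factor from development period j to j+1.

    last_n restricts the average to the most recent n accident periods
    with both cells observed, i.e. the last n diagonals."""
    rows = [i for i in range(tri.shape[0])
            if not np.isnan(tri[i, j]) and not np.isnan(tri[i, j + 1])]
    if last_n is not None:
        rows = rows[-last_n:]
    return tri[rows, j + 1].sum() / tri[rows, j].sum()

# A systematic gap between "all years" and "last n years" signals a trend.
f_all = age_to_age(tri, 0)
f_last2 = age_to_age(tri, 0, last_n=2)
```

Plotting such factors for several values of `last_n`, as in the chart above, makes any drift across accident periods visible at a glance.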
Chain ladder – does the instability matter?

Averaging period                                           | Loss reserve at 30 Sept 2003
                                                           | (excl. Sept 2003 accident qr), $B
All experience quarters                                    | 1.61
Last 8 experience quarters                                 | 1.68
All experience quarters except Sept 2003 (last diagonal)   | 1.78
Last 8 experience quarters except Sept 2003 (last diagonal)| 1.92

• Range of variation is 19%
• Omitting just the last quarter’s experience increases loss reserve by 10-15%
Chain ladder – does the instability matter?
• Actually, the situation is much worse than this
• Effect of September 2003 quarter (last diagonal) on loss reserve
  • Due to low age-to-age factors in the quarter
  • In turn due to low paid losses in the quarter
• Suggests
  • Not only omitting September 2003 quarter age-to-age factors from averaging
  • But also recognising that loss reserve is increased by low paid loss experience
    • Estimate loss reserve at 30 June 2003
    • Deduct paid losses during September 2003 quarter
Chain ladder – does the instability matter?

Loss reserve at 30 Sept 2003 (excl. Sept 2003 accident qr):

Averaging period                                           | Uncorrected $B | Corrected $B
All experience quarters except Sept 2003 (last diagonal)   | 1.78           | 1.94
Last 8 experience quarters except Sept 2003 (last diagonal)| 1.92           | 2.35

• Now 46% difference between highest estimate and lowest in previous table
• More than an academic molehill
Review basic facts and questions
• We have a model formulated on the assumption of certain stable parameters (expected age-to-age factors)
• This assumption seems clearly violated
  • Data contain clear trends over time
• Various attempts at correction for this
  • Including different averaging periods
• Different corrections give widely differing loss reserves
• How might one choose the “appropriate” correction?
  • Omit just last quarter? Last two? …
  • Including averaging period
    • Average last 4 quarters? Last 6? Last 8? …
Some responses to the questions
• DO NOT choose an averaging period
  • It is a statistical fundamental that one does not average in the presence of trends
  • Rather model the trend
  • This requires an understanding of the mechanics of the process generating the trend
• DO NOT try to use this understanding to assist in the choice of an averaging period
  • Rather use it to model the finer structure of the data
  • Otherwise the choice of factors is little more than numerology
• These comments apply not only to the chain ladder
  • But also to any “model” that ignores the fine structure of the data in favour of averaging of some broad descriptive statistics
Effect on loss data of changes in underlying process
• Consider a 21x21 paid loss triangle from a fairly typical Auto Bodily Injury portfolio
  • Years numbered 0, 1, 2, …
  • Experience of all accident years identical
  • Stable age-to-age factors
• Now assume that rates of claim closure (by numbers) increase by 50% in experience years (diagonals) 11-15
• Examine the ratios of “new:old” paid losses
  • No change = 100%
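This experiment can be reproduced with a toy closure model: each claim pays its (unchanged) size when it closes, so speeding up closures shifts payments between cells. The hazard value and pattern below are hypothetical, chosen only to show the mechanics:

```python
import numpy as np

n = 21
base_hazard = np.full(n, 0.25)   # hypothetical annual closure probability

def paid_triangle(hazard_scale):
    """Paid losses by (accident year i, development year j), with unit
    average claim size; hazard_scale(i, j) multiplies the closure hazard."""
    paid = np.zeros((n, n))
    for i in range(n):
        open_frac = 1.0                       # fraction of claims still open
        for j in range(n):
            h = min(1.0, base_hazard[j] * hazard_scale(i, j))
            closed = open_frac * h            # fraction closing in cell (i, j)
            paid[i, j] = closed               # payment at closure, unit size
            open_frac -= closed
    return paid

old = paid_triangle(lambda i, j: 1.0)
# closure rates up 50% on experience-year diagonals 11-15
new = paid_triangle(lambda i, j: 1.5 if 11 <= i + j <= 15 else 1.0)
ratio = new / old   # "no change" cells equal 1.0 (i.e. 100%)
```

Cells on the affected diagonals show elevated payments, and cells after them show depressed payments (the claims have already closed), which is precisely the kind of trend pattern the slides describe.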
Effect on loss data of changes in underlying process (cont’d)
• Now add superimposed inflation of 5% p.a. to experience years 14-20
Effect on loss data of changes in underlying process (cont’d)
• Now add a legislative change that reduces claim costs in accident years 13-20
  • 50% reduction for the earliest claims settled
  • 0% for the last 30% of claims settled
Effect on loss data of changes in underlying process (cont’d)
• The ratio of modified experience to the norm (stable age-to-age factors) is now complex
• Age-to-age factors now change in a complex manner
  • Trends across diagonals
  • Further trends across rows
• Contention is that these trends will be identifiable only by means of some form of structured and rigorous multivariate data analysis
How might the loss data be modelled?
Let
  i = accident quarter
  j = development quarter (= 0, 1, 2, …)
  Fij = incremental count of claims closed
  CFij = incremental paid losses in respect of these closures
  Sij = CFij / Fij = average size of these closures
How might the loss data be modelled? (cont’d)
• Modelling the loss data might consist of:
  • Fitting some structured model to the average claim sizes Sij
  • Testing the validity of that model
• The use of average claim sizes will make automatic correction for any changes in the rates of claim closure
Modelling the loss data
• Very simple model
  Sij ~ logN(βj, σ)
  Log normal claim sizes depending on development quarter
• Fit model to data using EMBLEM software
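EMBLEM is a commercial package, but this first one-way model needs nothing beyond numpy: under log Sij ~ N(βj, σ²), the maximum-likelihood βj is simply the mean of log Sij over the observed cells of development quarter j. A sketch on hypothetical triangles (all figures illustrative):

```python
import numpy as np

# Hypothetical incremental triangles: closed-claim counts F and paid
# losses CF on those closures (rows = accident qr, cols = development qr).
F = np.array([[40., 25., 10.],
              [45., 30., np.nan],
              [50., np.nan, np.nan]])
CF = np.array([[220., 260., 120.],
               [250., 320., np.nan],
               [300., np.nan, np.nan]])
S = CF / F                        # average claim size per cell, Sij

# One level per development quarter: log S_ij ~ N(beta_j, sigma^2).
logS = np.log(S)
beta = np.nanmean(logS, axis=0)   # ML estimate of beta_j: column mean
resid = logS - beta               # residuals, for model checking
sigma = np.sqrt(np.nanmean(resid ** 2))   # crude plug-in scale estimate
```

Plotting `beta` against development quarter gives the linear-predictor curve shown on the next slide.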
Dependency of average claim size on development quarter

[Chart: linear predictor (log average claim size, roughly 8 to 14) plotted against development quarter]
Add superimposed inflation
• Define
  k = i + j = calendar quarter of closure
• Extend model
  Sij ~ logN(βdj + βfk, σ)
  Log normal claim sizes depending on development quarter and closure quarter (superimposed inflation)
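The extended model is a two-way additive regression on development-quarter and closure-quarter dummies. A minimal least-squares sketch on hypothetical data; note the two dummy blocks are collinear (each row sums to 1 in both), so only contrasts, such as the quarter-to-quarter inflation trend in βfk, are identifiable:

```python
import numpy as np

# Hypothetical average claim sizes S_ij (NaN = unobserved).
S = np.array([[5.5, 10.0, 12.0],
              [5.6, 11.0, np.nan],
              [6.0, np.nan, np.nan]])
n = S.shape[0]

# Collect observed cells as (development qr j, closure qr k = i + j, log S).
rows_j, rows_k, y = [], [], []
for i in range(n):
    for j in range(n):
        if not np.isnan(S[i, j]):
            rows_j.append(j)
            rows_k.append(i + j)
            y.append(np.log(S[i, j]))

# Design matrix: one dummy per development quarter, one per closure quarter.
n_j, n_k = n, 2 * n - 1
X = np.zeros((len(y), n_j + n_k))
for r, (j, k) in enumerate(zip(rows_j, rows_k)):
    X[r, j] = 1.0
    X[r, n_j + k] = 1.0

# lstsq returns the minimum-norm solution for the rank-deficient system.
coef, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
beta_d, beta_f = coef[:n_j], coef[n_j:]
fitted = X @ coef
```

Successive differences of `beta_f` estimate superimposed inflation per closure quarter, which is what the next chart displays.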
Dependency of average claim size on closure quarter

[Chart: linear predictor (roughly 8.0 to 8.9) plotted against finalisation quarter, Mar-97 to Sep-03]
• Some upward trend with closure quarter
• Positive superimposed inflation
Modelling individual claim data
• We could continue this mode of analysis
• But why model triangulated data?
• We have individual claim data
• More natural to model individual claim sizes
Notation for analysis of individual claim sizes
• Time variables i, j, k as before
• Yr = size of r-th closed claim
• ir, jr, kr are values of i, j, k for r-th closed claim
• Also define
  tr = operational time for r-th claim
     = proportion of claims from accident quarter ir closed before r-th claim
• Model
  log Yr = fn(ir, jr, kr, tr) + stochastic error
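Operational time is straightforward to compute from closed-claim records: within each accident quarter, rank claims by closure order and take the proportion closed before each one. A small sketch on hypothetical records (tuples and sizes invented for illustration):

```python
# Hypothetical closed-claim records: (accident quarter i_r,
# closure quarter k_r, claim size Y_r).
claims = [
    (0, 0, 1200.0), (0, 1, 3400.0), (0, 2, 9000.0), (0, 3, 15000.0),
    (1, 1, 1500.0), (1, 2, 4100.0), (1, 3, 11000.0),
]

def operational_times(claims):
    """t_r = proportion of the accident quarter's claims closed before
    claim r, ordering closures by closure quarter (ties by input order)."""
    out = [0.0] * len(claims)
    by_aq = {}
    for r, (i, k, _) in enumerate(claims):
        by_aq.setdefault(i, []).append((k, r))
    for i, lst in by_aq.items():
        lst.sort()                       # closure order within accident qr
        n = len(lst)
        for pos, (_, r) in enumerate(lst):
            out[r] = pos / n             # proportion closed before claim r
    return out

t = operational_times(claims)
```

Because t is a rank within the accident quarter, it is unaffected by overall speed-ups or slow-downs in claim closure, which is what makes it a better "development clock" than development quarter j.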
Dependency on operational time
• Model
  log Yr = fn(ir, jr, kr, tr) + stochastic error
• Specifically
  log Yr ~ N(fn(tr), σ)
• Divide range of tr (0-100%) into 2% bands
Dependency on operational time

[Chart: linear predictor (roughly 7.5 to 11.5) plotted against operational time]
• Dependency close to linear over much of the range of operational time
Dependency on calendar quarter of closure (superimposed inflation)

[Chart: linear predictor (roughly 7.90 to 8.40) plotted against finalisation quarter, Jun-97 to Sep-03]
• Some upward trend with closure quarter
• Positive superimposed inflation
Log normal assumption?
• Examine residuals of log normal model
• Considerable left skewness

[Charts: histogram of Pearson residuals, and the largest 1,000 Pearson residuals plotted against fitted value]
Alternative error distribution
• Choose shorter tailed distribution from the family underlying GLMs
  • Exponential dispersion family
• We choose EDF(2.3)
  V[Yr] = φ {E[Yr]}^2.3
  • Longer tailed than gamma
  • Shorter than log normal

[Charts: histogram of studentized standardized deviance residuals, and the largest 100 such residuals plotted against fitted value]
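A GLM with log link and power variance function V(μ) = φ μ^2.3 can be fitted by the standard IRLS recursion, whatever the power. A self-contained numpy sketch on simulated claim sizes (the covariate, coefficients and gamma noise are hypothetical, not the paper's fit):

```python
import numpy as np

# Simulated positive responses with a log-linear mean in one covariate
# (think of it as operational time); multiplicative gamma noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(0, 1, 200)])
true_beta = np.array([8.0, 2.0])
y = np.exp(X @ true_beta) * rng.gamma(5.0, 1 / 5.0, 200)

def fit_power_glm(X, y, p=2.3, n_iter=50):
    """IRLS for a log-link GLM with variance function V(mu) = mu**p."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # crude intercept-only start
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response (log link)
        w = mu ** (2.0 - p)               # weight mu^2 / V(mu)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)  # weighted LS step
    return beta

beta_hat = fit_power_glm(X, y)
```

Setting p = 2 recovers the gamma GLM; p = 2.3 down-weights the largest fitted values slightly more, giving the intermediate tail behaviour the slide describes.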
Refining the model of the data
• …and so on
• We continue to refine the model of claim size
• Paper contains detail
• Final model includes following effects
  • Operational time (smoothed)
  • Seasonal
  • Superimposed inflation (smoothed)
    • Different rates at different operational times
    • Different rates over different intervals of calendar time
  • Accident quarter (legislative) effect
    • Diminishes with increasing operational time
    • Peters out at operational time 35%
Final estimate of liability

Loss reserve at 30 Sept 2003 (excl. Sept 2003 accident qr):

Averaging period                                           | Uncorrected $B | Corrected $B
All experience quarters                                    | 1.61           |
Last 8 experience quarters                                 | 1.68           |
All experience quarters except Sept 2003 (last diagonal)   | 1.78           | 1.94
Last 8 experience quarters except Sept 2003 (last diagonal)| 1.92           | 2.35*
GLM                                                        | 2.23*          |

* Quite different distributions over accident years
Conclusions
• GLM has successfully modelled a loss experience with considerable complexity
• Simpler model structures, e.g. chain ladder, would have little hope of doing so
  • Indeed, it is not even clear how one would approach the problem with these simpler structures
• The GLM achieves much greater parsimony
  • Chain ladder number of parameters = 73 with no recognition of any trends
  • GLM number of parameters = 13 with full recognition of trends
• GLM is fully stochastic
  • Provides a set of diagnostics for comparing candidate models and validating a selection
• Understanding of the data set
  • Assists not only reserving but pricing and other decision making