The Future of Loss Reserving
A Bayesian 21st Century
R for Insurance Conference
Amsterdam
June 28, 2015
Jim Guszcza, PhD, FCAS
Deloitte Consulting LLP
jguszcza@deloitte.com
Motivation:
Why Bayes, Why Now
Probably what we want
“Given any value (estimate of future payments) and our current
state of knowledge, what is the probability that the final payments
will be no larger than the given value?”
-- Casualty Actuarial Society (2004)
Working Party on Quantifying Variability in Reserve Estimates
I read this as a request for a Bayesian predictive distribution.
Bayes gives us what we want
“Modern Bayesian methods provide richer information, with greater
flexibility and broader applicability than 20th century methods.
Bayesian methods are intellectually coherent and intuitive. Bayesian
analyses are readily computed with modern software and hardware.”
-- John Kruschke, Indiana University Psychology
Why Bayes
• “A coherent integration of evidence from different sources”
• Background information
• Expert knowledge / judgment (“subjectivity” is a feature, not a bug)
• Other datasets (e.g. multiple triangles)
• Shrinkage, “borrowing strength”, hierarchical model structure – all coin of the realm
• Rich output: full probability distribution estimates of all quantities of interest
• Ultimate loss ratios by accident year
• Outstanding loss amounts
• Missing values of any cell in a loss triangle
• Model the process that generates the data
• As opposed to modeling the data with “procedural” methods
• We can fit models as complex (or simple) as the situation demands
• Nonlinear growth patterns, trends, autoregressive and hierarchical structure, …
• Conceptual clarity
• Single-case probabilities make sense in the Bayesian framework
• Communication of risk: “mean what you say and say what you mean”
Today’s Bayes
Is our industry living up to its rich Bayesian heritage?
Bayesian Principles
The Fundamental Bayesian Principle
“For Bayesians as much as for any other statistician, parameters are
(typically) fixed but unknown. It is the knowledge about these
unknowns that Bayesians model as random…
… typically it is the Bayesian who makes the claim for inference in a
particular instance and the frequentist who restricts claims to
infinite populations of replications.”
-- Andrew Gelman and Christian Robert
Translation:
• Frequentist: Probability models the infinite replications of the data X
• Bayesian: Probability models our partial knowledge about θ
Updating Subjective Probability
• Bayes’ Theorem (a mathematical fact):

  $\Pr(H \mid E) = \frac{\Pr(H \cap E)}{\Pr(E)} = \frac{\Pr(E \mid H)\,\Pr(H)}{\Pr(E)}$

• Bayes’ updating rule (a methodological premise):
  • Let Pr(H) represent our belief in hypothesis H before receiving evidence E.
  • Let Pr*(H) represent our belief about H after receiving evidence E.
  • Bayes’ rule: Pr*(H) = Pr(H | E), i.e. the evidence E carries us from Pr(H) to Pr(H | E).
A numeric sketch of this update follows.
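To make the updating rule concrete, here is a tiny R sketch; the prior and likelihood values below are made-up numbers for illustration, not from the deck:

  # Bayes' rule with concrete (made-up) numbers
  prior     <- 0.3                                        # Pr(H)
  like_H    <- 0.8                                        # Pr(E | H)
  like_notH <- 0.2                                        # Pr(E | not-H)
  evidence  <- like_H * prior + like_notH * (1 - prior)   # Pr(E)
  like_H * prior / evidence                               # Pr(H | E) = 0.24 / 0.38 ~ 0.63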
Learning from data
Suppose Persi tosses a coin 12 times and gets 3 heads.
What is the probability of heads on the 13th toss?

Frequentist analysis

  $X_i \sim_{iid} \mathrm{Bern}(\theta) \;\Rightarrow\; L(\theta \mid H = 3, n = 12) \propto \theta^3 (1-\theta)^9 \;\Rightarrow\; \hat{\theta}_{MLE} = 1/4$

Thoughts
• “Parameter risk”: 12 flips is not a lot of data (“credibility concerns”)
• We’ve flipped other coins before… isn’t that knowledge relevant?
• It would be nice to somehow “temper” the estimate of ¼, or “credibility weight” it with some other source of information
• It would be nice not to just give a point estimate and a confidence interval, but to say things like: $\Pr(L < \theta < U) = p$
Learning from data
Suppose Persi tosses a coin 12 times and gets 3 heads.
What is the probability of heads on the 13th toss?

Bayesian analysis

  $\theta \sim \mathrm{Beta}(\alpha, \beta) \;\Rightarrow\; \theta \mid \text{data} \sim \mathrm{Beta}(\alpha + 3, \beta + 9)$

Thoughts
• “Parameter risk”: quantified by the posterior distribution
• Prior knowledge: encoded in the choice of $\{\alpha, \beta\}$
• Other data: maybe Persi has flipped other coins on other days… we could throw all of this (together with our current data) into a hierarchical model
• Mean what we say and say what we mean: $\Pr(L < \theta < U) = p$ is a “credibility interval”… it’s what most people think confidence intervals say… (but don’t!)
A short R sketch of this update follows.
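Here is a minimal R sketch of the Beta-Binomial update for the coin example; the flat Beta(1,1) prior is an illustrative assumption, not a recommendation:

  # Prior theta ~ Beta(alpha, beta); after 3 heads in 12 tosses,
  # posterior theta ~ Beta(alpha + 3, beta + 9)
  alpha <- 1; beta <- 1                  # assumed flat prior, purely for illustration
  a_post <- alpha + 3; b_post <- beta + 9

  # Posterior predictive Pr(heads on toss 13) = posterior mean of theta
  a_post / (a_post + b_post)             # 4/14 ~ 0.29, vs. the MLE of 1/4

  # A 90% posterior interval for theta (the "credibility interval" above)
  qbeta(c(0.05, 0.95), a_post, b_post)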
Prior distributions: a feature, not a bug
“Your ‘subjective’ probability is not something fetched out of the
sky on a whim; it is what your actual judgment should be, in view
of your information to date and other people’s information.”
-- Richard Jeffrey, Princeton University
• “Subjective” probability is really “judgmental” probability
• The choice of likelihood function is also “subjective” in this sense:
  • ODP (or other) distributional form
  • Inclusion of covariates
  • Trends
  • Tail factor extrapolations
  • …
Bayesian Computation
Why Everyone Wasn’t a Bayesian: an intractable problem
Before 1990, this sort of thing was often viewed as a parlor trick because of the need to analytically solve high-dimensional integrals:

  $f(\theta \mid X) = \frac{f(X \mid \theta)\,\pi(\theta)}{\int f(X \mid \theta)\,\pi(\theta)\,d\theta}$

  $f(Y \mid X) = \int f(Y \mid \theta)\, f(\theta \mid X)\, d\theta = \int f(Y \mid \theta) \left[\frac{f(X \mid \theta)\,\pi(\theta)}{\int f(X \mid \theta)\,\pi(\theta)\,d\theta}\right] d\theta$
MCMC makes it practical
After 1990: the introduction of Markov Chain Monte Carlo [MCMC] simulation into Bayesian practice ushered in a “new world order”:
Now we can simulate draws from Bayesian posteriors.
Chains we can believe in
We set up random walks through parameter space that… in the limit… pass through each region in the probability space in proportion to the posterior probability density of that region.

How the Metropolis-Hastings sampler generates a Markov chain {θ1, θ2, θ3, …}:
1. Time t = 1: select a random initial position θ1 in parameter space.
2. Select a proposal distribution p(·|·) that we will use to generate proposed random steps away from our current position in parameter space.
3. Starting at time t = 2, repeat the following until you get convergence:
   a) At step t, generate a proposed $\theta^* \sim p(\cdot \mid \theta_{t-1})$
   b) Also generate u ~ unif(0, 1)
   c) Compute the acceptance ratio

      $R = \frac{f(\theta^* \mid X)\; p(\theta_{t-1} \mid \theta^*)}{f(\theta_{t-1} \mid X)\; p(\theta^* \mid \theta_{t-1})}$

      If R > u then θt = θ*; else θt = θt−1.

Step (3c) implies that at step t we accept the proposed step θ* with probability min(1, R).
Let’s go to the Metropolis
• So now we have something we can easily program into a computer (see the sketch below).
• At each step, give yourself a coin with probability of heads min(1, R) and flip it:

  $R = \frac{f(X \mid \theta^*)\,\pi(\theta^*)\; p(\theta_{t-1} \mid \theta^*)}{f(X \mid \theta_{t-1})\,\pi(\theta_{t-1})\; p(\theta^* \mid \theta_{t-1})}$

• If the coin lands heads, move from θt−1 to θ*.
• Otherwise, stay put.
• The result is a Markov chain (step t depends only on step t−1… not on earlier steps). And it converges on the posterior distribution.
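Here is a minimal R sketch of that recipe for the lognormal example on the following slides. The flat prior (so the posterior is proportional to the likelihood), the symmetric random-walk proposal (which makes the Hastings correction cancel), and the proposal scale of 0.3 are all illustrative assumptions:

  set.seed(1)
  x <- rlnorm(50, meanlog = 9, sdlog = 2)   # 50 draws from lognormal(9, 2)

  # Log-posterior under a flat prior on (mu, sigma > 0): just the log-likelihood
  log_post <- function(mu, sigma) {
    if (sigma <= 0) return(-Inf)
    sum(dlnorm(x, meanlog = mu, sdlog = sigma, log = TRUE))
  }

  n_iter <- 10000
  chain  <- matrix(NA, n_iter, 2, dimnames = list(NULL, c("mu", "sigma")))
  chain[1, ] <- c(1, 10)                    # deliberately bad starting values

  for (t in 2:n_iter) {
    prop <- chain[t - 1, ] + rnorm(2, sd = c(0.3, 0.3))   # symmetric proposal
    logR <- log_post(prop[1], prop[2]) -
            log_post(chain[t - 1, 1], chain[t - 1, 2])    # log acceptance ratio
    chain[t, ] <- if (log(runif(1)) < logR) prop else chain[t - 1, ]
  }

  burned <- chain[-(1:5000), ]   # discard burn-in, as on the results slide
  colMeans(burned)               # should be near (9, 2)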
Random walks with 4 different starting points
First 5 Metropolis-Hastings Steps
• Data: 50 random draws from lognormal(9, 2), with density

  $f(x \mid \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-z^2/2}, \qquad z = \frac{\ln(x) - \mu}{\sigma}$

• We estimate the lognormal density using 4 separate sets of starting values.
[Figure: four panels of (mu, sigma) parameter space, one chain per panel, showing the first 5 steps of each chain.]
Random walks with 4 different starting points
First 10 Metropolis-Hastings Steps
• After 10 iterations, the lower-right chain is already in the right neighborhood.
[Figure: the same four (mu, sigma) panels, now showing the first 10 steps of each chain.]
Random walks with 4 different starting points
First 20 Metropolis-Hastings Steps
• After 20 iterations, only the 3rd chain is still in the wrong neighborhood.
[Figure: the same four (mu, sigma) panels, now showing the first 20 steps of each chain.]
Random walks with 4 different starting points
First 50 Metropolis-Hastings Steps
• After 50 iterations, all 4 chains have arrived in the right neighborhood.
[Figure: the same four (mu, sigma) panels, now showing the first 50 steps of each chain.]
Random walks with 4 different starting points
First 500 Metropolis-Hastings Steps
• By 500 iterations, it appears that the burn-in has long since been accomplished.
• The chain continues to wander: the time the chain spends in a neighborhood approximates the posterior probability that (μ, σ) lies in that neighborhood.
[Figure: the same four (mu, sigma) panels, now showing the first 500 steps of each chain.]
In 3D
Recall the true lognormal parameters are μ = 9 and σ = 2.
[Figure: a 3D view of the simulated posterior density over (mu, sigma).]
Metropolis-Hastings results
Metropolis-Hastings Simulation of Lognormal(9, 2)
• The MH simulation gives consistent results.
• Only the final 5,000 of the 10,000 MH iterations were used to estimate μ, σ.
[Figure: trace plots and histograms of the sampled mu and sigma values (mh$mu, mh$sigma) across the 10,000 iterations.]
Metropolis-Hastings results
Metropolis-Hastings Simulation of Lognormal(9, 2)
• Note the very rapid convergence despite unrealistic initial values.
[Figure: the same trace plots and histograms, highlighting the early iterations.]
An easier way to get the same result
Call JAGS from within R (a code sketch follows below).
[Figure: coda trace and density plots of mu and tau, based on 500 retained samples.]
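The deck doesn’t reproduce the code, but a minimal rjags sketch of the same lognormal fit might look like the following; the model string, the vague priors, and the variable names are my own assumptions. Note that JAGS parameterizes the lognormal by the precision tau = 1/sigma^2, which is why the slide traces tau:

  library(rjags)   # requires a JAGS installation

  model_string <- "
  model {
    for (i in 1:N) { x[i] ~ dlnorm(mu, tau) }   # JAGS uses precision tau = 1/sigma^2
    mu    ~ dnorm(0, 1.0E-6)                    # vague priors (illustrative)
    tau   ~ dgamma(0.001, 0.001)
    sigma <- 1 / sqrt(tau)
  }"

  set.seed(1)
  x <- rlnorm(50, meanlog = 9, sdlog = 2)

  jm <- jags.model(textConnection(model_string),
                   data = list(x = x, N = length(x)), n.chains = 4)
  update(jm, 1000)                              # burn-in
  post <- coda.samples(jm, c("mu", "sigma", "tau"), n.iter = 5000)
  summary(post); plot(post)                     # trace and density plots, as on the slide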
Bayesian Loss Reserving
Methodology: sophisticated simplicity
“It is fruitful to start simply and complicate if necessary. That is, it is recommended that an initial, sophisticatedly simple model be formulated and tested in terms of explaining past data and in forecasting or predicting new data. If the model is successful… it can be put into use. If not, [it] can be modified or elaborated to improve performance…”
-- Arnold Zellner, The University of Chicago

This is precisely what Bayesian data analysis enables us to do!
Start with a simple model and then add structure to account for:
• Other distributional forms (what’s so sacred about GLM or the exponential family?)
• Negative incremental incurred losses
• Nonlinear structure (e.g. growth curves)
• Hierarchical structure (e.g. fitting multiple lines, companies, regions)
• Prior knowledge
• Other loss triangles (“complement of credibility”)
• Calendar/accident year trends
• Autocorrelation
• …
Background: hierarchical modeling from A to B
• Hierarchical modeling is used when one’s data is grouped in some important
way.
• Claim experience by state or territory
• Workers Comp claim experience by class code
• Claim severity by injury type
• Churn rate by agency
• Multiple years of loss experience by policyholder.
• Multiple observations of a cohort of claims over time
• Often grouped data is modeled either by:
• Building separate models by group
• Pooling the data and introducing dummy variables to reflect the groups
• Hierarchical modeling offers a “middle way”.
• Parameters reflecting group membership enter one’s model through appropriately specified
probability sub-models.
Common hierarchical models
• Classical linear model:

  $Y_i = \alpha + \beta X_i + \epsilon_i$, equivalently $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$

  • Same α, β for each data point
• Random intercept model:

  $Y_i = \alpha_{j[i]} + \beta X_i + \epsilon_i$, i.e. $Y_i \sim N(\alpha_{j[i]} + \beta X_i, \sigma^2)$ where $\alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)$

  • Same β for each data point; but α varies by group j
• Random intercept and slope model:

  $Y_i \sim N(\alpha_{j[i]} + \beta_{j[i]} X_i, \sigma^2)$ where $\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \begin{pmatrix} \sigma_\alpha^2 & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma_\beta^2 \end{pmatrix} \right)$

  • Both α and β vary by group
A short R sketch of fitting these three models follows.
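For intuition, here is how these three models might be fit in R. The deck doesn’t name a package for this slide, so lme4 and the simulated stand-in data are my own assumptions (with only 4 points per group, lmer may warn about near-singular fits):

  library(lme4)

  # Simulated stand-in for the 8-region PIF data on the next slides
  set.seed(1)
  d <- expand.grid(year = 0:3, region = factor(1:8))
  a <- rnorm(8, 2300, 150)                  # region-level intercepts
  b <- rnorm(8, 50, 25)                     # region-level slopes
  d$pif <- a[d$region] + b[d$region] * d$year + rnorm(32, 0, 30)

  fit_pooled <- lm(pif ~ year, data = d)                          # complete pooling
  fit_int    <- lmer(pif ~ year + (1 | region), data = d)         # random intercepts
  fit_both   <- lmer(pif ~ year + (1 + year | region), data = d)  # intercepts + slopes
  coef(fit_both)$region                     # partially pooled (alpha_j, beta_j)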
Simple example: policies in force by region
• Simple example: change in PIF by region, 2007–10
• 32 data points: 8 regions × 4 years
• But we could as easily have 80 or 800 regions; our model would not change
• We view the dataset as a bundle of very short time series
[Figure: “PIF Growth by Region” — a lattice of 8 panels (region1–region8), each plotting pif against year.]
Classical linear model

  $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$

• Option 1: the classical linear model
• Complete pooling: just throw all of the data into one pot and regress; don’t reflect region in the model design
• Same α and β for each region
• This obviously doesn’t cut it.
• But fitting 8 separate regression models, or throwing in region-specific dummy variables, isn’t an attractive option either.
  • Danger of over-fitting, i.e. “credibility issues”
[Figure: the same 8-panel PIF lattice with a single pooled regression line drawn in every panel.]
Randomly varying intercepts

  $Y_i \sim N(\alpha_{j[i]} + \beta X_i, \sigma^2), \qquad \alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)$

• Option 2: the random intercept model
  • $Y_i = \alpha_{j[i]} + \beta X_i + \epsilon_i$
• This model has 9 parameters: {α1, α2, …, α8, β}
• And it contains 4 hyperparameters: {μα, σα, β, σ}
• A major improvement
[Figure: the 8-panel PIF lattice with a fitted line per region; the lines share a common slope but have region-specific intercepts.]
Randomly varying intercepts and slopes

  $Y_i \sim N(\alpha_{j[i]} + \beta_{j[i]} X_i, \sigma^2)$ where $\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \begin{pmatrix} \sigma_\alpha^2 & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma_\beta^2 \end{pmatrix} \right)$

• Option 3: the random slope and intercept model
  • $Y_i = \alpha_{j[i]} + \beta_{j[i]} X_i + \epsilon_i$
• This model has 16 parameters: {α1, …, α8, β1, …, β8}
  • (note that 8 separate models also contain 16 parameters)
• And it contains 6 hyperparameters: {μα, μβ, σα, σβ, ρ, σ}
• It’d be the same number of hyperparameters if we had 80 or 800 regions
[Figure: the 8-panel PIF lattice with a fitted line per region; both intercepts and slopes vary by region.]
A compromise between complete pooling and no pooling

Complete Pooling
• Ignore group structure altogether:  $PIF_t = \alpha + \beta t + \epsilon_t$

No Pooling
• Estimate a separate model for each group:  $PIF_t^{(k)} = \alpha_k + \beta_k t + \epsilon_t^{(k)}, \quad k = 1, 2, \ldots, 8$

Compromise: Hierarchical Model
• Estimates parameters using a compromise between complete pooling and no pooling:

  $Y_i \sim N(\alpha_{j[i]} + \beta_{j[i]} X_i, \sigma^2)$ where $\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \begin{pmatrix} \sigma_\alpha^2 & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma_\beta^2 \end{pmatrix} \right)$
A credible approach
• For illustration, recall the random intercept model:

  $Y_i \sim N(\alpha_{j[i]} + \beta X_i, \sigma^2), \qquad \alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)$

• This model can contain a large number of parameters {α1, …, αJ, β}.
• Regardless of J, it contains 4 hyperparameters {μα, σα, β, σ}.
• Here is how the hyperparameters relate to the parameters:

  $\hat{\alpha}_j = Z_j \cdot (\bar{y}_j - \beta \bar{x}_j) + (1 - Z_j) \cdot \hat{\mu}_\alpha, \qquad \text{where } Z_j = \frac{n_j}{n_j + \sigma^2 / \sigma_\alpha^2}$

Bühlmann credibility is a special case of hierarchical models (a quick numeric sketch follows).
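As a quick sketch of the formula (my own illustration, with made-up variance components):

  # Credibility weight implied by the random intercept model
  cred_Z <- function(n_j, sigma, sigma_alpha) n_j / (n_j + sigma^2 / sigma_alpha^2)

  cred_Z(n_j = 4, sigma = 30, sigma_alpha = 100)   # ~0.98: within-group data dominate
  cred_Z(n_j = 4, sigma = 30, sigma_alpha = 5)     # ~0.10: heavy shrinkage to mu_alpha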
Shrinkage Effect of Hierarchical Models
• Illustration: estimating workers compensation claim frequency by industry class.
• Poisson hierarchical model (aka “credibility model”)
[Figure: “Modeled Claim Frequency by Class — Poisson Models: No Pooling and Simple Credibility.” For each industry class code, the no-pooling (“no pool”) and hierarchical (“hierach”) estimates are plotted on a 0.00–0.10 claim-frequency scale; the hierarchical estimates are shrunk toward the grand mean.]
Validating the fully Bayesian hierarchical model
Year 7 Validation
Year-7 claims (red dot) and 90% posterior credible interval
Roughly 90% of the claims from the validation time period fall within the 90% posterior credible interval.
[Figure: for each class, the 90% posterior credible interval of year-7 claim counts (scale 0–250) with the actual value overlaid as a red dot.]
Case Study:
A Fully Bayesian Model
Collaboration with Wayne Zhang and Vanja Dukic
Data
A garden-variety Workers Compensation Schedule P loss triangle (losses in 1000’s):

                        Cumulative losses at development age (months)
AY    premium    12     24     36     48     60     72     84     96    108    120   CL Ult  CL LR  CL res
1988    2,609   404    986  1,342  1,582  1,736  1,833  1,907  1,967  2,006  2,036    2,036   0.78       0
1989    2,694   387    964  1,336  1,580  1,726  1,823  1,903  1,949  1,987           2,017   0.75      29
1990    2,594   421  1,037  1,401  1,604  1,729  1,821  1,878  1,919                  1,986   0.77      67
1991    2,609   338    753  1,029  1,195  1,326  1,395  1,446                         1,535   0.59      89
1992    2,077   257    569    754    892    958  1,007                                1,110   0.53     103
1993    1,703   193    423    589    661    713                                         828   0.49     115
1994    1,438   142    361    463    533                                                675   0.47     142
1995    1,093   160    312    408                                                       601   0.55     193
1996    1,012   131    352                                                              702   0.69     350
1997      976   122                                                                     576   0.59     454
Total                                                                                12,067          1,543

chain link      2.365  1.354  1.164  1.090  1.054  1.038  1.026  1.020  1.015  1.000
chain ldf       4.720  1.996  1.473  1.266  1.162  1.102  1.062  1.035  1.015  1.000
growth curve    21.2%  50.1%  67.9%  79.0%  86.1%  90.7%  94.2%  96.6%  98.5%  100.0%

• Let’s model this as a longitudinal dataset.
• Grouping dimension: Accident Year (AY)
We can build a parsimonious non-linear model that uses random effects to allow the model parameters to vary by accident year.
Growth curves – at the heart of the model
• We want our model to reflect the non-linear nature of loss development.
• GLM shows up a lot in the stochastic loss reserving literature…
  • … but are GLMs natural models for loss triangles?
• Growth curves (Clark 2003):

  Loglogistic:  $G(x \mid \omega, \theta) = \frac{x^\omega}{x^\omega + \theta^\omega}$
  Weibull:      $G(x \mid \omega, \theta) = 1 - \exp\left(-(x/\theta)^\omega\right)$

  • LR = ultimate loss ratio
  • θ = scale
  • ω = shape (“warp”)
• Heuristic idea:
  • We judgmentally select a growth curve form
  • Let LR vary by year (hierarchical)
  • Add priors to the hyperparameters (Bayesian)
[Figure: “Weibull and Loglogistic Growth Curves — Heuristic: Fit Curves to Chain Ladder Development Pattern.” Cumulative percent of ultimate vs. development age (12–180 months) for the two curve forms.]
An exploratory non-Bayesian hierarchical model

  $y_i(t_j) = LR_i \cdot p_i \cdot \frac{t_j^\omega}{t_j^\omega + \theta^\omega} + \epsilon_i(t_j), \qquad LR_i \sim N(\mu_{LR}, \sigma_{LR}^2), \qquad \epsilon_i(t_j) = \rho\,\epsilon_i(t_{j-1}) + \delta_i(t_j)$

It is easy to fit non-Bayesian hierarchical models as a data exploration step (a sketch of such a fit in R follows).
[Figure: “Loglogistic Hierarchical Model (non-Bayesian)” — a lattice of 10 panels (accident years 1988–1997) plotting cumulative loss against development time, each with its fitted growth curve and its premium and estimated ultimate loss ratio annotated.]
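A sketch of such an exploratory fit in R, using nlme and the triangle from the Data slide. The package choice, the data wrangling, and the starting values are my own assumptions (the starting values in particular may need tuning for convergence):

  library(nlme)

  # Long-format version of the Schedule P triangle (losses and premium in 1000's)
  ay   <- 1988:1997
  prem <- c(2609, 2694, 2594, 2609, 2077, 1703, 1438, 1093, 1012, 976)
  loss <- list(c(404, 986, 1342, 1582, 1736, 1833, 1907, 1967, 2006, 2036),
               c(387, 964, 1336, 1580, 1726, 1823, 1903, 1949, 1987),
               c(421, 1037, 1401, 1604, 1729, 1821, 1878, 1919),
               c(338, 753, 1029, 1195, 1326, 1395, 1446),
               c(257, 569, 754, 892, 958, 1007),
               c(193, 423, 589, 661, 713),
               c(142, 361, 463, 533),
               c(160, 312, 408),
               c(131, 352),
               c(122))
  tri <- do.call(rbind, lapply(seq_along(ay), function(i)
    data.frame(AY = factor(ay[i]), premium = prem[i],
               t = 12 * seq_along(loss[[i]]), cum = loss[[i]])))

  # Loglogistic growth curve; the ultimate loss ratio LR gets a random effect by AY
  fit <- nlme(cum ~ premium * LR * t^omega / (t^omega + theta^omega),
              fixed  = LR + omega + theta ~ 1,
              random = LR ~ 1 | AY,
              start  = c(LR = 0.6, omega = 1.5, theta = 30),
              data   = tri)
  summary(fit)   # fixed effects plus the estimated LR random effects by accident year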
Adding Bayesian structure
• Our hierarchical model is “half-way Bayesian”
  • On the one hand, we place probability sub-models on certain parameters
  • But on the other hand, various (hyper)parameters are estimated directly from the data.
• To make this fully Bayesian, we need to put probability distributions on all quantities that are uncertain.
• We then employ Bayesian updating: the model (“likelihood function”) together with the prior results in a posterior probability distribution over all uncertain quantities.
  • Including ultimate loss ratio parameters and hyperparameters!
  ⇒ We are directly modeling the ultimate quantity of interest.
• This is not as hard as it sounds:
  • We do not explicitly calculate the high-dimensional posterior probability distribution.
  • We do use Markov Chain Monte Carlo [MCMC] simulation to sample from the posterior.
  • Technology: JAGS (“Just Another Gibbs Sampler”), called from within R. (A sketch of what such a model file might look like follows.)
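Here is a sketch of what the fully Bayesian model might look like in JAGS. This is my own illustration of the idea, not the authors’ actual code: it uses iid normal errors rather than the autocorrelated errors above, priors chosen purely for illustration, and assumes a data list with y, t, premium, ay (accident-year index 1..n_ay), N, and n_ay:

  growth_model <- "
  model {
    for (i in 1:N) {
      G[i]  <- pow(t[i], omega) / (pow(t[i], omega) + pow(theta, omega))  # loglogistic
      mu[i] <- LR[ay[i]] * premium[i] * G[i]
      y[i]  ~ dnorm(mu[i], tau_y)              # simplified: iid errors, no AR(1) term
    }
    for (k in 1:n_ay) { LR[k] ~ dnorm(mu_LR, tau_LR) }   # loss ratios vary by AY
    mu_LR  ~ dnorm(0.6, 1)                     # illustrative, weakly informative priors
    tau_LR ~ dgamma(0.001, 0.001)
    tau_y  ~ dgamma(0.001, 0.001)
    omega  ~ dunif(0.5, 5)
    theta  ~ dunif(5, 120)
  }"
  # As on the lognormal slides: jags.model(textConnection(growth_model), data = ...),
  # then update() for burn-in and coda.samples() for the posterior draws.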
Example
(with Wayne Zhang and Vanja Dukic)
• Posterior credible intervals of incremental losses – by accident year
• Based on the non-linear hierarchical growth curve model
[Figures, shown over two slides: posterior credible intervals of incremental losses, one panel per accident year.]
Posterior distribution of aggregate outstanding losses
• Non-informative priors were used; different priors were tested as a sensitivity analysis
• A full posterior distribution falls out of the analysis
  • No need for bootstrapping, ad hoc simulations, or settling for a point estimate with a confidence interval
• Use of a non-linear (growth curve) model enables us to project beyond the range of the data
  • The choice of growth curve affects the estimates more than the choice of priors!
  • This choice “does the work of” a choice of tail factors
[Figure: “Outstanding Loss Estimates at Different Evaluation Points — Estimated Ultimate Losses Minus Losses to Date.” Posterior densities of outstanding losses at 120 months, at 180 months, and at ultimate, on a common 500–4,000 scale, with the chain ladder estimate marked.]
Why Bayes
• “A coherent integration of evidence from different sources”
• Background information
• Expert knowledge / judgment (“subjectivity” is a feature, not a bug)
• Other datasets (e.g. multiple triangles)
• Shrinkage, “borrowing strength”, hierarchical model structure – all coin of the realm
• Rich output: full probability distribution estimates of all quantities of interest
• Ultimate loss ratios by accident year
• Outstanding loss amounts
• Missing values of any cell in a loss triangle
• Model the process that generates the data
• As opposed to modeling the data with “procedural” methods
• We can fit models as complex (or simple) as the situation demands
• Nonlinear growth patterns, trends, autoregressive and hierarchical structure, …
• Conceptual clarity
• Single-case probabilities make sense in the Bayesian framework
• Communication of risk: “mean what you say and say what you mean”
A Parting Thought
Parting thought: our field’s Bayesian heritage
“Practically all methods of statistical estimation… are based on… the
assumption that any and all collateral information or a priori knowledge is
worthless. It appears to be only in the actuarial field that there has been an
organized revolt against discarding all prior knowledge when an estimate is to
be made using newly acquired data.”
-- Arthur Bailey (1950)
... And today, in the age of MCMC, cheap
computing, and open-source software...
“Scientific disciplines from astronomy to zoology are moving to Bayesian data
analysis. We should be leaders of the move, not followers.”
-- John Kruschke, Indiana University Psychology (2010)
Appendix:
Some MCMC Intuition
Metropolis-Hastings Intuition
• Let’s take a step back and remember why we’ve done all of this.
• In ordinary Monte Carlo integration, we take a large number of independent draws from the probability distribution of interest and let the sample average of {g(θ(i))} approximate the expected value E[g(θ)] (a short R illustration follows):

  $\frac{1}{N} \sum_{i=1}^{N} g(\theta^{(i)}) \;\xrightarrow{N \to \infty}\; \int g(\theta)\,\pi(\theta)\,d\theta = E[g(\theta)]$

• The Strong Law of Large Numbers justifies this approximation.
• But: when estimating Bayesian posteriors, we are generally not able to take independent draws from the distribution of interest.
• Results from the theory of stochastic processes tell us that suitably well-behaved Markov chains can also be used to perform Monte Carlo integration.
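A one-screen R illustration of ordinary Monte Carlo integration; the Beta example is mine, chosen to echo the coin-flip posterior earlier in the deck:

  # Estimate E[theta^2] for theta ~ Beta(4, 10) by plain Monte Carlo
  set.seed(1)
  draws <- rbeta(1e5, 4, 10)
  mean(draws^2)                          # Monte Carlo estimate

  # Exact answer for comparison: Var(theta) + E(theta)^2
  (4 * 10) / (14^2 * 15) + (4 / 14)^2    # ~0.0952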
Some Facts from Markov Chain Theory
How do we know this algorithm yields reasonable approximations?
• Suppose our Markov chain θ1, θ2, … with transition matrix P satisfies some “reasonable conditions”:
  • Aperiodic, irreducible, positive recurrent (see next slide)
  • Chains generated by the M-H algorithm satisfy these conditions
• Fact #1 (convergence theorem): P has a unique stationary (“equilibrium”) distribution π (i.e. π = πP). Furthermore, the chain converges to π.
  • Implication: we can start anywhere in the sample space, so long as we throw out a sufficiently long “burn-in”.
• Fact #2 (Ergodic Theorem): suppose g(θ) is some function of θ. Then:

  $\frac{1}{N} \sum_{i=1}^{N} g(\theta^{(i)}) \;\xrightarrow{N \to \infty}\; \int g(\theta)\,\pi(\theta)\,d\theta = E[g(\theta)]$

  • Implication: after a sufficient burn-in, perform Monte Carlo integration by averaging over a suitably well-behaved Markov chain.
  • The values of the chain are not independent, as required by the SLLN.
  • But the Ergodic Theorem says we’re close enough to independence to get what we need.
Conditions for Ergodicity
More on those “reasonable conditions” on Markov chains:
• Aperiodic: the chain does not regularly return to any value θ in the state space in multiples of some k > 1.
• Irreducible: it is possible to go from any state i to any other state j in some finite number of steps.
• Positive recurrent: the chain will return to any particular state θ with probability 1, and the expected return time is finite.
• Intuition:
  • The Ergodic Theorem tells us that (in the limit) the amount of time the chain spends in a particular region of state space equals the probability assigned to that region.
  • This won’t be true if (for example) the chain gets trapped in a loop, or won’t visit certain parts of the space in finite time.
• The practical problem: use the Markov chain to select a representative sample from the distribution π, expending a minimum amount of computer time.