The Future of Loss Reserving: A Bayesian 21st Century
R in Insurance Conference, Amsterdam, June 28, 2015
Jim Guszcza, PhD, FCAS
Deloitte Consulting LLP
jguszcza@deloitte.com
Deloitte Analytics Institute © 2011 Deloitte LLP

Motivation: Why Bayes, Why Now

Probably what we want
"Given any value (estimate of future payments) and our current state of knowledge, what is the probability that the final payments will be no larger than the given value?"
-- Casualty Actuarial Society (2004) Working Party on Quantifying Variability in Reserve Estimates
I read this as a request for a Bayesian predictive distribution.

Bayes gives us what we want
"Modern Bayesian methods provide richer information, with greater flexibility and broader applicability than 20th century methods. Bayesian methods are intellectually coherent and intuitive. Bayesian analyses are readily computed with modern software and hardware."
-- John Kruschke, Indiana University Psychology

Why Bayes
• "A coherent integration of evidence from different sources"
  • Background information
  • Expert knowledge / judgment ("subjectivity" is a feature, not a bug)
  • Other datasets (e.g. multiple triangles)
  • Shrinkage, "borrowing strength", hierarchical model structure: all coin of the realm
• Rich output: full probability distribution estimates of all quantities of interest
  • Ultimate loss ratios by accident year
  • Outstanding loss amounts
  • Missing values of any cell in a loss triangle
• Model the process that generates the data
  • As opposed to modeling the data with "procedural" methods
  • We can fit models as complex (or simple) as the situation demands
  • Nonlinear growth patterns, trends, autoregressive structure, hierarchical structure, …
• Conceptual clarity
  • Single-case probabilities make sense in the Bayesian framework
  • Communication of risk: "mean what you say and say what you mean"

Today's Bayes
Is our industry living up to its rich Bayesian heritage?

Bayesian Principles

The Fundamental Bayesian Principle
"For Bayesians as much as for any other statistician, parameters are (typically) fixed but unknown.
It is the knowledge about these unknowns that Bayesians model as random… typically it is the Bayesian who makes the claim for inference in a particular instance and the frequentist who restricts claims to infinite populations of replications."
-- Andrew Gelman and Christian Robert
Translation:
• Frequentist: probability models the infinite replications of the data X
• Bayesian: probability models our partial knowledge about θ

Updating Subjective Probability
• Bayes' Theorem (a mathematical fact):
    Pr(H | E) = Pr(E | H) Pr(H) / Pr(E)
• Bayes' updating rule (a methodological premise):
  • Let Pr(H) represent our belief in hypothesis H before receiving evidence E.
  • Let Pr*(H) represent our belief about H after receiving evidence E.
  • Bayes' rule: Pr*(H) = Pr(H | E); the evidence E carries us from Pr(H) to Pr(H | E).

Learning from data
Suppose Persi tosses a coin 12 times and gets 3 heads. What is the probability of heads on the 13th toss?

Frequentist analysis
    X_i ~iid Bern(θ)
    L(θ | H = 3, n = 12) = θ³(1 − θ)⁹
    θ_MLE = 3/12 = 1/4
Thoughts
• "Parameter risk": 12 flips is not a lot of data ("credibility concerns")
• We've flipped other coins before… isn't that knowledge relevant?
• It would be nice to somehow "temper" the estimate of 1/4, or "credibility weight" it with some other source of information
• It would be nice not just to give a point estimate and a confidence interval, but to say things like: Pr(L < θ < U) = p

Bayesian analysis
    Prior: θ ~ Beta(α, β)
    Posterior: θ | data ~ Beta(α + 3, β + 9)
Thoughts
• "Parameter risk": quantified by the posterior distribution
• Prior knowledge: encoded in the choice of {α, β}
• Other data: maybe Persi has flipped other coins on other days… we could throw all of this (together with our current data) into a hierarchical model
• Mean what we say and say what we mean: Pr(L < θ < U) = p is a "credible interval"… it's what most people think confidence intervals say (but don't!)
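The conjugate beta-binomial update above can be sketched in a few lines. (The talk's computing environment is R; this is an illustrative Python translation, and the particular priors Beta(1, 1) and Beta(10, 10) are hypothetical choices, not from the talk.)

```python
# Beta-binomial update for the coin example: 3 heads in 12 tosses.
# The Beta(alpha, beta) prior is conjugate to the Bernoulli likelihood,
# so the posterior is Beta(alpha + heads, beta + tails) in closed form.

def posterior_params(alpha, beta, heads, tails):
    """Conjugate update: Beta prior + binomial data -> Beta posterior."""
    return alpha + heads, beta + tails

def prob_heads_next(alpha, beta, heads, tails):
    """Posterior predictive Pr(heads on next toss) = posterior mean of theta."""
    a, b = posterior_params(alpha, beta, heads, tails)
    return a / (a + b)

# Uniform prior Beta(1, 1): posterior is Beta(4, 10), so the probability
# of heads on toss 13 is 4/14 -- gently tempering the MLE of 3/12 = 1/4.
print(posterior_params(1, 1, 3, 9))   # (4, 10)
print(prob_heads_next(1, 1, 3, 9))    # 0.2857...

# A stronger prior centered on a fair coin, Beta(10, 10),
# pulls the estimate further toward 1/2:
print(prob_heads_next(10, 10, 3, 9))  # 13/32 = 0.40625
```

The prior acts exactly like a "credibility weight": the heavier the prior (larger α + β), the more the data are tempered toward the prior mean.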
Prior distributions: a feature, not a bug
"Your 'subjective' probability is not something fetched out of the sky on a whim; it is what your actual judgment should be, in view of your information to date and other people's information."
-- Richard Jeffrey, Princeton University
• "Subjective" probability is really "judgmental" probability
• The choice of likelihood function is also "subjective" in this sense:
  • ODP (or other) distributional form
  • Inclusion of covariates
  • Trends
  • Tail factor extrapolations
  • …

Bayesian Computation

Why Everyone Wasn't a Bayesian
Before 1990: this sort of thing was often viewed as a parlor trick because of the need to analytically solve high-dimensional integrals:
    f(Y | X) = ∫ f(Y | θ) f(θ | X) dθ = ∫ f(Y | θ) [f(X | θ) π(θ) / ∫ f(X | θ) π(θ) dθ] dθ
where the posterior itself is
    f(θ | X) = f(X | θ) π(θ) / ∫ f(X | θ) π(θ) dθ

MCMC makes it practical
After 1990: the introduction of Markov Chain Monte Carlo [MCMC] simulation to Bayesian practice introduces a "new world order": now we can simulate Bayesian posteriors.

Chains we can believe in
We set up random walks through parameter space that, in the limit, pass through each region of the space in proportion to the posterior probability density of that region.
• How the Metropolis-Hastings sampler generates a Markov chain {θ₁, θ₂, θ₃, …}:
  1. Time t = 1: select a random initial position θ₁ in parameter space.
  2. Select a proposal distribution p(· | ·) that we will use to generate proposed random steps away from our current position in parameter space.
  3. Starting at time t = 2, repeat the following until you get convergence:
     a) At step t, generate a proposal θ* ~ p(· | θ_{t−1})
     b) Also generate u ~ unif(0, 1)
     c) Compute the acceptance ratio
            R = [f(θ* | X) p(θ_{t−1} | θ*)] / [f(θ_{t−1} | X) p(θ* | θ_{t−1})]
        If R > u then θ_t = θ*; else θ_t = θ_{t−1}.
Step (3c) implies that at step t, we accept the proposed step θ* with probability min(1, R).

Let's go to the Metropolis
• So now we have something we can easily program into a computer.
• At each step, give yourself a coin with probability of heads min(1, R) and flip it:
    R = [f(X | θ*) π(θ*) p(θ_{t−1} | θ*)] / [f(X | θ_{t−1}) π(θ_{t−1}) p(θ* | θ_{t−1})]
• If the coin lands heads, move from θ_{t−1} to θ*; otherwise, stay put.
• The result is a Markov chain (step t depends only on step t−1, not on earlier steps). And it converges on the posterior distribution.

Random walks with 4 different starting points
[Figure: first 5 Metropolis-Hastings steps of four chains in (μ, σ) space, one chain per starting value.]
• Data: 50 random draws from lognormal(9, 2), with density
    f(x | μ, σ) = 1 / (xσ√(2π)) · exp(−z²/2),  where z = (ln x − μ)/σ
• We estimate the lognormal density using 4 separate sets of starting values.
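The sampler just described can be sketched in code. (The talk's examples use R; this is an illustrative Python sketch with a symmetric random-walk proposal, so the Hastings correction cancels and R reduces to a ratio of posterior densities. The step size, seeds, and flat prior are assumptions for illustration.)

```python
import math
import random

def log_posterior(mu, sigma, data):
    """Log posterior for lognormal(mu, sigma) data under an (improper)
    flat prior; sigma <= 0 is assigned zero density."""
    if sigma <= 0:
        return -math.inf
    ll = 0.0
    for x in data:
        z = (math.log(x) - mu) / sigma
        ll += -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z
    return ll

def metropolis(data, n_steps, mu0, sigma0, step=0.3, seed=1):
    """Random-walk Metropolis: accept the proposal theta* with
    probability min(1, R), R = f(theta*|X) / f(theta_{t-1}|X)."""
    rng = random.Random(seed)
    mu, sigma = mu0, sigma0
    lp = log_posterior(mu, sigma, data)
    chain = []
    for _ in range(n_steps):
        mu_p = mu + rng.gauss(0, step)        # proposed step theta*
        sigma_p = sigma + rng.gauss(0, step)
        lp_p = log_posterior(mu_p, sigma_p, data)
        accept_prob = math.exp(min(0.0, lp_p - lp))  # min(1, R)
        if rng.random() < accept_prob:
            mu, sigma, lp = mu_p, sigma_p, lp_p      # move
        chain.append((mu, sigma))                    # else stay put
    return chain

# 50 draws from lognormal(9, 2), as in the talk's running example.
data_rng = random.Random(42)
data = [math.exp(9 + 2 * data_rng.gauss(0, 1)) for _ in range(50)]

chain = metropolis(data, n_steps=10_000, mu0=2.0, sigma0=8.0)
burned = chain[5_000:]   # discard burn-in, as on the results slides
mu_hat = sum(m for m, s in burned) / len(burned)
sigma_hat = sum(s for m, s in burned) / len(burned)
print(round(mu_hat, 2), round(sigma_hat, 2))  # both near the true (9, 2)
```

Despite the deliberately unrealistic starting point (μ = 2, σ = 8), the chain drifts into the high-posterior region quickly, mirroring the random-walk figures that follow.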
[Figure: first 10 Metropolis-Hastings steps.]
• After 10 iterations, the lower right chain is already in the right neighborhood.

[Figure: first 20 Metropolis-Hastings steps.]
• After 20 iterations, only the 3rd chain is still in the wrong neighborhood.

[Figure: first 50 Metropolis-Hastings steps.]
• After 50 iterations, all 4 chains have arrived in the right neighborhood.

[Figure: first 500 Metropolis-Hastings steps.]
• By 500 iterations, it appears that the burn-in has long since been accomplished.
• The chain continues to wander: the time the chain spends in a neighborhood approximates the posterior probability that (μ, σ) lies in that neighborhood.

In 3D
[Figure: 3D view of the simulated posterior.]
Recall the true lognormal parameters are μ = 9 and σ = 2.

Metropolis-Hastings results
[Figure: posterior histograms and trace plots of μ and σ from the Metropolis-Hastings simulation of lognormal(9, 2) data.]
• The MH simulation gives consistent results.
• Only the final 5,000 of the 10,000 MH iterations were used to estimate μ and σ.
• Note the very rapid convergence despite unrealistic initial values.

An easier way to get the same result
Call JAGS from within R.
[Figure: JAGS trace and density plots of mu and tau.]

Bayesian Loss Reserving

Methodology: sophisticated simplicity
"It is fruitful to start simply and complicate if necessary. That is, it is recommended that an initial, sophisticatedly simple model be formulated and tested in terms of explaining past data and in forecasting or predicting new data. If the model is successful… it can be put into use.
If not, [it] can be modified or elaborated to improve performance…"
-- Arnold Zellner, The University of Chicago
This is precisely what Bayesian data analysis enables us to do!
Start with a simple model and then add structure to account for:
• Other distributional forms (what's so sacred about GLM or the exponential family?)
• Negative incremental incurred losses
• Nonlinear structure (e.g. growth curves)
• Hierarchical structure (e.g. fitting multiple lines, companies, regions)
• Prior knowledge
• Other loss triangles ("complement of credibility")
• Calendar/accident year trends
• Autocorrelation
• …

Background: hierarchical modeling from A to B
• Hierarchical modeling is used when one's data is grouped in some important way:
  • Claim experience by state or territory
  • Workers Comp claim experience by class code
  • Claim severity by injury type
  • Churn rate by agency
  • Multiple years of loss experience by policyholder
  • Multiple observations of a cohort of claims over time
• Often grouped data is modeled either by:
  • Building separate models by group, or
  • Pooling the data and introducing dummy variables to reflect the groups
• Hierarchical modeling offers a "middle way": parameters reflecting group membership enter one's model through appropriately specified probability sub-models.

Common hierarchical models
• Classical linear model: Y_i = α + βX_i + ε_i
  • Equivalently: Y_i ~ N(α + βX_i, σ²)
  • Same α, β for each data point
• Random intercept model: Y_i = α_{j[i]} + βX_i + ε_i
  • Where: Y_i ~ N(α_{j[i]} + βX_i, σ²)
  • And: α_j ~ N(μ_α, σ_α²)
  • Same β for each data point, but α varies by group j
• Random intercept and slope model: Y_i = α_{j[i]} + β_{j[i]}X_i + ε_i
  • Both α and β vary by group: Y_i ~ N(α_{j[i]} + β_{j[i]}X_i, σ²), where (α_j, β_j) ~ N((μ_α, μ_β), Σ)

Simple example: policies in force by region
[Figure: PIF growth by region, 2007-2010, one panel per region.]
• Simple example: change in PIF by region from 2007-10
• 32 data points: 8 regions × 4 years
• But we could as easily have 80 or 800 regions, and our model would not change
• We view the dataset as a bundle of very short time series

Classical linear model
• Option 1: the classical linear model: Y_i ~ N(α + βX_i, σ²)
• Complete pooling: don't reflect region in the model design; just throw all of the data into one pot and regress
• Same α and β for each region
• This obviously doesn't cut it.
• But fitting 8 separate regression models or throwing in region-specific dummy variables isn't an attractive option either.
  • Danger of over-fitting, i.e. "credibility issues"

Randomly varying intercepts
• Option 2: the random intercept model: Y_i = α_{j[i]} + βX_i + ε_i, i.e. Y_i ~ N(α_{j[i]} + βX_i, σ²) with α_j ~ N(μ_α, σ_α²)
• This model has 9 parameters: {α₁, α₂, …, α₈, β}
• And it contains 4 hyperparameters: {μ_α, σ_α, β, σ}
• A major improvement

Randomly varying intercepts and slopes
• Option 3: the random slope and intercept model: Y_i = α_{j[i]} + β_{j[i]}X_i + ε_i, i.e. Y_i ~ N(α_{j[i]} + β_{j[i]}X_i, σ²), where (α_j, β_j) ~ N((μ_α, μ_β), Σ)
• This model has 16 parameters: {α₁, …, α₈, β₁, …, β₈} (note that 8 separate models also contain 16 parameters)
• And it contains 6 hyperparameters: {μ_α, μ_β, σ_α, σ_β, ρ, σ}
• It would be the same number of hyperparameters if we had 80 or 800 regions

A compromise between complete pooling and no pooling
• Complete pooling: PIF_t = α + βt, ignoring group structure altogether
• No pooling: PIF_kt = α_k + β_k t, k = 1, 2, …, 8, estimating a separate model for each group
• Compromise: the hierarchical model Y_i ~ N(α_{j[i]} + β_{j[i]}X_i, σ²), (α_j, β_j) ~ N((μ_α, μ_β), Σ), estimates parameters using a compromise between complete pooling and no pooling.

A credible approach
• For illustration, recall the random intercept model: Y_i ~ N(α_{j[i]} + βX_i, σ²), α_j ~ N(μ_α, σ_α²)
• This model can contain a large number of parameters {α₁, …, α_J, β}.
• Regardless of J, it contains 4 hyperparameters {μ_α, σ_α, β, σ}.
• Here is how the hyperparameters relate to the parameters:
    α̂_j = Z_j(ȳ_j − βx̄_j) + (1 − Z_j)μ̂_α,  where  Z_j = n_j / (n_j + σ²/σ_α²)
Bühlmann credibility is a special case of hierarchical models.

Shrinkage Effect of Hierarchical Models
[Figure: modeled workers compensation claim frequency by industry class under two Poisson models, no pooling vs. a simple hierarchical ("credibility") model; the hierarchical estimates are pulled toward the grand mean.]
• Illustration: estimating workers compensation claim frequency by industry class.

Validating the fully Bayesian hierarchical model
[Figure: Year-7 claims (red dots) against 90% posterior credible intervals, by class.]
• Roughly 90% of the claims from the validation time period fall within the 90% posterior credible interval.

Case Study: A Fully Bayesian Model
Collaboration with Wayne Zhang and Vanja Dukic

Data: a garden-variety Workers Compensation Schedule P loss triangle (cumulative losses in 1000's):

AY     premium     12     24     36     48     60     72     84     96    108    120  CL Ult  CL LR  CL res
1988     2,609    404    986  1,342  1,582  1,736  1,833  1,907  1,967  2,006  2,036   2,036   0.78       0
1989     2,694    387    964  1,336  1,580  1,726  1,823  1,903  1,949  1,987           2,017   0.75      29
1990     2,594    421  1,037  1,401  1,604  1,729  1,821  1,878  1,919                  1,986   0.77      67
1991     2,609    338    753  1,029  1,195  1,326  1,395  1,446                         1,535   0.59      89
1992     2,077    257    569    754    892    958  1,007                                1,110   0.53     103
1993     1,703    193    423    589    661    713                                         828   0.49     115
1994     1,438    142    361    463    533                                                675   0.47     142
1995     1,093    160    312    408                                                       601   0.55     193
1996     1,012    131    352                                                              702   0.69     350
1997       976    122                                                                     576   0.59     454
Total                                                                                  12,067          1,543

development age:   12     24     36     48     60     72     84     96    108    120
chain link:     2.365  1.354  1.164  1.090  1.054  1.038  1.026  1.020  1.015  1.000
chain ldf:      4.720  1.996  1.473  1.266  1.162  1.102  1.062  1.035  1.015  1.000
growth curve:   21.2%  50.1%  67.9%  79.0%  86.1%  90.7%  94.2%  96.6%  98.5% 100.0%

• Let's model this as a longitudinal dataset.
• Grouping dimension: accident year (AY)
• We can build a parsimonious non-linear model that uses random effects to allow the model parameters to vary by accident year.

Growth curves – at the heart of the model
• We want our model to reflect the non-linear nature of loss development.
• GLM shows up a lot in the stochastic loss reserving literature… but are GLMs natural models for loss triangles?
• Growth curves (Clark 2003):
    Loglogistic: G(x | θ, ω) = x^ω / (x^ω + θ^ω)
    Weibull:     G(x | θ, ω) = 1 − exp(−(x/θ)^ω)
  • LR = ultimate loss ratio, θ = scale, ω = shape ("warp")
• Heuristic idea:
  • We judgmentally select a growth curve form
  • Let LR vary by year (hierarchical)
  • Add priors to the hyperparameters (Bayesian)
[Figure: Weibull and loglogistic growth curves fit to the chain ladder development pattern (cumulative percent of ultimate vs. development age).]

An exploratory non-Bayesian hierarchical model
    y_i(t_j) = LR_i · p_i · G(t_j | θ, ω) + ε_i(t_j),  where  LR_i ~ N(μ_LR, σ_LR²)
(p_i is the accident-year premium.) It is easy to fit non-Bayesian hierarchical models as a data exploration step.
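The heuristic above (fit a growth curve to the chain-ladder development pattern) can be sketched as follows. The crude grid search stands in for the nonlinear least squares one would use in practice, and the search ranges for θ and ω are illustrative assumptions:

```python
import math

def loglogistic(x, theta, omega):
    """G(x | theta, omega) = x^omega / (x^omega + theta^omega)."""
    return x**omega / (x**omega + theta**omega)

def weibull(x, theta, omega):
    """G(x | theta, omega) = 1 - exp(-(x/theta)^omega)."""
    return 1.0 - math.exp(-((x / theta) ** omega))

# Chain-ladder development pattern from the triangle (percent of ultimate).
ages = [12, 24, 36, 48, 60, 72, 84, 96, 108, 120]
pattern = [0.212, 0.501, 0.679, 0.790, 0.861, 0.907, 0.942, 0.966, 0.985, 1.000]

def fit(curve):
    """Grid search for (theta, omega) minimizing squared error --
    a stand-in for proper nonlinear least squares."""
    best = (float("inf"), None, None)
    for theta in [t / 2 for t in range(20, 161)]:       # scale: 10.0 .. 80.0
        for omega in [w / 50 for w in range(25, 151)]:  # shape: 0.5 .. 3.0
            sse = sum((curve(x, theta, omega) - g) ** 2
                      for x, g in zip(ages, pattern))
            if sse < best[0]:
                best = (sse, theta, omega)
    return best

sse_ll, theta_ll, omega_ll = fit(loglogistic)
sse_wb, theta_wb, omega_wb = fit(weibull)
print("loglogistic:", theta_ll, omega_ll, round(sse_ll, 4))
print("weibull:    ", theta_wb, omega_wb, round(sse_wb, 4))
```

Note the design consequence flagged later in the talk: the loglogistic's heavier tail implies more development beyond 120 months than the Weibull's, so the choice of curve "does the work of" a tail factor.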
[Figure: loglogistic hierarchical model fit (non-Bayesian), one panel per accident year 1988-1997, cumulative loss vs. development time, with premium and fitted ultimate loss ratio noted per panel.]

Adding Bayesian structure
• Our hierarchical model is "half-way Bayesian":
  • On the one hand, we place probability sub-models on certain parameters
  • But on the other hand, various (hyper)parameters are estimated directly from the data.
• To make this fully Bayesian, we need to put probability distributions on all quantities that are uncertain.
• We then employ Bayesian updating: the model ("likelihood function") together with the prior results in a posterior probability distribution over all uncertain quantities.
• Including ultimate loss ratio parameters and hyperparameters: we are directly modeling the ultimate quantity of interest.
• This is not as hard as it sounds:
  • We do not explicitly calculate the high-dimensional posterior probability distribution.
  • We do use Markov Chain Monte Carlo [MCMC] simulation to sample from the posterior.
  • Technology: JAGS ("Just Another Gibbs Sampler"), called from within R.
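The "half-way Bayesian" fit shrinks each accident year's parameters toward the portfolio mean, in the spirit of the Bühlmann credibility weighting shown earlier (Z_j = n_j / (n_j + σ²/σ_α²)). A minimal sketch of that shrinkage, with hypothetical loss ratios and variance components, not figures from the talk:

```python
def credibility_estimate(group_mean, grand_mean, n_j, sigma2, tau2):
    """Buhlmann-style credibility weighting, the special case of a
    hierarchical model noted earlier:
        Z_j        = n_j / (n_j + sigma2 / tau2)
        estimate_j = Z_j * group_mean + (1 - Z_j) * grand_mean
    sigma2 = within-group variance, tau2 = between-group variance."""
    z = n_j / (n_j + sigma2 / tau2)
    return z * group_mean + (1 - z) * grand_mean, z

# Hypothetical accident-year loss ratios, pulled toward the grand mean;
# sparse years (small n_j) are shrunk harder than well-populated ones.
grand = 0.60
for lr, n in [(0.78, 40), (0.47, 12), (0.69, 3)]:
    est, z = credibility_estimate(lr, grand, n, sigma2=0.04, tau2=0.01)
    print(f"raw {lr:.2f}  n={n:2d}  Z={z:.2f}  credibility-weighted {est:.3f}")
```

The fully Bayesian version replaces these plug-in variance components with posterior distributions, so the shrinkage weights themselves carry uncertainty.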
Example (with Wayne Zhang and Vanja Dukic)
• Posterior credible intervals of incremental losses – by accident year
• Based on the non-linear hierarchical growth curve model
[Figure: posterior credible intervals of incremental losses by accident year.]

Posterior distribution of aggregate outstanding losses
• Non-informative priors were used
• Different priors tested as a sensitivity analysis
[Figure: "Outstanding Loss Estimates at Different Evaluation Points" – histograms of estimated ultimate losses minus losses to date, evaluated at 120 months, at 180 months, and at ultimate (roughly the 500–4,000 range), with the chain ladder estimate marked.]
• A full posterior distribution falls out of the analysis
  • No need for bootstrapping, ad hoc simulations, or settling for a point estimate with a confidence interval
• Use of a non-linear (growth curve) model enables us to project beyond the range of the data
  • The choice of growth curves affects the estimates more than the choice of priors!
  • This choice "does the work of" a choice of tail factors

Why Bayes
• "A coherent integration of evidence from different sources"
  • Background information
  • Expert knowledge / judgment ("subjectivity" is a feature, not a bug)
  • Other datasets (e.g.
multiple triangles)
• Shrinkage, "borrowing strength", hierarchical model structure – all coin of the realm
• Rich output: full probability distribution estimates of all quantities of interest
  • Ultimate loss ratios by accident year
  • Outstanding loss amounts
  • Missing values of any cell in a loss triangle
• Model the process that generates the data
  • As opposed to modeling the data with "procedural" methods
  • We can fit models as complex (or simple) as the situation demands
  • Nonlinear growth patterns, trends, autoregressive structure, hierarchical structure, …
• Conceptual clarity
  • Single-case probabilities make sense in the Bayesian framework
  • Communication of risk: "mean what you say and say what you mean"

A Parting Thought

Parting thought: our field's Bayesian heritage
"Practically all methods of statistical estimation… are based on… the assumption that any and all collateral information or a priori knowledge is worthless. It appears to be only in the actuarial field that there has been an organized revolt against discarding all prior knowledge when an estimate is to be made using newly acquired data."
-- Arthur Bailey (1950)

... And today, in the age of MCMC, cheap computing, and open-source software...
"Scientific disciplines from astronomy to zoology are moving to Bayesian data analysis. We should be leaders of the move, not followers."
-- John Kruschke, Indiana University Psychology (2010)

Appendix: Some MCMC Intuition

Metropolis-Hastings Intuition
• Let's take a step back and remember why we've done all of this.
• In ordinary Monte Carlo integration, we take a large number of independent draws from the probability distribution of interest and let the sample average of {g(θ^(i))} approximate the expected value E[g(θ)]:

(1/N) Σ_{i=1..N} g(θ^(i)) → ∫ g(θ) π(θ) dθ = E[g(θ)]   as N → ∞

• The Strong Law of Large Numbers justifies this approximation.
• But: when estimating Bayesian posteriors, we are generally not able to take independent draws from the distribution of interest.
• Results from the theory of stochastic processes tell us that suitably well-behaved Markov chains can also be used to perform Monte Carlo integration.

Some Facts from Markov Chain Theory
How do we know this algorithm yields reasonable approximations?
• Suppose our Markov chain θ^(1), θ^(2), … with transition matrix P satisfies some "reasonable conditions":
  • Aperiodic, irreducible, positive recurrent (see next slide)
  • Chains generated by the M-H algorithm satisfy these conditions
• Fact #1 (convergence theorem): P has a unique stationary ("equilibrium") distribution π (i.e. π = πP). Furthermore, the chain converges to π.
  • Implication: we can start anywhere in the sample space so long as we throw out a sufficiently long "burn-in".
• Fact #2 (Ergodic Theorem): suppose g(θ) is some function of θ.
Then:

(1/N) Σ_{i=1..N} g(θ^(i)) → ∫ g(θ) π(θ) dθ = E[g(θ)]   as N → ∞

• Implication: after a sufficient burn-in, perform Monte Carlo integration by averaging over a suitably well-behaved Markov chain.
  • The values of the chain are not independent, as required by the SLLN.
  • But the Ergodic Theorem says we're close enough to independence to get what we need.

Conditions for Ergodicity
More on those "reasonable conditions" on Markov chains:
• Aperiodic: the chain does not regularly return to any value in the state space in multiples of some k > 1.
• Irreducible: it is possible to go from any state i to any other state j in some finite number of steps.
• Positive recurrent: the chain will return to any particular state with probability 1, and the expected return time is finite.
• Intuition:
  • The Ergodic Theorem tells us that (in the limit) the amount of time the chain spends in a particular region of state space equals the probability assigned to that region.
  • This won't be true if (for example) the chain gets trapped in a loop, or won't visit certain parts of the space in finite time.
• The practical problem: use the Markov chain to select a representative sample from the distribution π, expending a minimum amount of computer time.
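The two facts above can be checked empirically with a minimal random-walk Metropolis sampler. This is an illustrative Python sketch (not the JAGS/R tooling used in the talk) targeting a standard Normal: after discarding a burn-in (Fact #1), chain averages approximate expectations and time-in-region frequencies approximate probabilities (Fact #2), even though successive draws are dependent.

```python
import math
import random

def metropolis_normal(n_samples, burn_in=2000, step=1.0, seed=42):
    """Random-walk Metropolis sampler for a standard Normal target.

    Proposes theta' = theta + Uniform(-step, step) and accepts with
    probability min(1, pi(theta')/pi(theta)). The normalizing constant of
    pi cancels in the ratio -- which is why MCMC only needs the target
    density up to proportionality.
    """
    rng = random.Random(seed)
    log_target = lambda t: -0.5 * t * t        # log pi(theta), up to a constant
    theta, chain = 0.0, []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.uniform(-step, step)
        log_alpha = log_target(proposal) - log_target(theta)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            theta = proposal                   # accept; otherwise keep current state
        if i >= burn_in:                       # throw out the burn-in (Fact #1)
            chain.append(theta)
    return chain

chain = metropolis_normal(200_000)

# Ergodic averages (Fact #2): chain means approximate expectations under pi.
mean = sum(chain) / len(chain)                          # approximates E[theta] = 0
second_moment = sum(t * t for t in chain) / len(chain)  # approximates E[theta^2] = 1

# Time-in-region: the fraction of the chain below 0 approximates P(theta < 0) = 0.5.
frac_negative = sum(1 for t in chain if t < 0) / len(chain)
print(mean, second_moment, frac_negative)
```

The practical caveats on this slide show up directly in this sketch: a poorly chosen `step` (too small or too large) leaves the chain highly autocorrelated, so many more iterations are needed for the same accuracy, which is exactly the "minimum amount of computer time" problem.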