ACTUARIAL ANALYSIS OF LONGEVITY RISK

Michel DENUIT
michel.denuit@uclouvain.be
Louvain School of Statistics, Biostatistics and Actuarial Science (LSBA)
UCL, Belgium
Academic Year 2013-2014

Outline
• Observed mortality trends (Source: HMD, www.mortality.org)
• Stochastic modelling for survival analysis
• Graduation and smoothing via local regression (Source: Statistics Belgium)
• Cohort life tables and mortality projection models
• Model risk
• Adverse selection and risk classification
• Credibility for death counts
• Systematic mortality risk in life insurance
• First-order life tables
• Pandemics
• Managing longevity risk

Observed mortality trends (Source: HMD, www.mortality.org)

Preamble
Life expectancy at birth:
• Early humans ≈ 20 years
• Around 1850 ≈ 40 years
• Around 1950 ≈ 60 years
• Around 2000 ≈ 70 years

Preamble (Ctd)
• The average life span thus roughly tripled over the course of human history, and much of this increase has happened in the past 150 years.
• Two trends dominated the mortality decline between 1900 and 2000:
- The first half of the 20th century saw significant improvements in the mortality of infants and children (and their mothers).
- Since the middle of the 20th century, gains in life expectancy have been due more to medical factors that have reduced mortality among older persons (reductions in deaths due to the "big three" killers: cardiovascular disease, cancer, and strokes).

Death rates
• Let T be the lifetime, or age at death, of an individual from some population.
• The force of mortality at age x, denoted as \mu_x, is defined by
  \mu_x = \lim_{\Delta \searrow 0} \frac{\Pr[x < T \le x + \Delta \mid T > x]}{\Delta}.
• Henceforth, we assume that the forces of mortality are piecewise constant, i.e. \mu_{x+\xi} = \mu_x for 0 \le \xi < 1 and integer x.
• If the true force of mortality varies slowly over the year of age, the constant force of mortality assumption is reasonable.
• Let x \mapsto \mu_x(t) be the forces of mortality for calendar year t.

[Figure: forces of mortality \ln \mu_x against age x, years 1850, 1900, 1950 and 2000]
[Figure: trend in forces of mortality — \ln \mu_{20}(t) and \ln \mu_{40}(t) (log scale), Belgian males, period 1841-2009]
[Figure: trend in forces of mortality — \ln \mu_{60}(t) and \ln \mu_{80}(t) (log scale), Belgian males, period 1841-2009]
[Figure: mortality surface (x, t) \mapsto \mu_x(t)]
[Figure: life expectancy at birth e_0(t) = E[T]]
[Figure: remaining life expectancy at 65, e_{65}(t) = E[T - 65 \mid T > 65]]
[Figure: standard deviation \sqrt{V[T]} over time]
[Figure: interquartile range F_T^{-1}(0.75) - F_T^{-1}(0.25) over time]
[Figure: standard deviation \sqrt{V[T - 65 \mid T > 65]} over time]
[Figure: interquartile range of T - 65 given T > 65 over time]

Remaining lifetime
• The probability that an individual alive at age x survives to age x + \xi is denoted as
  {}_\xi p_x = \Pr[T > x + \xi \mid T > x] = 1 - {}_\xi q_x,
with {}_{\omega - x} p_x = 0 for some ultimate age \omega such that \Pr[T \le \omega] = 1.
• We have
  \mu_{x+t} = -\frac{1}{{}_t p_x} \frac{\partial}{\partial t} {}_t p_x = -\frac{\partial}{\partial t} \ln {}_t p_x
  \quad \Leftrightarrow \quad
  {}_t p_x = \exp\left( -\int_0^t \mu_{x+\tau} \, d\tau \right).
• The probability density function of T is
  x \mapsto \frac{d}{dx} \Pr[T \le x] = \frac{d}{dx} {}_x q_0 = {}_x p_0 \, \mu_x.

[Figure: rectangularization of the survival curve x \mapsto S(x) = {}_x p_0, years 1850-2000]
[Figure: density x \mapsto {}_x p_0 \mu_x of T, years 1850-2000]
[Figure: x \mapsto e_x = E[T - x \mid T > x], satisfying \frac{d}{dx} e_x = -1 + \mu_x e_x; the child mortality hump is visible]

Conclusions
• Mortality is on the move.
• Long-term actuarial calculations based on historical life tables (even the most recent ones) are likely to be erroneous.
• The valuation of long-term life insurance liabilities requires life tables incorporating the expected changes in life duration.
⇒ There is a need for "projected" life tables.
• First of all, we need a biometric model incorporating both dimensions:
- attained age x, and
- calendar time t.

Stochastic modelling for survival analysis

Remaining lifetime
• Let T_0(t) be the time to death, or lifetime, of an individual from some population, born in year t.
• Let T_x(t) be the remaining lifetime of an individual aged x in calendar year t.
• This individual will thus die at age x + T_x(t).
• Denote {}_\xi p_x(t) = \Pr[T_x(t) > \xi] = \Pr[T_0(t-x) > x + \xi \mid T_0(t-x) > x].
• The distribution function of T_x(t) is {}_\xi q_x(t) = 1 - {}_\xi p_x(t) = \Pr[T_x(t) \le \xi].

One-year survival/death probabilities
• The one-year death probability at age x in year t is defined as
  q_x(t) = \Pr[T_x(t) \le 1] = \Pr[T_0(t-x) \le x + 1 \mid T_0(t-x) > x].
• The one-year survival probability at age x is defined as
  p_x(t) = \Pr[T_x(t) > 1] = \Pr[T_0(t-x) > x + 1 \mid T_0(t-x) > x].
• Clearly, for any integer k,
  {}_k p_x(t) = p_x(t) \, p_{x+1}(t+1) \cdots p_{x+k-1}(t+k-1).

Forces of mortality
• The force of mortality at age x in calendar year t, denoted as \mu_x(t), is defined by
  \mu_x(t) = \lim_{\Delta \to 0} \frac{\Pr[x < T_0(t-x) \le x + \Delta \mid T_0(t-x) > x]}{\Delta}.
• The survival function of T_x(t) is
  {}_\tau p_x(t) = \exp\left( -\int_0^\tau \mu_{x+\xi}(t+\xi) \, d\xi \right).
• The probability density function of T_x(t) is
  \tau \mapsto -\frac{\partial}{\partial \tau} {}_\tau p_x(t) = {}_\tau p_x(t) \, \mu_{x+\tau}(t+\tau).

Period vs. cohort life tables
• The period life table {q_x(t), x = 0, 1, ..., \omega} is obtained using data collected during a given calendar year t (or a few consecutive years, typically 3 to 5).
• The cohort life table {q_x(t + x), x = 0, 1, ..., \omega} follows the generation born in calendar year t and thus incorporates mortality changes over time.
• Clearly, in a situation where longevity is increasing, we have q_{x+k}(t + k) < q_{x+k}(t).
• Period life tables thus underestimate liabilities relating to insurance contracts with benefits in case of survival.
• However, the period life table {q_x(t), x = 0, 1, ..., \omega} is known at time t + 1, whereas the cohort life table {q_x(t + x), x = 0, 1, ..., \omega} remains unknown, except for q_0(t).

Piecewise constant force of mortality
• Assumption: \mu_{x+\xi}(t + \tau) = \mu_x(t) for 0 \le \xi, \tau < 1 and integer x and t.
• For integer age x and calendar year t,
  p_x(t) = \exp\left( -\int_0^1 \mu_{x+\xi}(t+\xi) \, d\xi \right) = \exp(-\mu_x(t)).
• Let L_{xt} be the number of individuals aged x at the beginning of year t.
• The (central) exposure-to-risk
  ETR_{xt} = \int_0^1 L_{x+\xi, t+\xi} \, d\xi
measures the time during which these L_{xt} individuals are exposed to the risk (of dying) in year t.

Expected exposure-to-risk
  E[ETR_{xt} \mid L_{xt} = k] = \int_0^1 \underbrace{E[L_{x+\xi, t+\xi} \mid L_{xt} = k]}_{= k \, {}_\xi p_x(t)} \, d\xi
  = k \int_0^1 {}_\xi p_x(t) \, d\xi = k \int_0^1 \exp(-\xi \mu_x(t)) \, d\xi = k \, \frac{1 - p_x(t)}{\mu_x(t)}
  \Rightarrow ETR_{xt} \approx \frac{-L_{xt} \, q_x(t)}{\ln(1 - q_x(t))}
provided L_{xt} is large enough.

Cohort life expectancy
  e_x^{\nearrow}(t) = E[T_x(t)] = \int_0^{\omega - x} \xi \, d \, {}_\xi q_x(t) = \int_0^{\omega - x} {}_\xi p_x(t) \, d\xi
  = \sum_{k=0}^{\omega - x - 1} \int_k^{k+1} {}_\xi p_x(t) \, d\xi
  = \sum_{k=0}^{\omega - x - 1} {}_k p_x(t) \int_k^{k+1} {}_{\xi - k} p_{x+k}(t+k) \, d\xi
  = \sum_{k=0}^{\omega - x - 1} {}_k p_x(t) \int_0^1 {}_\xi p_{x+k}(t+k) \, d\xi.

Cohort life expectancy (Ctd)
Now, assuming piecewise constant forces of mortality, {}_\xi p_{x+k}(t+k) = \exp(-\xi \mu_{x+k}(t+k)), so that
  e_x^{\nearrow}(t) = \frac{1 - \exp(-\mu_x(t))}{\mu_x(t)}
  + \sum_{k=1}^{\omega - x - 1} \exp\left( -\sum_{j=0}^{k-1} \mu_{x+j}(t+j) \right) \frac{1 - \exp(-\mu_{x+k}(t+k))}{\mu_{x+k}(t+k)}.

Period life expectancy
• Period calculations freeze the value of t and do not follow cohorts.
• The period life expectancy for calendar year t is
  e_x^{\uparrow}(t) = \frac{1 - \exp(-\mu_x(t))}{\mu_x(t)}
  + \sum_{k=1}^{\omega - x - 1} \exp\left( -\sum_{j=0}^{k-1} \mu_{x+j}(t) \right) \frac{1 - \exp(-\mu_{x+k}(t))}{\mu_{x+k}(t)}.
• Note that the calculation of e_x^{\uparrow}(t) for past t does not require mortality projections and is therefore objective, not subject to model risk.
• Period life expectancies at birth e_0^{\uparrow}(t) or at retirement age e_{65}^{\uparrow}(t) are often used as synthetic mortality indicators.

Life annuity premium
• The present value of life annuity payments to a policyholder aged x in year t is
  \sum_{k=1}^{\omega - x} I[T_x(t) \ge k] \, v(0, k) = \sum_{k=1}^{\lfloor T_x(t) \rfloor} v(0, k)
where
- the discount factor v(s, t) is the value at time s of a unit payment made at time t, s \le t, and
- the indicator I[A] = 1 if event A is realized and 0 otherwise.
• The expected present value of these payments is
  a_x(t) = E\left[ \sum_{k=1}^{\lfloor T_x(t) \rfloor} v(0, k) \right]
  = \sum_{k=0}^{\omega - x - 1} \underbrace{\exp\left( -\sum_{j=0}^{k} \mu_{x+j}(t+j) \right)}_{= {}_{k+1} p_x(t) = E[I[T_x(t) \ge k+1]]} \, v(0, k+1).

Graduation and smoothing via local regression (Source: Statistics Belgium)

[Figure: mortality surface — death probabilities (log scale) by age and time, 1950-2000]

Estimation of \mu_x
• Given a set of mortality (or sickness, disability, etc.) rates, the actuary is asked to adjust the observations so that the graduated values of the series capture the main trend in the data.
• Consider a given calendar year t^* and suppress the time index, that is, denote \mu_x = \mu_x(t^*), etc.
• Assume that we have observed a homogeneous group of L_x individuals aged x.
• Individual i entered the group at time a_i and left it at time b_i.
• The exposure-to-risk for individual i, denoted as \tau_i, is the time spent in the group by individual i, that is, \tau_i = b_i - a_i.

Estimation of \mu_x (Ctd)
• To each of these L_x individuals, we associate
  \delta_i = 1 if individual i dies, 0 otherwise, for i = 1, 2, ..., L_x.
• We assume that we have at our disposal independent and identically distributed observations (\delta_i, \tau_i) for each of the L_x individuals.
• These individuals are thus subject to the same (constant) force of mortality \mu_x, and we aim to estimate this unknown parameter by maximum likelihood (ML).

Estimation of \mu_x (Ctd)
• The contribution of individual i to the likelihood writes
- if he survives (\delta_i = 0):
  {}_{b_i - a_i} p_{x + a_i} = \exp\left( -\int_{a_i}^{b_i} \mu_{x+\xi} \, d\xi \right) = \exp\left( -(b_i - a_i) \mu_x \right) = \exp(-\tau_i \mu_x);
- if he dies (\delta_i = 1):
  {}_{b_i - a_i} p_{x + a_i} \, \mu_{x + b_i} = \exp(-\tau_i \mu_x) \, \mu_x.
• Therefore, the contribution of individual i to the likelihood is
  \exp(-\tau_i \mu_x) \, \mu_x^{\delta_i}.

Estimation of \mu_x (Ctd)
• We then have
  L(\mu_x) = \prod_{i=1}^{L_x} \exp(-\tau_i \mu_x) \, \mu_x^{\delta_i}
  = \exp\left( -\mu_x \sum_{i=1}^{L_x} \tau_i \right) \mu_x^{\sum_{i=1}^{L_x} \delta_i}.
• Clearly, \sum_{i=1}^{L_x} \tau_i = ETR_x and \sum_{i=1}^{L_x} \delta_i = D_x, so that
  L(\mu_x) = \exp(-ETR_x \, \mu_x) \, \mu_x^{D_x}.
• The corresponding log-likelihood is
  \ell(\mu_x) = -ETR_x \, \mu_x + D_x \ln \mu_x.

Estimation of \mu_x (Ctd)
• The ML estimator for \mu_x is then obtained from
  \frac{d}{d\mu_x} \ell(\mu_x) = -ETR_x + \frac{D_x}{\mu_x} = 0 \Rightarrow \hat{\mu}_x = \frac{D_x}{ETR_x},
  \quad \frac{d^2}{d\mu_x^2} \ell(\mu_x) = -\frac{D_x}{\mu_x^2} < 0.
• Recall that the death rate is defined as
  m_x = \frac{E[D_x]}{E[ETR_x]} \Rightarrow \hat{m}_x = \frac{D_x}{ETR_x} = \hat{\mu}_x.
• Compare this to
  q_x = \frac{E[D_x]}{E[L_x]} \Rightarrow \hat{q}_x = \frac{D_x}{L_x}.

Estimation of \mu_x (Ctd)
• Assume that individual i has his birthday at c_i, with a_i < c_i < b_i.
• Then, this individual is counted for an exposure c_i - a_i in the group aged x and for an exposure b_i - c_i in the group aged x + 1.
• If \delta_i = 1, then the death is allocated to the group aged x + 1.

Estimation of \mu_x (Ctd)
• Large sample properties of the ML estimators ensure that
  V[\hat{\mu}_x] \approx \left( -\frac{d^2}{d\mu_x^2} \ell(\mu_x) \right)^{-1} = \frac{\mu_x^2}{D_x}, estimated by \frac{\hat{\mu}_x^2}{D_x}.
• Moreover, \hat{\mu}_x \approx_d Nor(\mu_x, \hat{\mu}_x^2 / D_x).
• An approximate (1 - \alpha) confidence interval for \mu_x is given by
  \hat{\mu}_x \pm \underbrace{z_{\alpha/2} \frac{\hat{\mu}_x}{\sqrt{D_x}}}_{\text{error margin}}.

Example
• The following table displays the mortality statistics for policyholders in a given portfolio observed during calendar year 2010:

  Age | Exposure-to-risk | Number of deaths
  50  | 3160.25          | 62
  51  | 3094.50          | 71

• We assume piecewise constant forces of mortality (i.e. \mu_{x+\xi} = \mu_x for every integer x and 0 < \xi < 1).

Example (Ctd)
  L(\mu_{50}, \mu_{51}) = \exp(-3160.25 \, \mu_{50}) \, \mu_{50}^{62} \, \exp(-3094.50 \, \mu_{51}) \, \mu_{51}^{71}
  \frac{\partial}{\partial \mu_{50}} \ln L = \frac{\partial}{\partial \mu_{50}} \left( -3160.25 \, \mu_{50} + 62 \ln \mu_{50} \right) = 0 \Rightarrow \hat{\mu}_{50} = \frac{62}{3160.25}
  \frac{\partial}{\partial \mu_{51}} \ln L = \frac{\partial}{\partial \mu_{51}} \left( -3094.50 \, \mu_{51} + 71 \ln \mu_{51} \right) = 0 \Rightarrow \hat{\mu}_{51} = \frac{71}{3094.50}

Example (Ctd)
  \hat{V}[\hat{\mu}_{50}] = \left( -\frac{\partial^2}{\partial \mu_{50}^2} \ln L \Big|_{\mu_{50} = \hat{\mu}_{50}, \mu_{51} = \hat{\mu}_{51}} \right)^{-1} = \frac{\hat{\mu}_{50}^2}{62} = \frac{62}{3160.25^2}
  \Rightarrow \hat{\mu}_{50} \approx_d Nor\left( \mu_{50}, \frac{62}{3160.25^2} \right)
  \hat{V}[\hat{\mu}_{51}] = \left( -\frac{\partial^2}{\partial \mu_{51}^2} \ln L \Big|_{\mu_{50} = \hat{\mu}_{50}, \mu_{51} = \hat{\mu}_{51}} \right)^{-1} = \frac{\hat{\mu}_{51}^2}{71} = \frac{71}{3094.50^2}
  \Rightarrow \hat{\mu}_{51} \approx_d Nor\left( \mu_{51}, \frac{71}{3094.50^2} \right)

Example (Ctd)
  CI_{95\%}(\mu_{50}) = \left[ \hat{\mu}_{50} \pm 1.96 \frac{\sqrt{62}}{3160.25} \right], \quad CI_{95\%}(\mu_{51}) = \left[ \hat{\mu}_{51} \pm 1.96 \frac{\sqrt{71}}{3094.50} \right]
  \hat{p}_{50} = \exp(-\hat{\mu}_{50}) = \exp\left( -\frac{62}{3160.25} \right), \quad \hat{p}_{51} = \exp(-\hat{\mu}_{51}) = \exp\left( -\frac{71}{3094.50} \right)

Example (Ctd)
  CI_{95\%}(p_{50}) = \left[ \exp\left( -\hat{\mu}_{50} - 1.96 \frac{\sqrt{62}}{3160.25} \right), \exp\left( -\hat{\mu}_{50} + 1.96 \frac{\sqrt{62}}{3160.25} \right) \right]
  CI_{95\%}(p_{51}) = \left[ \exp\left( -\hat{\mu}_{51} - 1.96 \frac{\sqrt{71}}{3094.50} \right), \exp\left( -\hat{\mu}_{51} + 1.96 \frac{\sqrt{71}}{3094.50} \right) \right]
  {}_2\hat{p}_{50} = \hat{p}_{50} \, \hat{p}_{51} = \exp(-\hat{\mu}_{50} - \hat{\mu}_{51}) = \exp\left( -\frac{62}{3160.25} - \frac{71}{3094.50} \right)

Example (Ctd)
  V[\hat{\mu}_{50} + \hat{\mu}_{51}] = V[\hat{\mu}_{50}] + V[\hat{\mu}_{51}] (the two age classes being based on mutually exclusive data)
  \hat{\mu}_{50} + \hat{\mu}_{51} \approx_d Nor\left( \mu_{50} + \mu_{51}, \frac{62}{3160.25^2} + \frac{71}{3094.50^2} \right)
  CI_{95\%}({}_2 p_{50}) = \left[ \exp\left( -\hat{\mu}_{50} - \hat{\mu}_{51} - 1.96 \sqrt{\frac{62}{3160.25^2} + \frac{71}{3094.50^2}} \right),
  \exp\left( -\hat{\mu}_{50} - \hat{\mu}_{51} + 1.96 \sqrt{\frac{62}{3160.25^2} + \frac{71}{3094.50^2}} \right) \right]

[Figure: x \mapsto \ln \hat{\mu}_x, 2009 life table for Belgian males, general population]
[Figure: yearly variations — 2007 (in red), 2008 (in green) and 2009 life tables]

Local regression model
• Even if the general shape of the mortality curve is stable over time, year-specific erratic variations remain.
• As long as these erratic variations do not reveal anything about the underlying mortality pattern, they should be removed before entering actuarial calculations.
• Graduation is used to remove the random departures from the underlying smooth curve x \mapsto \mu_x.
• In order to smooth the crude estimates \hat{\mu}_x, we can use
  \ln \hat{\mu}_x = f(x) + \epsilon_x with independent \epsilon_x \sim Nor(0, \sigma^2),
where f is unspecified, but assumed to be smooth.

Local polynomial regression
• Let us consider the model
  Y_i = f(x_i) + \epsilon_i, i = 1, 2, ..., n,
where the independent \epsilon_i \sim Nor(0, \sigma^2) represent random departures from f(\cdot) in the observations, or variability from sources not included in the x_i's.
• No strong assumptions are made about f, except that it is a smooth function.
• Smoothness here means continuity and (higher-order) differentiability.
• Invoking Taylor's theorem,
- any differentiable function can be approximated locally by a straight line, and
- any twice-differentiable function can be approximated locally by a quadratic polynomial.

Example
• Consider f(x) = 2 + (x - 4)^2 and x_i = (i + 4)/5, i = 1, 2, ..., 31.
• Simulate data from
  y_i = 2 + (x_i - 4)^2 + \epsilon_i, \epsilon_i \sim Nor(0, 1), i = 1, 2, ..., 31.
• The standard way to get a good fit in such a case is to regress the y_i's on the x_i's and on their squares x_i^2's, i.e. to fit
  y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i with \epsilon_i \sim Nor(0, \sigma^2).
• This is because we can clearly see the quadratic shape of the data, which indicates that we face a linear regression model with x and x^2 as explanatory variables.
• But how can we proceed if we cannot guess the transformed explanatory variables from graphing the data?

[Figure: simulated data and global linear fit y_i = \beta_0 + \beta_1 x_i + \epsilon_i]
[Figure: how to estimate f at x_5?]
[Figure: selection of a neighborhood {x_2, x_3, x_4, x_5, x_6, x_7, x_8}]
[Figure: local linear fit at x_5 based on {x_2, ..., x_8}, giving \hat{f}(x_5) = \hat{\beta}_0(x_5) + \hat{\beta}_1(x_5) x_5]
[Figure: local linear fits at x_{15} and x_{27}]
[Figure: results with a neighborhood of size 7]
[Figure: neighborhoods of size 1 (black), 3 (red), 7 (blue), and 11 (green)]

Key ingredients of the approach
• Simple linear regression model, fitted by least squares, but... locally!
• For each observation, a neighborhood has to be defined: increasing its size produces a smoother fit.
• However, increasing the size of the neighborhood excludes more observation points near both ends of the data set with our basic strategy.
• This can be improved by assigning weights to each observation of the data set.
• These weights depend on the point where the model is fitted.
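The simulated example can be reproduced in a few lines. The sketch below (pure Python, fixed seed; the helper name `ols_quadratic` is ours, not from the course) fits the global quadratic by ordinary least squares via the 3x3 normal equations and recovers the quadratic shape of f(x) = 2 + (x - 4)^2:

```python
import random

random.seed(1)

# Simulated data: y_i = 2 + (x_i - 4)^2 + eps_i, x_i = (i + 4)/5, i = 1..31
xs = [(i + 4) / 5 for i in range(1, 32)]
ys = [2 + (x - 4) ** 2 + random.gauss(0, 1) for x in xs]

def ols_quadratic(xs, ys):
    """Fit y = b0 + b1*x + b2*x^2 by solving the normal equations X'X b = X'y."""
    X = [[1.0, x, x * x] for x in xs]
    A = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            m = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, 3))) / A[r][r]
    return beta

b0, b1, b2 = ols_quadratic(xs, ys)
# True curve is 2 + (x - 4)^2 = 18 - 8x + x^2, so b2 should be near 1
print(b0, b1, b2)
```

With n = 31 points and unit noise, the estimated coefficient of x^2 lands close to its true value of 1, which is exactly the "guessable" situation the slides contrast with the local-regression setting.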
Local polynomial regression by WLS
• To estimate f at some point x, the observations are weighted in such a way that
- larger weights are assigned to observations close to x, and
- smaller weights to those that are further away.
• The \lambda nearest neighbors of x are gathered in the set V(x).
• The value of \lambda is often expressed as a proportion \alpha of the data set (\alpha represents the percentage of the observations comprised in every smoothing window).
• The fitted value of f(x) is obtained from weighted least squares, on the basis of the observations in V(x).

Local polynomial regression by WLS (Ctd)
• Given a weight function K(\cdot), a weight w_i(x) is assigned to (x_i, y_i) in V(x) to estimate f(x).
• Zero weights are assigned to observations outside V(x).
• Provided the sample size is large enough, the choice of the weight function is not too critical, and the tricube weight function appears as a convenient choice.
• Weights are then given by
  w_i(x) = \left( 1 - \left( \frac{|x - x_i|}{\max_{j \in V(x)} |x - x_j|} \right)^3 \right)^3 for (x_i, y_i) \in V(x), and w_i(x) = 0 otherwise.

Local polynomial regression by WLS (Ctd)
• Within the smoothing window V(x), f is approximated by a polynomial with coefficients specific to x, that is,
  f(t) \approx \beta_0(x) + \beta_1(x) t + \beta_2(x) t^2 + \ldots + \beta_p(x) t^p for t \in V(x)
(think of a Taylor expansion).
• Usually, p \le 2, so that only
- local constant regression (p = 0),
- local linear regression (p = 1), and
- local quadratic regression (p = 2)
are considered.
• The \beta_j(x)'s are estimated by WLS, with weights specific to x.

Local linear fit (p = 1)
  f(t) \approx \beta_0(x) + \beta_1(x) t for t \in V(x)
  y_i = \beta_0(x) + \beta_1(x) x_i + \epsilon_i, i = 1, ..., n, with weights w_i(x) assigned to (x_i, y_i) in V(x)
  (\hat{\beta}_0(x), \hat{\beta}_1(x)) = \arg\min \sum_{i=1}^n w_i(x) \left( y_i - \beta_0(x) - \beta_1(x) x_i \right)^2
  \hat{f}(x) = \hat{\beta}_0(x) + \hat{\beta}_1(x) x
  = \frac{\sum_{i=1}^n w_i(x) y_i}{\sum_{i=1}^n w_i(x)} + (x - \bar{x}_w) \frac{\sum_{i=1}^n w_i(x) (x_i - \bar{x}_w) y_i}{\sum_{i=1}^n w_i(x) (x_i - \bar{x}_w)^2}
where \bar{x}_w = \frac{\sum_{i=1}^n w_i(x) x_i}{\sum_{i=1}^n w_i(x)}.

Link with Nadaraya-Watson estimate (p = 0)
• If we approximate f locally by a constant \beta_0(x):
  \hat{\beta}_0(x) = \arg\min \sum_{i=1}^n w_i(x) \left( y_i - \beta_0(x) \right)^2
  \hat{f}(x) = \hat{\beta}_0(x) = \frac{\sum_{i=1}^n w_i(x) y_i}{\sum_{i=1}^n w_i(x)} = Nadaraya-Watson estimate.
• Many classical actuarial graduation formulas are of this form.
• For instance, Finlaison suggested in 1829 to smooth the q_x by
  \frac{1}{25} \left( \hat{q}_{x-4} + 2\hat{q}_{x-3} + 3\hat{q}_{x-2} + 4\hat{q}_{x-1} + 5\hat{q}_x + 4\hat{q}_{x+1} + 3\hat{q}_{x+2} + 2\hat{q}_{x+3} + \hat{q}_{x+4} \right)
and other similar formulas followed, by Woolhouse in 1866, by Karup in 1899, by Spencer in 1907, by Gréville in 1967, etc.
• The local linear fit often reduces to the Nadaraya-Watson estimate in the center of the data (when \bar{x}_w = x).

[Figure: data — crude \ln \hat{\mu}_x against age x]
[Figure: fits with \alpha = 5% (black), 25% (green), and 50% (red)]

Selecting \alpha by cross-validation
Leave one observation out at a time, refit, and predict it:
  {(x_2, y_2), (x_3, y_3), ..., (x_n, y_n)} \to \hat{f}^{(-1)} \to prediction \hat{y}_1 = \hat{f}^{(-1)}(x_1)
  {(x_1, y_1), (x_3, y_3), ..., (x_n, y_n)} \to \hat{f}^{(-2)} \to prediction \hat{y}_2 = \hat{f}^{(-2)}(x_2)
  ...
  {(x_1, y_1), (x_2, y_2), ..., (x_{n-1}, y_{n-1})} \to \hat{f}^{(-n)} \to prediction \hat{y}_n = \hat{f}^{(-n)}(x_n)
  \Rightarrow CV(\hat{f}) = \frac{1}{n} \sum_{i=1}^n \left( y_i - \hat{f}^{(-i)}(x_i) \right)^2

[Figure: (G)CV plots against fitted degrees of freedom]
[Figure: crude death rates and loess fit]
[Figure: standardized (Pearson) residuals against age]

Alternative local regression model
• In order to smooth the crude estimates \hat{q}_x, we could also use the model
  \ln \frac{\hat{q}_x}{\hat{p}_x} = f(x) + \epsilon_x with independent \epsilon_x \sim Nor(0, \sigma^2),
where f is left unspecified.
• The \epsilon_x's represent the random departures from f(\cdot) in the observations.
• No strong assumptions are made about f, except that it is a smooth function that can be approximated locally by a straight line or by a quadratic polynomial.
• The same approach applies to estimate f, and the same problem is faced with residuals.

Binomial regression for death counts
• If the initial size L_x of the closed group of individuals aged x is available, we can also use the model
  D_x \sim Bin(L_x, q_x) where q_x = \frac{\exp f(x)}{1 + \exp f(x)}
for some smooth, unspecified function f.
• The estimation proceeds as above, substituting the Binomial likelihood for the Poisson one.
• This approach is suitable for closed homogeneous populations, but should be applied to open insurance portfolios with great care (Poisson regression is preferred in this case).
• At the general population level, the Binomial regression model can often be considered as reasonable.

[Figure: data — exposures L_x and death counts D_x by age]
[Figure: (likelihood) cross-validation plot against fitted degrees of freedom]
[Figure: Binomial model, resulting fit of \ln q_x]
[Figure: Binomial model, deviance residuals]

Poisson regression for death counts
• As explained above, the likelihood to be maximized to derive the ML estimator of \mu_x is given by
  L(\mu_x) = \exp(-ETR_x \, \mu_x) \, \mu_x^{D_x}.
• The likelihood L(\mu_x) is proportional to the likelihood based on D_x \sim Poisson(ETR_x \, \mu_x), that is,
  L(\mu_x) \propto \exp(-ETR_x \, \mu_x) \frac{(ETR_x \, \mu_x)^{D_x}}{D_x!}.
• Therefore, it is equivalent to work on the basis of the "true" likelihood or on the basis of the Poisson likelihood to estimate \mu_x = \exp f(x).

Closure of the annual life tables
• Data at old ages produce suspect results (because of small risk exposures).
• According to standard actuarial practice, the series x \mapsto \hat{q}_x should be extrapolated to some ultimate age \omega before being used for actuarial computations.
• Several procedures have been proposed by actuaries and demographers.
• These procedures often produce very different q_x at older ages x, but with relatively little effect on present values of liabilities.
• Here, we follow a simple technique consisting in closing the \hat{q}_x's by means of a quadratic regression of \ln \hat{q}_x on age x, fitted to the (x, \ln \hat{q}_x) data points with x above some high threshold.

A simple and powerful ad-hoc method
• Specifically, we use a log-quadratic regression model of the form
  \ln \hat{q}_x = a + b x + c x^2 + \zeta_x with independent \zeta_x \sim Nor(0, \sigma_\zeta^2)
fitted to the oldest ages (the starting age is selected so as to maximize goodness of fit).
• The ultimate age \omega can be chosen from the data; here, we take \omega = 130.
• We then impose
(i) a closure constraint q_{130} = 1 \Leftrightarrow \omega = 130,
(ii) an inflexion constraint q'_{130} = 0,
yielding a + b x + c x^2 = c (130^2 - 260 x + x^2) = c (x - 130)^2.

[Figure: R^2 according to the starting age of the regression]
[Figure: closed q_x — \ln q_x against age x up to 130]

Smoothing the mortality surface
• Instead of smoothing the yearly mortality experience, we could also smooth the entire mortality surface.
• To this end, we use the local polynomial regression model
  \ln \hat{\mu}_x(t) = f(x, t) + \epsilon_x(t) with independent \epsilon_x(t) \sim Nor(0, \sigma^2).
• The smooth f(\cdot, \cdot) is approximated locally by a quadratic polynomial in age and time, that is,
  f(\xi, \tau) \approx \beta_0(x, t) + \beta_1(x, t)\xi + \beta_2(x, t)\tau + \beta_3(x, t)\xi^2 + \beta_4(x, t)\xi\tau + \beta_5(x, t)\tau^2
for (\xi, \tau) sufficiently close to (x, t).
• Weights are defined so that observations close to (x, t) receive the largest weights, and weighted least-squares, Poisson or Binomial ML techniques are used on a local scale.
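The closure recipe described above collapses to a one-parameter fit once the constraints are imposed: since q_{130} = 1 and q'_{130} = 0 force a + bx + cx^2 = c(x - 130)^2, only c remains, and least squares on \ln \hat{q}_x gives it in closed form. A minimal sketch (synthetic old-age rates for illustration; the helper name `close_table` is ours):

```python
import math

# Synthetic crude death probabilities at old ages (illustrative only)
ages = list(range(80, 99))
qhat = [0.05 * math.exp(0.11 * (x - 80)) for x in ages]

def close_table(ages, qhat, omega=130):
    """Constrained log-quadratic closure: ln q_x = c * (x - omega)^2.
    The constraints q_omega = 1 and q'_omega = 0 leave a single
    coefficient c, obtained by least squares through the origin."""
    num = sum(math.log(q) * (x - omega) ** 2 for x, q in zip(ages, qhat))
    den = sum((x - omega) ** 4 for x in ages)
    c = num / den
    return {x: math.exp(c * (x - omega) ** 2) for x in range(min(ages), omega + 1)}

closed = close_table(ages, qhat)
print(closed[130])  # exactly 1 by construction
```

Because \ln q is negative and (x - 130)^2 positive, the fitted c is negative, so the closed q_x increase monotonically toward 1 at the ultimate age, as required.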
[Figure: unsmoothed mortality surface — death probabilities (log scale) by age and time]
[Figure: GCV against fitted degrees of freedom]
[Figure: smoothed mortality surface — death probabilities (log scale) by age and time]

Cohort life tables and mortality projection models

Data structure
• Data structure 1: only \hat{q}_x(t) or \hat{\mu}_x(t) are available for a set of ages x and calendar years t = t_{min}, ..., t_{max}.
• In Belgium for instance, the \hat{q}_x(t) used for official mortality projections are given for t = 1948, 1949, ... and
  x = 0, ..., 100 for t = 1948, ..., 1993
  x = 0, ..., 101 for t = 1994, ..., 1998
  x = 0, ..., 105 for t = 1999, ...
• The numerator and denominator of these demographic indicators may be available separately:
- Data structure 2: D_{xt} and L_{xt} are available,
- Data structure 3: D_{xt} and ETR_{xt} are available.

Links
Under piecewise constant forces of mortality,
  \hat{\mu}_x(t) = \hat{m}_x(t) = \frac{D_{xt}}{ETR_{xt}}, \quad \hat{q}_x(t) = \frac{D_{xt}}{L_{xt}}
  \hat{\mu}_x(t) = -\ln(1 - \hat{q}_x(t)), \quad \hat{q}_x(t) = 1 - \exp(-\hat{\mu}_x(t))
  ETR_{xt} \approx -\frac{L_{xt} \, \hat{q}_x(t)}{\ln(1 - \hat{q}_x(t))}
provided L_{xt} is large enough.

Classical parametric approach
• Classically, some parametric model (e.g. Makeham or Heligman-Pollard) is fitted to each calendar year's data.
• Under Makeham's law, we have \mu_x(t) = a_t + b_t c_t^x for calendar year t.
• Parameters (a_t, b_t, c_t) are estimated on the basis of mortality statistics gathered in year t, for t = t_{min}, ..., t_{max}.
• Then, {\hat{a}_t, t = t_{min}, ..., t_{max}}, {\hat{b}_t, t = t_{min}, ..., t_{max}} and {\hat{c}_t, t = t_{min}, ..., t_{max}} are treated as independent time series and extrapolated to the future, providing the actuary with projected life tables.

Makeham model
Each column (\hat{\mu}_{x_{min}}(t), ..., \hat{\mu}_{x_{max}}(t)) of the matrix of estimated forces is summarized by the fitted parameters (\hat{a}_t, \hat{b}_t, \hat{c}_t); extrapolating these three series to (\hat{a}_{t_{max}+k}, \hat{b}_{t_{max}+k}, \hat{c}_{t_{max}+k}) then generates the projected column (\hat{\mu}_{x_{min}}(t_{max}+k), ..., \hat{\mu}_{x_{max}}(t_{max}+k)).

Weaknesses
• First and foremost, this approach strongly relies on the appropriateness of the underlying parametric model.
• For instance, the Makeham model generally over-estimates \mu_x(t) at the oldest ages, because of its convex shape.
• Secondly, the time series {\hat{a}_t}, {\hat{b}_t} and {\hat{c}_t} are often strongly dependent.
• Using large-sample properties of ML estimators, some correlations are typically near 1.
• Multivariate time series models are thus needed for (\hat{a}_t, \hat{b}_t, \hat{c}_t).
• Last but not least, the parameter time series often do not reveal any easy-to-extrapolate time trends.

Age-Period-Cohort models
  \ln \mu_x(t) \text{ or } \operatorname{logit} q_x(t) = \ln \frac{q_x(t)}{p_x(t)}
  = \underbrace{\alpha_x}_{\text{static reference life table}}
  + \underbrace{\sum_{j=1}^p \beta_x^{(j)} \kappa_t^{(j)}}_{\text{time trends depending on age}}
  + \underbrace{\gamma_x \lambda_{t-x}}_{\text{cohort effect}}
with suitable identifiability constraints.
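The need for identifiability constraints can be made concrete: the age-period term \beta_x \kappa_t is unchanged when \kappa is shifted or rescaled (with compensating changes to \alpha and \beta), so conventions such as \sum_x \beta_x = 1 and \sum_t \kappa_t = 0 pin the parameters down without changing the fit. A toy sketch (hypothetical parameter values, one age-period term, i.e. p = 1):

```python
# Toy APC parameters (illustrative only): ages 60..64, years 0..4
alpha = {60: -4.0, 61: -3.9, 62: -3.8, 63: -3.7, 64: -3.6}
beta = {60: 0.5, 61: 0.4, 62: 0.6, 63: 0.3, 64: 0.2}   # sums to 2.0
kappa = {0: 2.0, 1: 1.5, 2: 1.0, 3: 0.5, 4: 0.0}       # mean 1.0

def log_mu(x, t, alpha, beta, kappa, gamma=None, lam=None):
    """APC predictor ln mu_x(t) = alpha_x + beta_x * kappa_t (+ gamma_x * lambda_{t-x})."""
    out = alpha[x] + beta[x] * kappa[t]
    if gamma is not None and lam is not None:
        out += gamma[x] * lam[t - x]
    return out

def normalize(alpha, beta, kappa):
    """Impose sum(beta) = 1 and sum(kappa) = 0 without changing the fit:
    shift the mean of kappa into alpha, then move the scale of beta into kappa."""
    kbar = sum(kappa.values()) / len(kappa)
    s = sum(beta.values())
    alpha2 = {x: a + beta[x] * kbar for x, a in alpha.items()}
    beta2 = {x: b / s for x, b in beta.items()}
    kappa2 = {t: (k - kbar) * s for t, k in kappa.items()}
    return alpha2, beta2, kappa2

a2, b2, k2 = normalize(alpha, beta, kappa)
# Same predictor values under both parameterizations:
print(log_mu(62, 1, alpha, beta, kappa), log_mu(62, 1, a2, b2, k2))
```

Since \alpha'_x + \beta'_x \kappa'_t = \alpha_x + \beta_x \bar{\kappa} + \beta_x (\kappa_t - \bar{\kappa}) = \alpha_x + \beta_x \kappa_t, the normalization is purely a reparameterization.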
Examples
  Clayton-Schifflers model: \ln \mu_x(t) = \alpha_x + \kappa_t + \lambda_{t-x}
  Lee-Carter model: \ln \mu_x(t) = \alpha_x + \beta_x \kappa_t
  Renshaw-Haberman model: \ln \mu_x(t) = \alpha_x + \beta_x \kappa_t + \gamma_x \lambda_{t-x}
  Debonneuil model: \ln \mu_x(t) = \alpha_x + \beta_x \kappa_t + \lambda_{t-x}
  Cairns-Blake-Dowd model: \operatorname{logit} q_x(t) = \kappa_t^{(1)} + \kappa_t^{(2)} (x - \bar{x}) (+ quadratic effects of age)
  Plat model: \ln \mu_x(t) = \alpha_x + \kappa_t^{(1)} + \kappa_t^{(2)} (\bar{x} - x) + \kappa_t^{(3)} (\bar{x} - x)^+ + \lambda_{t-x}
+ identifiability constraints.

Mortality improvement rates
• Other models target the mortality improvement rates \mu_x(t)/\mu_x(t-1).
• Specifically, the specification is
  \ln \mu_x(t) - \ln \mu_x(t-1) = \alpha_x + \sum_{j=1}^p \beta_x^{(j)} \kappa_t^{(j)} + \gamma_x \lambda_{t-x} + E_{xt}
with iid error terms E_{xt}.
• Both approaches are related, as under the Lee-Carter model for instance
  \ln \mu_x(t) - \ln \mu_x(t-1) = (\alpha_x + \beta_x \kappa_t) - (\alpha_x + \beta_x \kappa_{t-1}) = \beta_x (\kappa_t - \kappa_{t-1}),
so that trends are removed.

Lee-Carter model
  \ln \mu_x(t) = \alpha_x + \beta_x \kappa_t with \sum_x \beta_x = 1 and \sum_t \kappa_t = 0.
Each column (\hat{\mu}_{x_{min}}(t), ..., \hat{\mu}_{x_{max}}(t)) of the matrix of estimated forces is summarized by \hat{\kappa}_t; extrapolating the \hat{\kappa}_t series to \hat{\kappa}_{t_{max}+k} then generates the projected column (\hat{\mu}_{x_{min}}(t_{max}+k), ..., \hat{\mu}_{x_{max}}(t_{max}+k)).

Estimation methods
• Parameters \alpha_x, \beta_x and \kappa_t can be estimated by least squares from the empirical forces of mortality \hat{\mu}_x(t).
• In case more information is available, more efficient estimation methods can be used:
- Poisson or Negative Binomial regression in case both ETR_{xt}'s and D_{xt}'s are available (data structure 3),
- and Binomial regression in case both L_{xt}'s and D_{xt}'s are available (data structure 2).
• The choice of the estimation method for the \alpha_x's, \beta_x's and \kappa_t's thus depends on the data structure.
• Smoothing by penalized least squares or penalized maximum likelihood (or previous smoothing of the mortality surface).
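The least-squares route has a simple structure: with \hat{\alpha}_x the row average of \ln \hat{\mu}_x(t) and Z_{xt} = \ln \hat{\mu}_x(t) - \hat{\alpha}_x, minimizing \sum (Z_{xt} - \beta_x \kappa_t)^2 is a rank-one approximation problem, solvable by alternating the closed-form update of \kappa given \beta with that of \beta given \kappa. A compact sketch on a toy matrix (hypothetical data; in practice Z comes from the observed mortality surface):

```python
import random

random.seed(0)

ages, years = range(5), range(8)
# Toy log-mortality surface with a built-in bilinear trend plus noise
true_b = [0.1, 0.15, 0.2, 0.25, 0.3]
true_k = [3.5 - t for t in years]
lnmu = [[-4.0 + 0.05 * x + true_b[x] * true_k[t] + random.gauss(0, 0.01)
         for t in years] for x in ages]

# alpha_x: row averages; Z_xt: centered surface
alpha = [sum(row) / len(row) for row in lnmu]
Z = [[lnmu[x][t] - alpha[x] for t in years] for x in ages]

# Alternating least squares for the rank-one fit Z_xt ~ beta_x * kappa_t
beta = [1.0] * len(ages)
kappa = [0.0] * len(years)
for _ in range(200):
    sb2 = sum(b * b for b in beta)
    kappa = [sum(beta[x] * Z[x][t] for x in ages) / sb2 for t in years]
    sk2 = sum(k * k for k in kappa)
    beta = [sum(kappa[t] * Z[x][t] for t in years) / sk2 for x in ages]

# Revert to the Lee-Carter constraints sum(beta) = 1 and sum(kappa) = 0
s = sum(beta)
kbar = sum(kappa) / len(kappa)
alpha = [a + b * kbar for a, b in zip(alpha, beta)]
kappa = [(k - kbar) * s for k in kappa]
beta = [b / s for b in beta]
```

This is essentially a power iteration toward the leading singular pair of Z; the same "fit one factor holding the other fixed" idea reappears below in likelihood form, with Poisson or Binomial GLM steps replacing the least-squares updates.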
GLM-fit of Lee-Carter model
• It is possible to fit the Lee-Carter model using statistical software that can perform GLM analyses based on Poisson or (Negative) Binomial distributions.
• In the Lee-Carter model, age x and time t are treated as factors \alpha_x, \beta_x, and \kappa_t.
• The Lee-Carter model is not a GLM (because of the product \beta_x \kappa_t, which makes the model non-linear in the parameters).
• It is nevertheless possible to fit the Lee-Carter model as a series of GLMs: if either \beta_x or \kappa_t is known, then the Lee-Carter model is in the GLM framework.

GLM-fit of Lee-Carter model (Ctd)
• Select starting values for the \beta_x's (take for instance \beta_x = 1 for all x).
• Updating cycle:
1. given \beta_x, estimate the \alpha_x's and \kappa_t's in the appropriate GLM;
2. given \kappa_t, estimate the \alpha_x's and \beta_x's in the appropriate GLM;
3. compute the deviance.
• Repeat the updating cycle until the deviance stabilizes.
• Revert back to the Lee-Carter constraints \sum_x \beta_x = 1 and \sum_t \kappa_t = 0 once convergence is attained.

[Figure: smoothed closed mortality surface]
[Figure: estimated \alpha_x against age]
[Figure: estimated \beta_x against age]
[Figure: estimated \kappa_t against time]

Correcting the OLS \kappa_t's
The OLS \hat{\kappa}_t's are adjusted (taking the \hat{\alpha}_x and \hat{\beta}_x estimates as given) to a value \tilde{\kappa}_t by imposing either reproduction of the observed death counts,
  \sum_{x = x_{min}}^{x_{max}} D_{xt} = \sum_{x = x_{min}}^{x_{max}} ETR_{xt} \exp(\hat{\alpha}_x + \hat{\beta}_x \tilde{\kappa}_t),
or reproduction of the observed period life expectancy at some age x^*,
  \hat{e}_{x^*}^{\uparrow}(t) = \frac{1 - \exp(-\exp(\hat{\alpha}_{x^*} + \hat{\beta}_{x^*} \tilde{\kappa}_t))}{\exp(\hat{\alpha}_{x^*} + \hat{\beta}_{x^*} \tilde{\kappa}_t)}
  + \sum_{k \ge 1} \exp\left( -\sum_{j=0}^{k-1} \exp(\hat{\alpha}_{x^*+j} + \hat{\beta}_{x^*+j} \tilde{\kappa}_t) \right)
  \frac{1 - \exp(-\exp(\hat{\alpha}_{x^*+k} + \hat{\beta}_{x^*+k} \tilde{\kappa}_t))}{\exp(\hat{\alpha}_{x^*+k} + \hat{\beta}_{x^*+k} \tilde{\kappa}_t)}.

Fitting period
• Most actuarial studies base the projections on data relating to the years 1950 to present.
• There are several justifications for that: - The quality of mortality data, particularly at older ages, is questionable for the pre-1950 period. - Infectious disease was an uncommon cause of death by 1950, while heart disease and cancer were the two most common causes, as they are today. • Routinely using the post-1950 period for mortality forecasting is nevertheless dangerous because of a visible break in mortality during the 1970s. • Procedures for selecting an optimal calibration period identify the longest period for which the estimated κt is linear. 107 / 288 0.94 0.95 0.96 R2 0.97 0.98 0.99 R 2 as a function of the fitting period 1950 1960 1970 1980 1990 Starting year 108 / 288 Optimal fitting period • The period on which the estimated κt are “the most linear” starts at tstart = 1979. • Instead of taking all of the available data 1948-2009, we discard here observations for the years 1948-1978. • Here, short-term trends are preferred because past long-term trends are not expected to be relevant to the long-term future (under a simple RWD structure for the time index). • When restricted to the optimal fitting period, the Lee-Carter model explains 95.17% of the total variance. • The presence of a break in mortality statistics, between 1970 and 1980, is well documented for most industrialized countries and taken into account in various ways. 109 / 288 Modelling the time index • Any time series model is candidate for modelling the dynamics of the estimated κt s. • With ARIMA modelling, no sudden large shock is expected to happen. • In case the insurer is exposed to the risk of a brutal increase in the qx (t)s, jumps can be added to the ARIMA model (no brutal decrease is expected). • These jumps can occur according to a compound Binomial process, and account for extreme events (like a flu epidemics, for instance). • As long as κt obeys an ARIMA dynamics, the mortality projection model only accounts for long-term improvements in longevity driven by κt . 
110 / 288 Random walk with drift • In the majority of applications, the κt s obey κt = κt−1 + δ + ξt with independent ξt ∼ N or (0, σξ2 ). • The κtmax +k s, k = 1, 2, . . ., are given by κtmax +k = κtmax + kδ + k X ξtmax +j j=1 κ btmax +k = E[κtmax +k |κ] = κtmax + kδ V[κtmax +k |κ] = kσξ2 RF (x, tmax + k) = µx (tmax + k) = exp(βx kδ). µx (tmax ) 111 / 288 Deterministic vs. stochastic time trend • The estimated mortality reduction factor is given by c (x, tmax + k) = exp(βbx k δ). b RF • Note that very similar reduction factors would have been obtained with a model of the form µx (t) = exp(αx + βx (t − t)), once the optimal fitting period has been selected. • The difference centres on the prediction intervals, which become incredibly narrow after having replaced κt with t − t. • Such a model has been used - by some professional actuarial associations to produce projected life tables (the so-called Nolfi approach), and - by several governmental agencies to produce national forecasts. 112 / 288 Official projections by the Belgian Federal Planning Bureau (FPB) • The FPB model specifies qx (t) = exp(αx + βx t) where βx is the rate of decrease of qx (t) over time. • Each age-specific death probability is assumed to decline at its own exponential rate. bx (t)s are first smoothed over t for each fixed x • The empirical q by moving geometric averaging. • The αx s and βx s are estimated by least-squares for x ≤ 89, i.e. minimizing 89 X X bx (t) − αx − βx t ln q 2 . x=0 t≥1970 • Further smoothing and extrapolation to older ages are performed. 113 / 288 75 70 65 e0(t) 80 85 Forecast of e0↑ (t) with offical values 1960 1980 2000 2020 2040 2060 t 114 / 288 Ages 60 and over • Let us now fit the Lee-Carter model to ages 60 and over. • Restricting the age range appears to be useful if the actuary does not need projected µx (t) for all ages. • Think about pricing immediate life annuities, a product generally sold to individuals above 60. ? 
• The optimal fitting period starts at tstart = 1983 when only ages over 60 are considered. • When restricted to the optimal fitting period, the Lee-Carter model explains 98.05% of the total variance. 115 / 288 −4 −3 −2 alpha −1 0 1 Estimated αx s 60 70 80 90 100 110 120 Age 116 / 288 0.005 0.010 0.015 Beta 0.020 0.025 0.030 Estimated βx s 60 70 80 90 100 110 120 Age 117 / 288 0 −5 −10 Kappa 5 10 Estimated κt s, t = 1983, . . . , 2009 1985 1990 1995 2000 2005 2010 Time 118 / 288 Inspection of residuals • The Lee-Carter model is in essence a regression model with age x and calendar time t entering the model as covariates to b x (t). explain the observed death rates m • Since we work in a regression model, it is important to inspect residuals rxt to detect a possible structure. • If the residuals rxt exhibit some regular pattern, this means that the model is not able to describe all the phenomena appropriately. • In practice, plotting t 7→ rxt at different ages x, or looking at (x, t) 7→ rxt , and discovering no structure in those graphs ensures that the time trends have been correctly captured by the model. 119 / 288 120 Standardized residuals 110 3 2 90 0 80 −1 −2 70 −3 −4 60 Age 100 1 1985 1990 1995 2000 2005 Time 120 / 288 12 14 16 e65(t) 18 20 22 24 ↑ Forecast of e65 (t) with official values 1960 1980 2000 2020 2040 2060 t 121 / 288 Prediction intervals • Point predictions are obtained by replacing the κt0 +k ’s with their mathematical expectations (κt0 + kδ under the RWD) so that point prediction correspond to median death rates. • It is often impossible to derive prediction intervals analytically because - two very different sources of uncertainty have to be combined: sampling errors in the parameters of the model and forecast errors in the projected time index - the measures of interest are complicated non-linear functions of the parameters αx , βx , and κt and the ARIMA parameters. • Bootstrap procedures help to overcome these problems. 
122 / 288 Parametric bootstrap • Parametric bootstrap consists in generating αxb , βxb , and κbt , b = 1, . . . , B, from the appropriate multivariate Normal distribution. • The bth sample, b = 1, . . . , B, in the Monte Carlo simulation is obtained by the following 4 steps: 1. Generate αxb , βxb , and κbt from the appropriate multivariate Normal distribution. 2. Estimate the ARIMA model using the κbt as data points. 3. Generate a projection of κbt ’s using these ARIMA parameters. 4. Compute the quantities of interest on the basis of the αxb , βxb , and κbt . 123 / 288 Semiparametric bootstrap • Starting from the observations (ETRxt , dxt ), we create bootstrap samples b (ETRxt , dxt ), b = 1, . . . , B, b ’s are realizations from the Poisson distribution where the dxt with mean ETRxt µ bx (t). • For each bootstrap sample, the αx ’s, βx ’s and κt ’s are estimated and the κt ’s are then projected on the basis of the reestimated ARIMA model. • This yields B realizations αxb , βxb , κbt and projected κbt on the basis of which we compute the quantity of interest. 124 / 288 Residuals bootstrap • Another possibility is to bootstrap from the residuals of the fitted model. • Specifically, we create the matrix R of residuals rxt . • If the model is appropriate then these residuals are approximately independent and identically distributed and, hence, exchangeable. • Then, we generate B replications Rb , b = 1, . . . , B, by sampling with replacement the elements of the matrix R. • The inverse formula for the residuals is then used to obtain the corresponding matrix of death rates µ bbx (t), or of death b counts Dxt . 125 / 288 600 400 200 0 Frequency 800 1000 1200 % Bootstrapped values of e65 (2015) 20.0 20.2 20.4 20.6 20.8 21.0 126 / 288 20 21 e65(t) 22 23 % Longevity fan chart for e65 (t) 2010 2015 2020 2025 2030 t 127 / 288 Back testing Opt. Fitting period % var. 
explained δb σ b2 Observation period 1950-1980 1950-1990 1950-2000 1960-1980 1970-1990 1980-2000 79.18 91.71 98.48 -0.2314 -0.6310 -0.7233 0.7363 0.0946 0.0498 128 / 288 13 14 e65(t) 15 16 17 ↑ Back testing for e65 (t) with 90% prediction intervals 1980 1985 1990 1995 2000 2005 2010 t 129 / 288 21 20 19 e65(t) 22 % Forecast of e65 (t): Lee-Carter versus Oeppen-Vaupel 2010 2015 2020 2025 2030 t 130 / 288 Conclusions • Back testing indicates that the Lee-Carter model seems to be unable to forecast future mortality. • This is generally the case with mortality projection models. • Moreover, the comparison between the forecast obtained from Lee-Carter and from Oeppen-Vaupel models suggests substantial model risk. • Therefore, appropriate actuarial strategies need to be defined to counteract longevity risk. 131 / 288 Outline Observed mortality trends (Source: HMD, www.mortality.org) Stochastic modelling for survival analysis Graduation and smoothing via local regression (Source: Statistics Belgium) Cohort life tables and mortality projection models Model risk Adverse selection and risk classification Credibility for death counts Systematic mortality risk in life insurance First-order life tables Pandemics Managing longevity risk 132 / 288 Model risk • In a life insurance portfolio (or a pension plan), deviations from expected mortality in future years may arise from the following risk sources: 1. the stochastic nature of a given model (the actual number of deaths in each calendar year is a random variable); this is called the “process risk”. 2. uncertainty in the values of the parameters, originating the “parameter risk”. 3. uncertainty in the model underlying what we can observe whence the “model risk” arises. • The longevity risk arises from parameters or model risk. • It is therefore sensible to consider simultaneously different mortality projection models and to combine their results using weighted averages (corresponding to probabilistic mixtures). 
133 / 288 Example 1 • Consider 500 annuitants aged 65 subject to piecewize constant force of mortality and ultimate age ω = 105. • Under Scenario 1, µx = 0.005 for x = 65, . . . , 85 0.125 for x = 86, . . . , 105. • Under Scenario 2, µx = 0.009 for x = 65, . . . , 85 0.2 for x = 86, . . . , 105. • The respective weights associated to these two scenarios are 0.6 for Scenario 1 and 0.4 for Scenario 2. 134 / 288 Example 1: computation of a65 with 3% technical interest rate a65 = E[aT1 | ] where aT1 | = ω−65 X I[T1 > j](1.03)−j j=1 = 0.6E[aT1 | |Scenario 1] + 0.4E[aT1 | |Scenario 2] Under scenario 1, exp(−j0.005) for j = 1, . . . , 20 j p65 = exp(−20 × 0.005) exp(−(j − 20)0.125) for j = 21, . . . , 40. Under scenario 2, exp(−j0.009) for j = 1, . . . , 20 p = j 65 exp(−20 × 0.009) exp(−(j − 20)0.2) for j = 21, . . . , 40. 135 / 288 Example 1: computation of a65 with 3% technical interest rate a65 = E[aT1 | ] 0.6E[aT1 | |Scenario 1] + 0.4E[aT1 | |Scenario 2] 20 X = 0.6 exp(−j0.005)(1.03)−j = j=1 + exp(−20 × 0.005)(1.03)−20 20 X exp(−j0.125)(1.03)−j j=1 +0.4 20 X exp(−j0.009)(1.03)−j j=1 + exp(−20 × 0.009)(1.03)−20 20 X exp(−j0.2)(1.03)−j j=1 136 / 288 Example 1 The probability that there remains j annuitants alive at age 85, j = 0, 1, . . . , 500, writes Pr[j annuitants alive at age 85] = 0.6 Pr[j annuitants alive at age 85|Scenario 1] +0.4 Pr[j annuitants alive at age 85|Scenario 2] j 500−j 500 = 0.6 exp(−20 × 0.005) 1 − exp(−20 × 0.005) j j 500−j 500 +0.4 exp(−20 × 0.009) 1 − exp(−20 × 0.009) j 137 / 288 Example 2 • Consider a portfolio comprising 1,000 pure endowment contracts sold to 30-year-old individuals, with unit benefit and 35 years deferred period. • Assume that the survival probability to age 65 is 0.95 with probability 0.2 0.9 with probability 0.75 P = 35 30 0.8 with probability 0.05. • Discounting is made at 3%. • The probability of getting the unit capital is Pr[T1 > 35] = E[35 P30 ] = 0.95 × 0.2 + 0.9 × 0.75 + 0.8 × 0.05 = 0.905. 
138 / 288 Example 2 Ij = I[Tj > 35] ∼ Ber (Pr[Tj > 35]), "1000 # X V (1.03)−35 Ii = 1000(1.03)−70 V[I1 ]+999000(1.03)−70 C I1 , I2 i=1 V[I1 ] = Pr[T1 > 35] × Pr[T1 ≤ 35] = 0.905 × 0.095 ii h h i h ii h h C I1 , I2 = E C I1 , I2 35 P30 + C E I1 35 P30 , E I2 35 P30 = 0 + V[35 P30 ] = 0.952 × 0.2 + 0.92 × 0.75 + 0.82 × 0.05 − 0.9052 139 / 288 A formal approach based on model averaging • Let ax0 (t0 |m) be the price of an immediate annuity sold to an x0 -aged individual in calendar year t0 , viewed as a function of the life table applying to the annuitant (represented by means of a vector m of death rates). • The available data set is denoted as D. • To predict the cost of this annuity, we have at our disposal a set of K mortality projection models M1 , M2 , . . . , MK , say, producing different sets m from D. • A priori, each model receives the same weight Pr[Mk ] = 1 for all k. K P • As K k=1 Pr[Mk ] = 1, one of the models considered is the true one. 140 / 288 Model risk • The mean of ax0 (t0 |m) given the available observations is E[ax0 (t0 |m)|D] = K X E[ax0 (t0 |m)|D, Mk ] Pr[Mk |D] k=1 where E[ax0 (t0 |m)|D, Mk ] is the prediction of the annuity price using model k. • Similarly, V[ax0 (t0 |m)|D] = K X E[(ax0 (t0 |m))2 |D, Mk ] Pr[Mk |D] k=1 2 − E[ax0 (t0 |m)|D] . 141 / 288 Model risk • The weights Pr[Mk |D], k = 1, 2, . . . , K , assigned to each model should reflect their appropriateness given the data so that model selection criteria are good candidates in that respect. • Consider information criteria of the form I = −2 ln L + π where L is the likelihood function, evaluated by substituting the maximum likelihood estimates of the parameters and π is a penalty that is a function of the number of parameters p and/or the number of observations n. • Standard penalties are π = 2p (AIC) and π = p ln n (BIC). 142 / 288 Model risk • Let Ik = −2 ln Lk + πk be the value of the information criterion for model Mk . 
• The comparison of model j with model k can be based on exp(−Ij /2) Lj exp(−πj /2) = . Lk exp(−πk /2) exp(−Ik /2) • If the penalties are equal for the two models, that is, πj = πk , then this is just the ratio of the respective likelihoods, also called Bayes factor. • A plausible choice for defining the weight to be assigned to model k is exp(−Ik /2) Pr[Mk |D] = PK . j=1 exp(−Ij /2) 143 / 288 Outline Observed mortality trends (Source: HMD, www.mortality.org) Stochastic modelling for survival analysis Graduation and smoothing via local regression (Source: Statistics Belgium) Cohort life tables and mortality projection models Model risk Adverse selection and risk classification Credibility for death counts Systematic mortality risk in life insurance First-order life tables Pandemics Managing longevity risk 144 / 288 −12 −8 −10 −8 −6 ln qx ln(qx) −6 −4 −4 −2 −2 Individual policies, survival benefits, crude b qx with CI (grey) and general population (- - -) 0 20 40 60 x 80 100 0 20 40 60 80 100 x 145 / 288 −8 −8 −6 −6 ln qx ln(qx) −4 −4 −2 −2 Group policies, survival benefits, crude b qx with CI (grey) and general population (- - -) 0 20 40 60 x 80 100 20 40 60 80 100 x 146 / 288 Different approaches • Policyholders’ specific mortality can be accounted for by one of the following three approaches: 1. Applying the general population reduction factors to a market/internal period life table. 2. Applying market/internal adverse selection coefficients to the general population projected life tables. 3. Age shifts. • Let us apply these three approaches to males, individual policies with survival benefits. • Actuaries need to be aware that adverse selection - strongly depends on the characteristics of the market. - is subject to change over time as the composition of the insured population is modified. 
147 / 288 Adverse selection coefficients • We use the following model: ln bxmarket q = f (x) + x , qxpop where f (·) is an unknown smooth function of age x, and x is an error term, assumed to be N or (0, σ 2 ) distributed. • Specifically, the one-year death probability for the market at age x is obtained by multiplying qxpop by exp(b f (x)), that is, exp(b f (x))qxpop . • Adverse selection coefficients are the correction factors exp(b f (x)). 148 / 288 −1.0 −1 −0.8 1 0 20 40 x 60 80 100 −0.6 −0.4 Adv. Sel. Coeff. (log scale) 0 Crude Adv. Sel. Coeff. (log scale) −0.2 2 0.0 Adverse selection coefficients 40 60 x 80 100 149 / 288 0.4 0.5 −0.8 0.6 0 20 40 60 x 80 100 120 0.7 Adv. Sel. Coeff. −0.4 0.8 Adv. Sel. Coeff. (log scale) −0.6 −0.2 0.9 1.0 0.0 Extrapolation to younger and older ages 0 20 40 60 x 80 100 120 150 / 288 Age shifts • A conventional approach to quantify adverse selection consists in determining optimal age shifts ∆? , also called Rueff’s adjustments. • Actuarial calculations for a policyholder aged x are then based pop on the life table {qx−∆ ? +k , k = 0, 1, . . . , ω − x}. • The age shift ∆ can be determined by maximizing the log-likelihood L(∆) obtained by considering mutually independent pop market Dxt ∼ Poi ETRmarket m (t) . xt x−∆ • The optimal age shift ∆? can also be obtained by minimizing X 2 ↑,pop b O(∆) = ex↑,market (t) − b ex−∆ (t) x,t by a grid search. 151 / 288 −126000 −124000 L −122000 −120000 −118000 Log-likelihood L(∆): ∆? = 7 0 5 10 15 20 Delta 152 / 288 Comparison • Let us now compare the three approaches on the basis of death probabilities or life expectancies. • Henceforth, 1. Crude death probabilities: circles 2. Market life table derived from local regression: continuous line — 3. Market life table obtained by applying adverse selection coefficients to the general population life table: broken line - - 4. 
Market life table obtained by applying age shift to the general population life table: dotted line · · · 153 / 288 −7 −6 −5 −4 ln qx −3 −2 −1 0 Death probabilities 40 60 80 100 120 x 154 / 288 10 20 30 ex 40 50 60 Remaining life expectancies 30 40 50 60 70 80 90 x 155 / 288 Conclusion • All of the three approaches produce almost identical results, close to observed values. • Therefore, we determine adverse selection coefficients for each type of products: males/females, group/individual, benefits in case of death/survival, etc. • These coefficients can be applied to population qx to account for adverse selection. % is about +1 year for group policies and +6 • The impact on e65 years for individual policies, survival benefits, male policyholders. 156 / 288 Portfolio-specific mortality • In principle, the analysis conducted at market level can be applied to any specific portfolio. • If the exposures are limited, a regression model (such as Binomial or Poisson regression models) can be helpful. • Also, banding (i.e. grouping ages by classes) can be considered. • For smaller portfolios, relational models evaluate the specific mortality with respect to some reference life tables. • Specifically, relational models allow the actuary to connect portfolio mortality to some reference mortality (market or general population). 157 / 288 Relational Models • Consider a reference life table given by a set of µref x ’s. • The relational model is ln µ bx = f ln µref + x with x ∼ N or (0, σ 2 ). x • If necessary, age can also enter the model ln µ bx = f1 (x) + f2 ln µref + x with x ∼ N or (0, σ 2 ). x • For ages x with sufficient exposure, we get b f2 ≈ 0 (only b smoothing) whereas f1 ≈ 0 for ages with too limited exposure. • No explicit expression is postulated for the unknown, smooth functions f , f1 , and f2 . 158 / 288 Exposures are available • If exposures-to-risk are available then we can use Poisson-based regression models as Dx ∼ Poi ETRx exp f (ln µref ) . 
x • If necessary, we can also use age and resort to the model Dx ∼ Poi ETRx exp f1 (x) + f2 (ln µref x ) . • As before, f1 targets ages for which enough exposure is available whereas f2 allows to borrow strength from the reference life table when exposure is too limited. 159 / 288 1 −1 0 AM fit, ind. ins. market −4 −5 −2 −6 −7 Ind. ins. market −3 −2 2 −1 3 Relational model, males, individual policies, survival benefits −6 −5 −4 −3 Gen. population −2 −1 −6 −5 −4 −3 −2 −1 Gen. population 160 / 288 −4 −5 −6 −7 Ind. ins. market −3 −2 −1 Relational model, males, individual policies, survival benefits −6 −5 −4 −3 −2 −1 161 / 288 Typical values for SMR in France Category Executive managers Middle managers Farmers Craftsmen, shopkeepers & self-employed Employees Workers Out of the labour force SMR 0.6 0.9 0.8 % Impact on e65 ≈ +4 ≈ +1 ≈ +2 0.9 1.0 1.2 1.9 ≈ +1 ≈ -1 to -2 Source: INSEE, computations carried out over males aged 45-64 over calendar years 1982-2001, SMRs with respect to the general population. 162 / 288 Projection of cash flows • Let us consider a portfolio of n immediate life annuities, sold to 65-year-old individuals in year t0 and providing them with a payment of 1 monetary unit at the end of each year provided they are still alive. • The random number of contracts at time k (calendar year t0 + k) is Lk , starting with L0 = n. • The sequence of cash flows is L1 , L2 , . . . • Having generated a sequence of κbt0 +k from the appropriate time series model, we compute the corresponding b b µb65+k (t0 + k|κb ), p65+k (t0 + k|κb ) and q65+k (t0 + k|κb ), b = 1, . . . , B, and we generate future cash flows. 163 / 288 Simulating a realization of T65 (t0 ) • We generate u from the Uniform(0,1) distribution. • If u ≥ exp − µ65 (t0 |κ) then the annuitant dies at age 65 and the insurer does not have to pay anything. • Else, if j Y k=0 j+1 Y exp −µ65+k (t0 +k|κ) ≥ u ≥ exp −µ65+k (t0 +k|κ) k=0 then the annuitant dies between ages 65 + j and 65 + j + 1. 
• If needed, the exact age at death between 65 + j and 65 + j + 1 can be determined explicitly. 164 / 288 Simulating the number of survivors • Since the portfolio is homogeneous with respect to sums insured, faster simulation procedures are available. • Here, we can simulate the annual number of deaths from the Bin(Lk , q65+k (t0 + k|κ)) distribution (assuming a closed group). • Another possibility is to resort to the Poisson approximation for the Binomial distribution. • In practice, these three approaches provide very similar results. • In the numerical illustrations, we use the Binomial modelling. • For each of the B simulated projected life tables, we simulate the sequence of cash flows L1 , L2 , . . . , Lω−65 starting from L0 = n. 165 / 288 Performances of a life annuity portfolio SMR 0.6 0.9 1 1.2 Group life ins. Ultimate ruin probability ≈100% 12.64% 0.28% ≈0% 29.60% Mean time to ruin (in years) 23.26 30.56 31.61 NA 31.55 i? 4.3% 3.3% NA NA 3.4% Net single premium with FPB-2 and 3%: 13.59e (NB: a17| = 13.17e ) Portfolio of size n = 1, 000 Simulation of portfolio extinction according to B=10,000 life tables Yearly interest rate earned on the reserves: 3% i ? is the interest needed to be earned on the reserves to reduce the ruin probability to 1% 166 / 288 Risk classification • The higher the annuity the stronger the adverse selection, cf. Aujoux & Carbonel (Bulletin français d’Actuariat, 1996) ⇒ it seems worth to include the amount paid by the company in mortality modelling. • A recent study performed on the databasis of public pension records maintained by the Research Data Centre of the German Pension Insurance shows a difference close to 6 years in life expectancy at age 65 between the lowest and highest earning groups. • Many other risk factors disappeared after controlling for earnings. • Let us discuss the case study in Gschlossl et al. (European Actuarial Journal, 2011). 
167 / 288 Presentation of the data • We consider the mortality experience of an insurance company operating on the German market. • The portfolio has been observed during a period of five years. • Specifically, we use a sample consisting of male insured lives, having an individual policy. • We consider the attained ages 18 up to and inclusive 85. • We have at our disposal death counts as well as the corresponding (central) exposure-to-risk together with the following set of covariates. 168 / 288 Numerical illustration Covariate Categorisation Description Product type EN TL UL 0 1 2 3 4 5 6 7 8 9 10+ endowment term life unit-linked policy year 1 policy year 2 policy year 3 policy year 4 policy year 5 policy year 6 policy year 7 policy year 8 policy year 9 policy year 10 policy year 11 onwards Curtate duration Marginal Exposure 52% 9% 39% 7% 7% 7% 7% 7% 6% 5% 4% 4% 4% 42% 169 / 288 Numerical illustration Covariate Categorisation Description Underwritten Y N 0 25 37.5 50 75 100 b1 b2 b3 b4 b5 underwriting no underwriting no extra mortality 25% extra mortality 37.5% extra mortality 50% extra mortality 75% extra mortality 100% extra mortality (0; 10,000] (10,000; 50,000] (50,000; 100,000] (100,000; 200,000] (200,000; ∞) Extra mortality Amount insured in EUR Marginal Exposure 96% 4% 95% 2% < 1% 2% < 1% < 1% 43% 50% 6% < 1% < 1% 170 / 288 Standardized mortality ratio (SMR) • Let us start with an exploratory analysis based on SMR. • SMR’s are useful to compare mortality experiences: actual deaths in a particular population are compared with those which would be expected if “standard” age-specific rates applied. • Precisely, the SMR is defined as P SMR = P (x,t)∈D (x,t)∈D Dxt b xstand (t) ETRxt m where D is the set of ages and calendar years under interest. b xstand (t) comes from a market life table built for the • Here, m entire German market. 
171 / 288 Descriptive analysis • The global SMRs for all ages are 99.88% for endowment, 107.22% for term life and 98.76% for unit linked insurance respectively. • Global SMRs by Curtate durations (displayed next) vary between 85% and 110%. • For many durations, SMRs are close together so that these levels will presumably be grouped together in the regression analysis, resulting in a limited number of durations. • Further, the plot indicates the presence of a selection effect, i.e. mortality seems to be lower in the first few years after policy inception and tends to increase after about five years. 172 / 288 1.00 0.95 0.90 SMR 1.05 1.10 SMR by Curtate durations 0 2 4 6 8 10 Duration 173 / 288 Descriptive analysis • The SMR for all ages are 99.36% for Underwriting=Yes and 118.05% for Underwriting=No. • This suggests higher mortality in absence of underwriting. • The variable Extra Mortality indicates the extra mortality as a percentage of standard mortality: either 0, 25, 37.5, 50, 75 or 100. • The individual extra mortality ratios are usually computed on the basis of the health questionaire filled by the applicants, by summing specific percentages according to internal guidelines. • Global SMRs (displayed next) reveal higher mortality for levels 37.5 and 100, but the SMRs for the other levels are close to 1. 174 / 288 1.0 1.5 2.0 SMR 2.5 3.0 3.5 4.0 SMR by Extra mortality 0 1 2 3 4 5 Extra mortality 175 / 288 0.70 0.75 0.80 SMR 0.85 0.90 0.95 1.00 SMR by Amount insured 1 2 3 4 5 Amount insured 176 / 288 Poisson regression • Recall that it is equivalent for maximum likelihood statistical inference to work on the basis of the “true” likelihood or on the basis of the Poisson likelihood. • The heterogeneity present in the portfolio can be accounted for by means of a Poisson regression model. • Specifically, we record the number of deaths Di from an exposure eri according to the value of a vector of covariates xi = (1, xi1 , xi2 , xi3 . . . 
, xik )T including an intercept term and baseline mortality, i = 1, 2, 3, . . . , n. • Covariates are linked to the death rates by ln µi = xT i β. • The vector of unknown regression parameters β gives the effect of the covariates. 177 / 288 Likelihood equations • The unknown parameters β can easily be estimated by Poisson maximum likelihood: Di ∼ Poi eri µi . • The logarithm of the exposure ln eri is treated as a covariate with known regression coefficient fixed to 1 (offset). • If the covariates are categorical and have been coded by means of binary covariates xij then equating to 0 the partial derivative of the log-likelihood with respect to βj gives X X di = eri µi . i|xij =1 i|xij =1 • If an intercept is included then also the total number of deaths is exactly fitted by the model. 178 / 288 Results for the final model The variables in the final model include - x: attained age of the insured life (in years) - a: amount insured band - d: curtate duration of the policy (in years) - m: percentage extra mortality at which the mortality risk was accepted - p: product type - u: underwritten 179 / 288 Results for the final model ln µx,a,d,m,p,u = β0 + β1 ln µbx + β2 I(d∈{0, 1, 2}) + β3 I(d∈{3, 4}) +β4 I(p="TL") + β5 I(m=25) + β6 I(m=37.5) +β7 I(m=100) + β8 I(a=b2) + β9 I(a∈{b3, b4, b5}) +β10 I(u="N") + β11 xI(p="TL") + β12 xI(u="N") +β13 xI(m=25) + β14 xI(a=b2) + β15 I(p="TL"∧m=100) +β16 I(p="TL"∧a=b2) + β17 I(p="TL"∧a∈{b3, +β18 I(u="N"∧a=b2) + β19 I(u="N"∧d∈{0, b4, b5}) 1, 2, 3, 4}) 180 / 288 Results for the final model Regression coefficient β0 β1 β2 β3 β4 β5 β6 β7 β8 β9 β10 Parameter Estimate 0.020 1.001 -0.198 -0.089 -0.509 0.623 0.407 0.290 0.248 -0.183 1.295 Standard Error 0.041 0.008 0.029 0.030 0.153 0.216 0.139 0.072 0.070 0.038 0.156 z-value p-value 0.472 125.184 -6.799 -3.003 -3.319 2.884 2.927 4.036 3.574 -4.799 8.306 0.637 < 2 · 10−16 1.82 · 10−11 0.003 9.00 · 10−4 0.004 0.003 5.43 · 10−5 3.51 · 10−4 1.59 · 10−6 < 2 · 10−16 *** *** *** *** ** ** *** *** 
*** *** 181 / 288 Results for the final model Regression coefficient β11 β12 β13 β14 β15 β16 β17 β18 β19 Parameter Estimate 0.010 -0.009 -0.010 -0.005 0.407 0.182 0.262 -1.275 -0.460 Standard Error 0.003 0.002 0.004 0.001 0.173 0.060 0.102 0.171 0.104 z-value p-value 3.744 -3.501 -2.656 -4.055 2.349 3.004 2.577 -7.421 -4.420 1.81 · 10−4 < 4.63 · 10−4 0.008 5.01 · 10−5 0.019 0.002 0.009 1.17 · 10−13 9.89 · 10−6 *** *** ** *** * ** ** *** *** 182 / 288 Resulting life tables • As an illustration, let us now consider three risk profiles: a low one, a medium one and a high one. • The low risk profile corresponds to duration less than 2, sum insured in b3-b5. • The medium risk profile corresponds to duration 5+, sum insured in b1, product type TL. • The high risk profile corresponds to duration 5+, no underwriting, extra mortality 100, sum insured in b1. • The corresponding logarithmic mortality curves are displayed next. 183 / 288 Death rates according to risk profile low medium high 20 30 40 50 Age 60 70 80 184 / 288 Outline Observed mortality trends (Source: HMD, www.mortality.org) Stochastic modelling for survival analysis Graduation and smoothing via local regression (Source: Statistics Belgium) Cohort life tables and mortality projection models Model risk Adverse selection and risk classification Credibility for death counts Systematic mortality risk in life insurance First-order life tables Pandemics Managing longevity risk 185 / 288 Death counts • Consider a given life insurance portfolio observed for T consecutive periods of time. • Let Dit be the number of deaths recorded in cell i during period t, i = 1, . . . , n, t = 1, . . . , T . • Let ETRit be the corresponding (central) exposure-to-risk. • By cell, we mean a given combination of risk factors, including gender, age, smoking habit, sum insured, etc. • Let Di• = T X Dit , i = 1, . . . , n, t=1 be the total number of deaths in cell i during the observation period. 
186 / 288 Death counts • Let µref i be the reference force of mortality, or best-estimate for cell i (derived from market statistics, say). • The a priori expected number of deaths for cell i in period t is δit = E[Dit ] = ETRit µref i . • If appropriate, the reference force of mortality may depend on ref calendar time (i.e., µref it instead of µi ) to allow for longevity improvements. • To include past mortality experience in the predictive distribution of Di,T +1 , we use standard nonlife credibility models. 187 / 288 Company-specific random effect • Let Θ represent the company relative risk level, with respect to the reference life table. • Specifically, we assume that given Θ = θ, the random variables Dit , t = 1, 2, . . ., are independent and obey the Poi(δit θ) distribution, i.e. Pr[Dit = k|Θ = θ] = exp(−θδit ) (θδit )k , k = 0, 1, 2, . . . , k! with δit = ETRit µref i . • A priori, we assume that E[Θ] = 1 so that the reference life table produces the a priori expected number of deaths: E[Dit ] = E[δit Θ] = δit . 188 / 288 A priori distributions for Θ and Dit • Assume that Θ ∼ Gam(a, a) with probability density function fΘ (θ) = 1 a a−1 a θ exp(−aθ), θ > 0. Γ(a) • Then, for non-negative integer k, θδ k it Pr[Dit = k] = exp − θδit fΘ (θ)dθ k! 0 k a δit a a+k −1 = a a + δit a + δit Z +∞ so that Dit obeys the Negative Binomial distribution. 189 / 288 A posteriori distribution for Θ given Dit = dit , i = 1, . . . , n, t = 1, 2, . . . , T Qn i=1 R +∞ 0 Qn QT i=1 t=1 exp QT t=1 exp − θδit − ξδit dit ! θδit dit ! 1 a a−1 exp(−aθ) Γ(a) a θ dit ! ξδit dit ! 1 a a−1 exp(−aξ)dξ Γ(a) a ξ P P P P a+ ni=1 T t=1 dit −1 exp −θ a + ni=1 T δ θ it t=1 = R Pn PT P P +∞ ξ a+ i=1 t=1 dit −1 dξ exp −ξ a + ni=1 T t=1 δit 0 !! n X T Pn PT X = exp −θ a + δit θa+ i=1 t=1 dit −1 i=1 t=1 a+ Pn i=1 PT Γ a+ t=1 δit Pn i=1 a+Pni=1 PTt=1 dit PT t=1 dit . 190 / 288 A posteriori distribution for Θ • A posteriori, we thus have [Θ|Dit , i = 1, 2, . . . , t = 1, 2, . . . 
, T ] ∼ Gam(a + D•• , a + δ•• ). • Hence E[Θ|Dit , i = 1, 2, . . . , n, t = 1, 2, . . . , T ] = (a + D•• )/(a + δ•• ) = a/(a + δ•• ) × 1 + δ•• /(a + δ•• ) × D•• /δ•• = a/(a + δ•• ) × E[Θ] + δ•• /(a + δ•• ) × Θ̂T , where δ•• /(a + δ•• ) is the credibility factor and Θ̂T = D•• /δ•• . 191 / 288 Credibility predictor • Given past mortality experience, the future number of deaths Di,T +1 has a Negative Binomial distribution with updated parameters: Pr[Di,T +1 = k|Djt for j = 1, . . . , n and t = 1, . . . , T ] = C(a + D•• + k − 1, k) (δi,T +1 /(a + δ•• + δi,T +1 ))^k ((a + δ•• )/(a + δ•• + δi,T +1 ))^(a+D•• ). • The expected number of deaths in cell i for next year T + 1 is therefore given by E[Di,T +1 |Djt for j = 1, . . . , n and t = 1, . . . , T ] = δi,T +1 (a + D•• )/(a + δ•• ). • Note that the method still applies if the portfolio has been observed for a single year (T = 1). 192 / 288 Cell-specific random effect • Instead of using a single company-specific relative risk level Θ, we could associate a specific relative risk level Θi to each cell i = 1, . . . , n. • Using a Θi specific to cell i offers more flexibility: the relative risk level is allowed to vary between cells, which is more realistic. • A priori, we assume that the Θi are independent and identically distributed, with E[Θi ] = 1 for all i. • The independence of the random effects does not induce any smoothing effect: each cell is revised based on its own experience. • Some structure can be imposed on the Θi by using a hierarchical credibility model. 193 / 288 AE ratio • The key observation is the actual over expected (AE) ratio defined for cell i and year t as Xit = Dit /E[Dit ] = Dit /δit . • Thus, cell i is represented by a random sequence (Θi , Xi1 , Xi2 , . . .). • The random sequences (Θi , Xi1 , Xi2 , . . .), i = 1, 2, . . . , n, are mutually independent. • We retain the conditional Poisson specification for the death counts Dit , so that E[Dit |Θi ] = V[Dit |Θi ] = δit Θi .
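As a quick numerical sanity check of the Gamma-Poisson updating above, the following sketch (with made-up values for a, the yearly expected deaths δ and the observed deaths) verifies that the posterior mean (a + D••)/(a + δ••) coincides with the credibility blend of E[Θ] = 1 and the observed ratio Θ̂T = D••/δ••.

```python
# Credibility update for the company-specific random effect (illustrative values).
a = 2.5                        # assumed Gamma(a, a) prior for Theta, so E[Theta] = 1
deaths = [48, 52, 61]          # assumed observed deaths per period, summed over cells
expected = [55.0, 56.0, 57.0]  # assumed a priori expected deaths per period

D = sum(deaths)        # D.. : total observed deaths
delta = sum(expected)  # delta.. : total a priori expected deaths

posterior_mean = (a + D) / (a + delta)

# Same quantity written as a credibility blend:
z = delta / (a + delta)                  # credibility factor
blend = (1 - z) * 1.0 + z * (D / delta)  # (1 - z) * E[Theta] + z * observed ratio

print(posterior_mean, blend)  # the two expressions agree
```

The posterior mean always lies between the prior mean 1 and the raw observed ratio D••/δ••, moving closer to the latter as the exposure grows.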
194 / 288 Bühlmann-Straub credibility model • Given Θi , the random variables Xit , t = 1, 2, . . ., are mutually independent. • Further, E[Xit |Θi ] = µ(Θi ) = Θi and V[Xit |Θi ] = σ²(Θi )/wit = Θi /wit , where wit = δit = E[Dit ] measures the “weight” of cell i in year t in the portfolio. • The conditional mean and variance of the AE ratio follow from the conditional Poisson distribution for the death counts, which ensures that µ(Θi ) = σ²(Θi ) = Θi . 195 / 288 Linear credibility predictor • The linear credibility predictor X̂i,T +1 for Xi,T +1 is of the form ci0 + Σ_{t=1}^T cit Xit . • The coefficients cit minimize the objective function Ψ(c) = E[(Θi − ci0 − ci1 Xi1 − ci2 Xi2 − . . . − ciT XiT )²]. • Setting ∂Ψ(c)/∂ci0 and ∂Ψ(c)/∂cit , t = 1, 2, . . . , T , equal to 0, we get cit = σ²wit /(1 + σ²wi• ) and ci0 = 1 − σ²wi• /(1 + σ²wi• ), where σ² = V[Θi ] measures residual heterogeneity. 196 / 288 Linear credibility predictor (Ctd) • The linear credibility predictor is given by X̂i,T +1 = (1/(1 + σ²wi• )) × 1 + αi × (1/wi• ) Σ_{t=1}^T wit Xit , where αi = σ²wi• /(1 + σ²wi• ) is the credibility factor. • Behavior of the credibility factor αi : (i) αi → 1 if wi• → ∞; (ii) αi increases if σ² increases. • The expected number of deaths for next year T + 1 in cell i is δi,T +1 X̂i,T +1 = δi,T +1 (1 + σ²Di• )/(1 + σ²δi• ). • This is in line with the Gamma assumption for Θi , as σ² = 1/a in this case. 197 / 288 Example: Di1 and δi1

          Risk class 1   Risk class 2   Risk class 3
Male      64 (108.1)     44 (50.9)      54 (72)
Female    15 (32.8)      15 (16.1)       9 (8.5)

Between brackets: δi1 , i.e. company expected claims under market mortality. 198 / 288 Estimation of σ² = V[Θi ] As Di• ∼ MPoi(δi• , Θi ), V[Di• ] = E[V[Di• |Θi ]] + V[E[Di• |Θi ]] = E[δi• Θi ] + V[δi• Θi ] = δi• + σ²δi•², so that Σ_{i=1}^n V[Di• ] = Σ_{i=1}^n δi• + σ² Σ_{i=1}^n δi•² and σ² = (Σ_{i=1}^n V[Di• ] − Σ_{i=1}^n δi• )/Σ_{i=1}^n δi•² ⇒ σ̂² = Σ_{i=1}^n ((Di• − δi• )² − Di• )/Σ_{i=1}^n δi•² = 0.1168. 199 / 288 AE ratio X̂i2 • The predicted AE ratios for year 2, X̂i2 = (1 + σ̂²Di1 )/(1 + σ̂²δi1 ) with σ̂² = 0.1168, are displayed in the next table:

                Male     Female
Risk class 1    0.6220   0.5697
Risk class 2    0.8840   0.9554
Risk class 3    0.7766   1.0293

• The corresponding expected number of deaths is D̂i2 = δi2 X̂i2 . • Since the estimated AE ratios vary quite a lot between cells, cell-specific random effects Θi should be preferred over a single company-specific Θ. 200 / 288 Weighting past experience • Sometimes, data show significant changes over time. • In such a case, the predictive ability of past experience decreases with the lag between the period of risk prediction and the period of occurrence. • Letting the random effects depend on time (i.e., Θit instead of Θi ) accounts for this effect by appropriately discounting past experience. • Of course, fitting such credibility models requires more extensive data. 201 / 288 Outline Observed mortality trends (Source: HMD, www.mortality.org) Stochastic modelling for survival analysis Graduation and smoothing via local regression (Source: Statistics Belgium) Cohort life tables and mortality projection models Model risk Adverse selection and risk classification Credibility for death counts Systematic mortality risk in life insurance First-order life tables Pandemics Managing longevity risk 202 / 288 Conditional independence • Assume that the forces of mortality are piecewise constant, i.e. µx+ξ (t + τ ) = µx (t) for every integer x and t, and ξ, τ ∈ [0, 1). • Consider a group of n annuitants aged x1 , . . . , xn in year t0 . • Their respective remaining lifetimes are T1 , . . . , Tn . • Given a set of factors κ, T1 , . . . , Tn are independent with Pr[Ti > ξ|κ] = ξ pxi (κ). • We assume that the functions κ ↦ ξ pxi (κ) are monotonic in κ so that all the Ti ’s “move in the same direction” when κ increases.
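To see how conditioning on a common factor κ creates positive dependence between lifetimes, the sketch below (with an assumed two-point distribution for κ and an assumed Lee-Carter-style force of mortality) checks by exact enumeration that Pr[T1 > 1, T2 > 1] − Pr[T1 > 1]² equals V[p(κ)] > 0.

```python
import math

# Two-point mixing distribution for kappa (assumed for illustration).
scenarios = [(-1.0, 0.5), (1.0, 0.5)]  # (kappa value, probability)

def mu(kappa):
    # assumed force of mortality, increasing in kappa
    return math.exp(-2.0 + 0.1 * kappa)

def p(kappa):
    # one-year survival probability given kappa (constant force over the year)
    return math.exp(-mu(kappa))

# Exact moments of p(kappa).
Ep = sum(pr * p(k) for k, pr in scenarios)
Ep2 = sum(pr * p(k) ** 2 for k, pr in scenarios)

joint = Ep2      # Pr[T1 > 1, T2 > 1] = E[p(kappa)^2] by conditional independence
indep = Ep ** 2  # product of the marginal survival probabilities
var_p = Ep2 - Ep ** 2

print(joint - indep, var_p)  # both equal V[p(kappa)] > 0
```

The two survival indicators are thus positively correlated as soon as p(κ) is non-degenerate, which is the mechanism behind the covariance computation for pure endowments a few slides further on.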
203 / 288 Example: Lee-Carter If βx ≥ 0 for all x and κt0 +k ≤ κ0t0 +k for all k, Pr[Ti > ξ|κ] = exp − (ξ − bξc) exp(αxi +bξc + βxi +bξc κt0 +bξc ) bξc−1 Y exp − exp(αxi +k + βxi +k κt0 +k ) k=0 ≥ exp − (ξ − bξc) exp(αxi +bξc + βxi +bξc κ0t0 +bξc ) bξc−1 Y exp − exp(αxi +k + βxi +k κ0t0 +k ) k=0 = Pr[T1 > ξ|κ0 ] so that κ 7→ ξ pxi (κ) is non-increasing. 204 / 288 Monotonicity in κ, some consequences • Assume that κ 7→ ξ pxi (κ) is non-increasing. • For any non-decreasing h, if κ ≤ κ0 then Z ω−x E[h(T1 )|κ] = h(0) + Pr[Ti > ξ|κ]dh(ξ) 0 Z ω−x ≥ h(0) + Pr[Ti > ξ|κ0 ]dh(ξ) 0 = E[h(T1 )|κ0 ]. • Similarly, the function κ 7→ E[Ψ(T1 , . . . , Ti )|κ] is non-increasing for any non-decreasing Ψ. 205 / 288 Exchangeable lifetimes • Assume that the n policyholders have the same age x0 = x1 = . . . = xn . • For any t1 , . . . , tn ≥ 0, we have " n # Y Pr[T1 ≤ t1 , . . . , Tn ≤ tn ] = E Pr[Ti ≤ ti |κ] i=1 = E " n Y 1 − ti px0 (κ) # i=1 = Pr[T1 ≤ ti1 , . . . , Tn ≤ tin ] for any permutation (i1 , . . . , in ) of (1, . . . , n). • The Ti ’s are exchangeable random variables, that is, their joint distribution function F is invariant under permutation: F (t1 , . . . , tn ) = F (ti1 , . . . , tin ). 206 / 288 Pure endowments • The insurer pays a capital K provided that the policyholder aged x0 at time t0 is still alive at age x0 + d (at time t0 + d). • Consider an homogeneous portfolio of n such contracts and denote as T1 , T2 , . . . , Tn the remaining lifetimes of these x0 -aged policyholders. • The payout at maturity for the company is Ln = K n X Ii where Ii = I[Ti > d] i=1 with Pr[Ii = 1] = E Pr[Ii = 1|κ] = E d px0 (κ) d−1 Y = E px0 +j (t0 + j|κ) . 
j=0 207 / 288 Positive dependence h i h i C[Ii , Ij ] = E C[Ii , Ij |κ] + C E[Ii |κ], E[Ij |κ] = 0 + C d px0 (κ), d px0 (κ) = V d px0 (κ) > 0 Pr[Ii = 1, Ij = 1] = Pr[Ii = 1] Pr[Ij = 1] | {z } +V d px0 (κ) Probability under independence Pr[Ii = 1|Ij = 1] = Pr[Ii = 1] | {z } V d px0 (κ) + Pr[Ij = 1] Probability under independence > Pr[Ii = 1] 208 / 288 Value-at-Risk (VaR), Tail-VaR and CTE • Given a risk X and a probability level p ∈ (0, 1), the corresponding VaR is defined as VaR[X ; p] = FX−1 (p) = inf{x ∈ R|FX (x) ≥ p}. • The corresponding Tail-VaR is defined as Z 1 1 VaR[X ; ξ] dξ. TVaR[X ; p] = 1−p p • The Conditional Tail Expectation is defined as h i CTE[X ; p] = E X X > VaR[X ; p] . 209 / 288 VaR and CTE for large portfolios • In large portfolios, the characteristics of the insurer’s payout Ln = K n X I[Ti > d] i=1 are essentially determined by those of d px0 (κ). • Specifically, for n large enough, Ln ≈ nK d px0 (κ) VaR[Ln ; ] ≈ nKFd−1 px 0 (κ) () h CTE[Ln ; ] ≈ nK E d px0 (κ)d px0 (κ) > Fd−1 px 0 i () . (κ) 210 / 288 Example 1 • Assume that K = 1 and d px0 (κ) = p + κ where κ = +∆ with probability 12 −∆ with probability 21 . • Then, " E[Ln ] = E n X # I[Ti > d] i=1 " " n ## X = E E I[Ti > d]κ i=1 = E[n(p + κ)] as given κ, n X I[Ti > d] ∼ Bin(n, p + κ) i=1 = np h i = E Bin n, E[d px0 (κ)] . 211 / 288 Example 1 (Ctd) " " V[Ln ] = E V n X I[Ti > d]κ ## " " n ## X +V E I[Ti > d]κ i=1 i=1 = E[n(p + κ)(1 − (p + κ))] + V[n(p + κ)] n X as given κ, I[Ti > d] ∼ Bin(n, p + κ) i=1 = n p − (p 2 + E[κ2 ]) + n2 V[κ] = np(1 − p) + n∆2 (n − 1) h i = V Bin n, E[d px0 (κ)] + n(n − 1)∆2 . 212 / 288 Example 1 (Ctd) p lim n→+∞ V[Ln ] n = = Pr[Ln ≤ k] = = p n(p − p 2 − ∆2 + n∆2 ) lim n→+∞ n ∆ 6= 0 i 1 h Pr Bin n, p − ∆ ≤ k 2 i 1 h + Pr Bin n, p + ∆ ≤ k 2 k 1X n (p − ∆)j (1 − p + ∆)n−j j 2 j=0 k 1X + 2 j=0 n j (p + ∆)j (1 − p − ∆)n−j . 
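The variance formula of Example 1 can be checked by brute force. The sketch below (with assumed illustrative values of n, p and Δ) enumerates the two-point Binomial mixture and compares its variance with np(1 − p) + n(n − 1)Δ².

```python
from math import comb

n, p, delta = 10, 0.9, 0.05  # assumed values, chosen so that p +/- delta is in [0, 1]

def binom_pmf(n, q, k):
    # Binomial(n, q) probability mass at k
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

# Mixture pmf of L_n: kappa = -delta or +delta, each with probability 1/2.
pmf = [0.5 * binom_pmf(n, p - delta, k) + 0.5 * binom_pmf(n, p + delta, k)
       for k in range(n + 1)]

mean = sum(k * pr for k, pr in enumerate(pmf))
var = sum(k ** 2 * pr for k, pr in enumerate(pmf)) - mean ** 2

formula = n * p * (1 - p) + n * (n - 1) * delta ** 2
print(mean, var, formula)  # mean = np; variance matches the slide formula
```

The n(n − 1)Δ² term is the non-diversifiable part: dividing by n² shows that the variance of the average payout does not vanish as n grows, exactly as in the limit computed above.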
213 / 288 Example 2 • Given κ, the piecewise constant force of mortality applying to a homogeneous group of 1,000 policyholders aged 80 in calendar year t0 has the form µ80 (t0 |κ) = exp(−2.21 + 0.03κ). • Here, κ = −8.9 + ε with ε ∼ Nor(0, σ² = 0.3). • The prediction of 1 E80 with 3% technical interest rate is 1 E80 = (1.03)^(−1) 1 p80 (t0 ) = (1.03)^(−1) exp(− exp(−2.21 + 0.03 E[κ])) = (1.03)^(−1) exp(− exp(−2.21 + 0.03 × (−8.9))). 214 / 288 Example 2 (Ctd) At the portfolio level, S = Σ_{i=1}^{1000} I[Ti > 1] = total payment of this portfolio, and F_S^(−1)(99.5%) ≈ 1000 F_{1p80(t0|κ)}^(−1)(99.5%). Since 1 p80 (t0 |κ) = exp(− exp(−2.21 + 0.03(−8.9 + ε))) = g(ε) with g decreasing, F_{1p80(t0|κ)}^(−1)(99.5%) = F_{g(ε)}^(−1)(99.5%) = g(F_ε^(−1)(1 − 99.5%)) = exp(− exp(−2.21 + 0.03(−8.9 + √0.3 Φ^(−1)(0.005)))). 215 / 288 Example 3 • Assume that µx (t|κ) = exp(αx + βx κt ) with κt = κt−1 − 0.5 + εt starting from κt0 = −7 and independent εt = −4 with probability 0.25, 0 with probability 0.5, 4 with probability 0.25. • Given κ, lifetimes are independent. • We consider a portfolio comprising 1000 pure endowments 2 E80 with 3% interest rate and α80 = −2.2, α81 = −1.9, β80 = 0.1, β81 = 0.08.
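The nine-point distribution of 2 p80 (t0 |κ) displayed next, together with its 90% VaR and CTE, can be reproduced by enumerating the 3 × 3 scenarios for the two innovations of κ; a minimal sketch:

```python
import math

alpha = {80: -2.2, 81: -1.9}
beta = {80: 0.1, 81: 0.08}
eps = [(-4, 0.25), (0, 0.5), (4, 0.25)]  # innovations of kappa_t

outcomes = []  # (2-year survival probability, scenario probability)
for e1, p1 in eps:
    for e2, p2 in eps:
        k1 = -7 - 0.5 + e1   # kappa_{t0+1}
        k2 = k1 - 0.5 + e2   # kappa_{t0+2}
        m80 = math.exp(alpha[80] + beta[80] * k1)
        m81 = math.exp(alpha[81] + beta[81] * k2)
        outcomes.append((math.exp(-m80 - m81), p1 * p2))

outcomes.sort()

def var_level(outcomes, level):
    # smallest value whose cumulative probability reaches the level
    cum = 0.0
    for v, pr in outcomes:
        cum += pr
        if cum >= level - 1e-12:
            return v

q90 = var_level(outcomes, 0.9)
num = sum(v * pr for v, pr in outcomes if v > q90)
den = sum(pr for v, pr in outcomes if v > q90)
cte90 = num / den
print(round(q90, 6), round(cte90, 6))  # approx. 0.911783 and 0.926195, as on the slides
```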
216 / 288 2 p80 (t0 |κ) = exp − exp − 2.2 + 0.1 × (−7.5 − 4) − exp − 1.9 + 0.08 × (−8 − 4 − 4) with probability exp − exp − 2.2 + 0.1 × (−7.5 − 4) − exp − 1.9 + 0.08 × (−8 − 4 − 0) with probability exp − exp − 2.2 + 0.1 × (−7.5 − 4) − exp − 1.9 + 0.08 × (−8 − 4+ 4) with probability exp − exp − 2.2 + 0.1 × (−7.5) 1 − exp − 1.9 + 0.08 × (−8 − 4) with probability 8 exp − exp − 2.2 + 0.1 × (−7.5) − probability 14 exp − 1.9 + 0.08 × (−8) with exp − exp − 2.2 + 0.1 × (−7.5) − exp − 1.9 + 0.08 × (−8 + 4) with probability 18 1 16 1 8 1 16 217 / 288 2 p80 (t0 |κ) = (Ctd) exp − exp − 2.2 + 0.1 × (−7.5 + 4) − exp − 1.9 + 0.08 × (−8 + 4 − 4) with probability exp − exp − 2.2 + 0.1 × (−7.5 + 4) − probability 18 exp − 1.9 + 0.08 × (−8 + 4) with exp − exp − 2.2 + 0.1 × (−7.5 + 4) − exp − 1.9 + 0.08 × (−8 + 4 + 4) with probability 1 16 1 16 218 / 288 Example 3 (Ctd) 2 p80 (t0 |κ) = 0.796403 0.829700 0.851336 0.854748 0.877037 0.892302 0.896185 0.911783 0.926195 with with with with with with with with with probability probability probability probability probability probability probability probability probability 0.0625 0.125 0.125 0.0625 0.25 0.0625 0.125 0.125 0.0625 219 / 288 Example 3 (Ctd) At the portfolio level, S = 1000 X I[Ti > 2] i=1 VaR[S; 0.9] ≈ 1000 × VaR[2 p80 (t0 |κ); 0.9] VaR[2 p80 (t0 |κ); 0.9] = 0.911783 CTE[S; 0.9] ≈ 1000 × CTE[2 p80 (t0 |κ); 0.9] CTE[2 p80 (t0 |κ); 0.9] = E 2 p80 (t0 |κ)2 p80 (t0 |κ) > 0.911783 = 0.926195 220 / 288 Example 3 (Ctd) • With a 14.5% safety margin, the premium income at time t = 0 is 1000 × E[2 p80 (t0 |κ)] × 10.3−2 × 114.5% = 941.38. • Assuming realized interest rate equal to 3%, the available capital at time t = 2 amounts to 941.38 × 1.032 = 998.71. • The probability that the insurer cannot pay the benefits is Pr[S ≥ 999] = 0.0625 1000 × 0.796403999 (1 − 0.796403) + 0.7964031000 +... +0.0625 1000 × 0.926195999 (1 − 0.926195) + 0.9261951000 . 221 / 288 Comparison with independence • Let T1⊥ , . . . 
, Tn⊥ be independent lifetimes, each Ti⊥ being distributed as Ti , that is, for all t1 , . . . , tn , Pr[T1⊥ ≤ t1 , . . . , Tn⊥ ≤ tn ] = Pr[T1 ≤ t1 ] . . . Pr[Tn ≤ tn ]. Pbtc • Let at| = k=1 v (0, k) denote an annuity certain, where btc denotes the integer part of t, with the convention that the empty sum is zero. • On average, there is no effect of positive dependence since " n # " n # X X E aT ⊥ | = E aTi | . i=1 i i=1 222 / 288 Comparison with independence • As all the Ti s “move in the same direction” when κ increases, we expect some positive dependence between the lifetimes. • Clearly, t 7→ at| is a non-decreasing function. • Therefore, we also expect some positive dependence between the random variables aT1 | , . . . , aTn | . • We would like to compare " n # " n # X X TVaR aT ⊥ | ; p and TVaR aTi | ; p i=1 i i=1 as well as the stop-loss premiums ! ! n n X X and E . E aTi | − d aT ⊥ | − d i=1 i + i=1 + 223 / 288 Supermodularity • The function φ : R2 → R is supermodular if φ(b1 , b2 ) − φ(a1 , b2 ) − φ(b1 , a2 ) + φ(a1 , a2 ) ≥ 0 for all a1 ≤ b1 , a2 ≤ b2 . • These functions are important since they put more weight - on (b1 , b2 ) and (a1 , a2 ) expressing positive dependence because both components are simultaneously large or small. - than on (a1 , b2 ) and (b1 , a2 ) expressing negative dependence because they both mix one large component with one small component. • If φ has a derivative φ(1,1) = ∂2 ∂x1 ∂x2 φ then φ supermodular ⇔ φ(1,1) ≥ 0 as φ(b , b2 ) − φ(a1 , b2 ) − φ(b1 , a2 ) + φ(a1 , a2 ) = R b1 R b21 (1,1) (x1 , x2 )dx1 dx2 . a1 a 2 φ 224 / 288 Positive Quadrant Dependence (PQD) • Random variables X1 and X2 are positively quadrant dependent (PQD in short) if for all x1 and x2 Pr[X1 > x1 , X2 > x2 ] ≥ Pr[X1 > x1 ] Pr[X2 > x2 ] ⇔ Pr[X1 ≤ x1 , X2 ≤ x2 ] ≥ Pr[X1 ≤ x1 ] Pr[X2 ≤ x2 ]. • Let (X1⊥ , X2⊥ ) be an independent version of (X1 , X2 ), i.e. Pr[X1⊥ ≤ x1 , X2⊥ ≤ x2 ] = Pr[X1 ≤ x1 ] Pr[X2 ≤ x2 ] for all x1 , x2 . 
• If (X1 , X2 ) is PQD then TVaR[Ψ(X1⊥ , X2⊥ ); p] ≤ TVaR[Ψ(X1 , X2 ); p] for all p for any non-decreasing and supermodular function Ψ. • To establish this result, we need Tchen’s inequality. 225 / 288 Lemma: Tchen’s inequality • Let φ be such that φ(1,1) ≥ 0. • Integration by parts shows that for any (X1 , X2 ) and (Y1 , Y2 ) valued in R+ × R+ with identical marginals, E[φ(Y1 , Y2 )] − E[φ(X1 , X2 )] Z +∞ Z +∞ = FY (x1 , x2 ) − FX (x1 , x2 ) φ(1,1) (x1 , x2 )dx1 dx2 . 0 0 • Therefore, FX (x1 , x2 ) ≤ FY (x1 , x2 ) for all x1 , x2 ⇒ E[φ(Y1 , Y2 )] ≥ E[φ(X1 , X2 )]. • If X is PQD then E[φ(X1 , X2 )] ≥ E[φ(X1⊥ , X2⊥ )]. 226 / 288 Proof that E[(Ψ(X1⊥ , X2⊥ ) − d)+ ] ≤ E[(Ψ(X1 , X2 ) − d)+ ] • For h such that h0 ≥ 0 and h00 ≥ 0, ∂ ∂2 ∂ h Ψ(x1 , x2 ) = h00 Ψ(x1 , x2 ) Ψ(x1 , x2 ) Ψ(x1 , x2 ) ∂x1 ∂x2 ∂x1 ∂x2 ∂2 +h0 Ψ(x1 , x2 ) Ψ(x1 , x2 ) ≥ 0. ∂x1 ∂x2 • Thus, (x1 , x2 ) 7→ h Ψ(x1 , x2 ) is supermodular and h i E h Ψ(X1⊥ , X2⊥ ) ≤ E h Ψ(X1 , X2 ) holds for every h such that h0 ≥ 0 and h00 ≥ 0. • With h(x) = (x − d)+ , we see that h h i i E Ψ(X1⊥ , X2⊥ ) − d + ≤ E Ψ(X1 , X2 ) − d + . 227 / 288 n Lemma: TVaR[Z ; p] = inf a∈R a + C [Z , t] = 1 1−p E[(Z − a)+ ] o E (Z − t)+ + tε d C [Z , t] = −F Z (t) + ε dt t→ 7 C [Z , t] & ⇔ t ≤ VaR[Z ; 1 − ε] t 7→ C [Z , t] % ⇔ t ≥ VaR[Z ; 1 − ε] ⇒ C [Z , t] minimum at t = VaR[Z ; 1 − ε] C Z , VaR[Z ; 1 − ε] = E (Z − VaR[Z ; 1 − ε])+ + VaR[Z ; 1 − ε]ε = εTVaR [Z ; 1 − ε] ⇒ TVaR[Z ; p] = inf a + a∈R 1 E[(Z − a)+ ] 1−p 228 / 288 Proof that TVaR Ψ(X1⊥ , X2⊥ ); p ≤ TVaR [Ψ(X1 , X2 ); p] h i TVaR Ψ(X1⊥ , X2⊥ ); p ≤ VaR[Ψ(X1 , X2 ); p] h i 1 + E Ψ(X1⊥ , X2⊥ ) − VaR[Ψ(X1 , X2 ); p] + 1−p ≤ VaR[Ψ(X1 , X2 ); p] h i 1 + E Ψ(X1 , X2 ) − VaR[Ψ(X1 , X2 ); p] + 1−p = TVaR [Ψ(X1 , X2 ); p] . 
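The variational characterization of TVaR used in the lemma above can be checked numerically. The sketch below (for an assumed four-point discrete risk Z) compares TVaR computed as VaR plus the normalized stop-loss premium with the infimum of a + E[(Z − a)+]/(1 − p) over a grid of candidate values a.

```python
# Discrete risk Z (assumed values and probabilities for illustration).
support = [0.0, 1.0, 2.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]
p = 0.8

def stop_loss(a):
    # E[(Z - a)_+]
    return sum(pr * max(z - a, 0.0) for z, pr in zip(support, probs))

def var(level):
    # VaR[Z; level] = smallest z with F_Z(z) >= level
    cum = 0.0
    for z, pr in zip(support, probs):
        cum += pr
        if cum >= level - 1e-12:
            return z

q = var(p)
tvar = q + stop_loss(q) / (1 - p)

# inf_a { a + E[(Z - a)_+] / (1 - p) } over a fine grid covering the support
grid = [i * 0.001 for i in range(3001)]
inf_form = min(a + stop_loss(a) / (1 - p) for a in grid)

print(tvar, inf_form)  # the two representations agree (both 3.0 here)
```

The objective a ↦ a + E[(Z − a)+]/(1 − p) is flat on the interval where the minimum is attained, consistent with the derivative computation in the lemma.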
229 / 288 Characterizations of PQD (X1 , X2 ) PQD ⇔ Pr[X2 > x2 |X1 > x1 ] ≥ Pr[X2 > x2 ] for all x1 , x2 ⇔ Pr[X1 > x1 |X2 > x2 ] ≥ Pr[X1 > x1 ] for all x1 , x2 ⇔ (h1 (X1 ), h2 (X2 )) PQD for any ↑ h1 and h2 ⇔ C[h1 (X1 ), h2 (X2 )] ≥ 0 for any ↑ h1 and h2 such that the covariance exists ⇔ E[h1 (X1 )h2 (X2 )] ≥ E[h1 (X1 )]E[h2 (X2 )] for any ↑ h1 and h2 such that the expectations exist ⇔ E[h(X1 , X2 )] ≥ E[h(X1⊥ , X2⊥ )] for any supermodular h such that the expectations exist. 230 / 288 Positive cumulative dependence (PCD) • The bivariate notion of PQD has been generalized to higher dimensions in several ways. • The random variables X1 , X2 , . . . , Xn are PCD if for any i = 1, . . . , n − 1 X1 + . . . + Xi and Xi+1 are PQD. • Then, X1 , X2 , . . . , Xn PCD Pn Pn ⊥ i=1 Xi ; p ≤ TVaR [ i=1 Xi ; p] for all p, and TVaR ⇒ h hP i i n E Pn X ⊥ − d ≤ E ( X − d) i i=1 i i=1 + for all d. + 231 / 288 PCD present values aT1 | , . . . , aTn | For any non-decreasing h1 and h2 , i h C h1 aT1 | + . . . + aTi | , h2 (aTi+1 | ) ii h h = E C h1 aT1 | + . . . + aTi | , h2 (aTi+1 | )κ h h i i +C E h1 aT1 | + . . . + aTi | κ , E[h2 (aTi+1 | )|κ] h h i i = 0 + C E h1 aT1 | + . . . + aTi | κ , E[h2 (aTi+1 | )|κ] = C [ψ1 (κ), ψ2 (κ)] with h i ψ1 (κ) = E h1 aT1 | + . . . + aTi | κ and ψ2 (κ) = E[h2 (aTi+1 | )|κ] both non-increasing in κ. 232 / 288 PCD present values aT1 | , . . . , aTn | • Random variables X1 , . . . , Xn are associated if C [Ψ1 (X1 , · · · , Xn ), Ψ2 (X1 , · · · , Xn )] ≥ 0 for all non-decreasing functions Ψ1 and Ψ2 for which the covariances exist. • If κ is associated then aT1 | , . . . , aTn | are PCD. • When κ is multivariate Normal, κ is associated if, and only if, all the elements of the covariance matrix are non-negative, that is, C[κs , κt ] ≥ 0 for all s and t. 233 / 288 Example: Lee-Carter • If βx ≥ 0 for all the ages x and κ obeys an ARIMA model and is such that C[κs , κt ] ≥ 0 for all s and t then aT1 | , . . . , aTn | are PCD. 
• In particular, if the κt ’s obey a random walk with drift model, then they are associated since, for s < t, C[κs , κt ] = C[κs , κs + (t − s)δ + ξs+1 + . . . + ξt ] = V[κs ] ≥ 0. 234 / 288 Comparison with independence • Assume that κ is associated. • Then, aT1 | , . . . , aTn | are PCD, TVaR[Σ_{i=1}^n aT⊥i | ; p] ≤ TVaR[Σ_{i=1}^n aTi | ; p] for all p, and E[(Σ_{i=1}^n aT⊥i | − d)+ ] ≤ E[(Σ_{i=1}^n aTi | − d)+ ] for all d. • Treating T1 , . . . , Tn as independent thus amounts to underestimating the risk borne by the annuity provider. 235 / 288 VaR of present value of life annuity payments • However, there is in general no such relationship for the VaRs of V = Σ_{i=1}^n aTi | and V ⊥ = Σ_{i=1}^n aT⊥i | . • Since E[V ] = E[V ⊥ ], ∫_0^∞ (Pr[V > t] − Pr[V ⊥ > t]) dt = 0, so that the graphs of the distribution functions of V and V ⊥ must cross at least once. • Hence, there must exist probability levels p0 and p1 such that VaR[V ; p0 ] < VaR[V ⊥ ; p0 ] and VaR[V ; p1 ] > VaR[V ⊥ ; p1 ]. 236 / 288 Example: Lee-Carter Let us compute the distribution function of V = Σ_{i=1}^n aTi | under 3 scenarios in the Lee-Carter framework: (1) lifetimes T1 , . . . , Tn being conditionally independent given κ with common ξ-year conditional survival probability ξ px0 (κ); this corresponds to V . (2) lifetimes T1⊥ , . . . , Tn⊥ being independent with common ξ-year survival probability E[ξ px0 (κ)]; this corresponds to V ⊥ . (3) lifetimes T1⊥ , . . . , Tn⊥ being independent with deterministic death rates mx0 +k^det (t0 + k) = exp(αx0 +k + βx0 +k E[κt0 +k ]); this is what is generally done in practice. 237 / 288 [Figure: distribution function FV with n = 100, scenarios (1) —, (2) - - -, (3) · · · ; present values between 1050 and 1250] 238 / 288 [Figure: distribution function FV with n = 1,000, scenarios (1) —, (2) - - -, (3) · · · ; present values between 11200 and 12200] 239 / 288 Conclusion • In practice, actuaries often use deterministic projected life tables: arrays of numeric qx (t) indexed by age and calendar time, or a reference life table qx*
to which cohort-specific age shifts are applied. • This amounts to using scenario 2 or scenario 3. • The preceding graphs show that for sufficiently large portfolios this approach may seriously underestimate the VaR at the usual probability levels. 240 / 288 Large homogeneous portfolios • For lifetimes conditionally independent and identically distributed given κ, the diversification effect is apparent from TVaR[(1/(n + 1)) Σ_{i=1}^{n+1} aTi | ; p] ≤ TVaR[(1/n) Σ_{i=1}^n aTi | ; p]. • In this case, lim_{n→+∞} (1/n) Σ_{i=1}^n aTi | = ax0 (t0 |κ) = E[aT1 | |κ] = Σ_{k≥1} k px0 (κ) v (0, k) quantifies the systematic risk. • Moreover, TVaR[ax0 (t0 |κ); p] ≤ TVaR[(1/n) Σ_{i=1}^n aTi | ; p] for all n and p. 241 / 288 Outline Observed mortality trends (Source: HMD, www.mortality.org) Stochastic modelling for survival analysis Graduation and smoothing via local regression (Source: Statistics Belgium) Cohort life tables and mortality projection models Model risk Adverse selection and risk classification Credibility for death counts Systematic mortality risk in life insurance First-order life tables Pandemics Managing longevity risk 242 / 288 First-order versus second-order basis • By first-order technical basis, actuaries mean a set of conservative assumptions used for pricing and reserving: interest rates, death rates, lapse rates, etc. • The experience basis is called the second-order basis by actuaries. • Unlike the first-order mortality basis, the second-order mortality basis consists of the best estimate of the future death rates applying to the insured population. • We aim to provide a method to design a first-order mortality basis in the context of life annuities. • By appropriately reducing the death rates, we can carry out the VaR computations as if the lifetimes were independent, with a corrected probability level. 243 / 288 Conservative life table • Let us consider the cohort reaching age x0 in year t0 .
• For this cohort, we determine the conservative first-order life [1] table mx0 +k , k = 1, 2, . . ., in order to satisfy [1] Pr[mx0 +k (t0 + k|κ) ≤ mx0 +k for some k = 1, 2, . . .] ≤ mort for some probability level mort small enough. • This is equivalent to requiring that [1] Pr[mx0 +k (t0 + k|κ) ≥ mx0 +k for all k] ≥ 1 − mort . [1] • For convenience we express mx0 +k as a percentage π of a set of reference forces of mortality mxref , i.e. 0 +k [1] mx0 +k = πmxref . 0 +k 244 / 288 Example: Lee-Carter • In the Lee-Carter case, we impose [1] Pr[exp(αx0 +k + βx0 +k κt0 +k ) ≤ mx0 +k for some k = 1, 2, . . .] ≤ mort or, equivalently, [1] Pr[exp(αx0 +k + βx0 +k κt0 +k ) ≥ mx0 +k for all k] ≥ 1 − mort . [1] • With mx0 +k = πmxref , the value of π comes from 0 +k " Pr κt0 +k # ln πmxref − αx0 +k 0 +k ≥ for all k = 1 − mort . βx0 +k 245 / 288 Example: Lee-Carter With mxref = exp(αx0 +k + βx0 +k E[κt0 +k ]), 0 +k h Pr exp(αx0 +k + βx0 +k κt0 +k ) ≥ exp(αx0 +k + βx0 +k E[κt0 +k ]) for all k i = Pr[βx0 +k (κt0 +k − E[κt0 +k ]) ≥ ln π for all k] ≥ 1 − mort ⇒ ln π is the mort quantile of the multivariate Normal random vector βx0 +1 κt0 +1 − E[κt0 +1 ] , . . . , βω κt0 +ω−x0 − E[κt0 +ω−x0 ] . 246 / 288 Computation under first-order basis • Under the first-order mortality basis P1 the lifetimes [1] T1 , . . . , Tn are independent with death rates mx0 +k , k = 1, 2, . . . • Under the second-order mortality basis P2 , the lifetimes are conditionally independent given κ and have common death rates mx0 +k (t0 + k|κ). • For any time horizon k and z ≥ 0, we have # " n # " n X X amin{Ti ,k}| ≤ z . P2 amin{Ti ,k}| ≤ z ≥ (1 − mort )P1 i=1 i=1 247 / 288 Proof • Define [1] A = κmx0 +k (t0 + k|κ) ≥ mx0 +k for all k = 1, 2, . . . . • In words, A is the set of all the “safe” trajectories of κt0 +k , i.e. the trajectories such that the future death rates are always above the conservative ones. • By construction, P2 [A] ≥ 1 − mort . 
• Denote as Ei the mathematical expectation taken under Pi , for i = 1, 2. 248 / 288 Proof P2 " n X # amin{Ti ,k}| ≤ z = P2 " n X i=1 +P2 a min{Ti ,k}| i=1 " n X # ≤ z A P2 [A] amin{Ti ,k}| # ≤ z A P2 [A] i=1 ≥ P2 ≥ P2 " n X i=1 " n X amin{Ti ,k}| # ≤ z A P2 [A] amin{Ti ,k}| # ≤ z A (1 − mort ) i=1 ≥ P1 " n X # amin{Ti ,k}| ≤ z (1 − mort ) i=1 249 / 288 Example: Lee-Carter with RWD If κt = κt−1 + δ + ξt with independent ξt ∼ N or (0, σ 2 ) then, (κt0 +1 , . . . , κt0 +ω−x0 ) is Multivariate Normal with mean vector m = (κt0 + δ, . . . , κt0 +ω−x0 + (ω − x0 )δ)T and variance-covariance matrix 2 σ σ2 · · · σ 2 2σ 2 · · · Σ= . .. .. .. . . 2 2 σ 2σ · · · σ2 2σ 2 .. . (ω − x0 )σ 2 . 250 / 288 Example: Lee-Carter with RWD The value of ln π can then be determined as a quantile of the random vector T βx0 +1 κt0 +1 − (κt0 + δ) , . . . , βω κt0 +ω−x0 − (κt0 + (ω − x0 )δ) that is multivariate Normal with 0 mean matrix σ 2 βx20 +1 σ 2 βx0 +1 βx0 +2 σ 2 βx +1 βx +2 2σ 2 βx20 +2 0 0 e = Σ .. .. . . σ 2 βx0 +1 βω 2σ 2 βx0 +2 βω and variance-covariance ··· ··· .. . σ 2 βx0 +1 βω 2σ 2 βx0 +2 βω .. . ··· (ω − x0 )σ 2 βω2 . 251 / 288 Ruin probabilities • Computing the ruin probability over (0, j) amounts to evaluate the distribution function of Wj = n X amin{Ti ,j}| . i=1 • Under P1 , this can easily be done with Panjer recursion formula in the compound Binomial case whereas under P2 we must account for the positive dependence existing between the lifetimes. • Define for k = 0, 1, . . . [1] qx0 +k [1] = 1 − exp − mx0 +k = 1 − exp − πmxref . 0 +k 252 / 288 Ruin probabilities Under P1 , Wj appears to be a sum of independent random variables amin{T1 ,j}| , . . . , amin{Tn ,j}| where each amin{Ti ,j}| is valued in {0, a1| , . . . , aj| } and has probability distribution [1] P1 [amin{Ti ,j}| = 0] = P1 [Ti < 1] = qx0 P1 [amin{Ti ,j}| = a`| ] = P1 [` ≤ Ti < ` + 1] [1] [1] [1] = px0 . . . px0 +`−1 qx0 +` for ` = 1, . . . , j − 1 P1 [amin{Ti ,j}| = aj| ] = P1 [Ti ≥ j] [1] [1] = px0 . . . 
px0 +j−1 . 253 / 288 Ruin probabilities • Let Xi be amin{Ti ,j}| that has been appropriately discretized. • Here, we keep the original probability mass at the origin, and round the other values in the support of amin{Ti ,j}| to the least upper integer (after having selected an appropriate monetary unit). • The probability mass function pX (k) = Pr[Xi = k] of the Xi ’s has support {0, 1, . . . , daj| e}, with [1] pX (0) = qx0 > 0. 254 / 288 Ruin probabilities P • The probability mass function of the sum S = ni=1 Xi can be computed from the following recursive formula: s 1 X pS (s) = pX (0) η=1 n+1 η − 1 pX (η)pS (s−η), s = 1, 2, . . . , s n starting from pS (0) = pX (0) . • This recurrence relation is a particular case of Panjer recursion formula in the compound Binomial case. • It is known to be numerically unstable so that particular care is needed when performing the computations. 255 / 288 Initial capital • With an appropriate term structure of interest rate, we determine the initial capital u so that the ultimate ruin probability is at most solv under P1 for a portfolio of n annuitants (with independent lifetimes subject to death rates πmxref ). 0 +k • Specifically, we consider the random variable Wω−x0 , and we determine its 1 − solv quantile, w1−solv , say, under P1 . • The initial capital needed to reach a solvency probability of 1 − solv is then u(n, mort , solv ) = w1−solv . 256 / 288 Trade-off between mort and solv • Let [1] φk (u) = non-ruin probability over (0, k) under first-order basis [2] φk (u) = non-ruin probability over (0, k) under second-order basis. • We then have [2] [1] φk (u) ≥ (1 − mort )φk (u) [2] ⇒ φk (w1−solv ) ≥ (1 − mort )(1 − solv ). • Taking mort = solv = 1% gives a ruin probability of at most 1.99%. 257 / 288 Example: Lee-Carter with RWD • We consider the cohort reaching age x0 = 65 in year t0 = 2005, male Belgian population. • We use the 2005 zero coupon yield curve published by Eurostat. • With mort = 1%, we get π = 93.2%. 
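The recursion for pS used in the ruin-probability computation above is a special case of Panjer's formula in the compound Binomial case; a minimal sketch (with an assumed discretized one-policy distribution), checked against a direct n-fold convolution:

```python
from math import isclose

# Assumed discretized distribution of one policy's payout (mass at 0 required).
pX = [0.3, 0.4, 0.2, 0.1]  # Pr[X = 0], Pr[X = 1], Pr[X = 2], Pr[X = 3]
n = 5                      # number of policies
max_s = n * (len(pX) - 1)  # largest possible value of S

def pX_at(eta):
    return pX[eta] if eta < len(pX) else 0.0

# Recursion: pS(s) = (1/pX(0)) * sum_{eta=1}^{s} ((n+1)eta/s - 1) pX(eta) pS(s-eta),
# starting from pS(0) = pX(0)^n.
pS = [pX[0] ** n]
for s in range(1, max_s + 1):
    total = sum(((n + 1) * eta / s - 1) * pX_at(eta) * pS[s - eta]
                for eta in range(1, s + 1))
    pS.append(total / pX[0])

# Brute-force check: n-fold convolution of pX with itself.
conv = [1.0]
for _ in range(n):
    new = [0.0] * (len(conv) + len(pX) - 1)
    for i, a in enumerate(conv):
        for j, b in enumerate(pX):
            new[i + j] += a * b
    conv = new

assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(pS, conv))
print(sum(pS))  # the masses sum to 1
```

With larger portfolios and finer discretizations, the numerical instability mentioned on the slides shows up as growing rounding errors, so the convolution check is a useful safeguard.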
• This gives a pure life annuity premium amount of 11.63 €, to be compared with 11.37 € obtained with the projected life table. • The loaded premium is then obtained by dividing the required capital w1−εsolv by the number of policies. 258 / 288 [Figure: distribution function of aTi | , present values between 0 and 20] 259 / 288 [Figure: distribution function of aTi | discretized, present values between 0 and 20] 260 / 288 Distribution of Wω−x0 with 10 policies [Figure: probability mass function and distribution function; w1−εsolv /10 = 15.20 € with εsolv = 1%] 261 / 288 Distribution of Wω−x0 with 20 policies [Figure: probability mass function and distribution function; w1−εsolv /20 = 14.35 € with εsolv = 1%] 262 / 288 Distribution of Wω−x0 with 30 policies [Figure: probability mass function and distribution function; w1−εsolv /30 = 13.93 € with εsolv = 1%] 263 / 288 Outline Observed mortality trends (Source: HMD, www.mortality.org) Stochastic modelling for survival analysis Graduation and smoothing via local regression (Source: Statistics Belgium) Cohort life tables and mortality projection models Model risk Adverse selection and risk classification Credibility for death counts Systematic mortality risk in life insurance First-order life tables Pandemics Managing longevity risk 264 / 288 The Spanish flu 1918-1919 I The three pandemics of the 20th century (1918-1920, 1957-1958, and 1968-1970) are the main source of empirical evidence on the potential impact of the next pandemic. I The 1918-1920 Spanish flu pandemic caused the highest mortality by far and is often used to set the upper bound on the number of deaths caused by a future pandemic.
I Aside from the high mortality rate, 99% of those who died from the disease were under 65. I Murray et al. (Lancet, 2006) developed a statistical model relating annual pandemic mortality to per-head gross domestic product in international dollars and used this model to estimate the effect on mortality of an influenza pandemic in 2004. 265 / 288 Estimation of pandemic mortality I Because influenza pandemics might increase mortality not only in the year of peak incidence, but also in the following year or two, death rates in a 3-year pandemic window are compared with those in surrounding years: with mt = all-age all-cause mortality rate in year t, PM = pandemic mortality = Σ_{t=1918}^{1920} mt − (Σ_{t=1915}^{1917} mt + Σ_{t=1921}^{1923} mt )/6. I The regression model is as follows: ln PM = β0 + β1 × ln(1918 per-head income). I Note that pandemic mortality should be negatively related to per-head income, so that we expect β1 < 0. 266 / 288 Results for Belgium I The regression model is used to deduce PM in calendar year t, replacing the 1918 per-head income with the one of that year. I Multiplying the result by the total population gives the estimated number of deaths caused by the emergence of a pandemic influenza in year t. I The total number of deaths is distributed among age groups using the proportions observed during the 1918 pandemic. I These extra deaths can be added to “regular” ones to get pandemic qx . I The analysis has been conducted by Benit and Coulon (2011, UCL Master thesis) for calendar year 2008. 267 / 288 Results for Belgium β̂0 = 2.93012 with standard error 1.38411 and p-value 4.7% β̂1 = −0.98915 with standard error 0.17616 and p-value < 10^−4 r[β̂0 , β̂1 ] = −0.9969, R² = 0.6119 ln PM ≈d Nor(2.93012 − 0.98915 ln(per-head income), 1.38411² + (ln(per-head income))² × 0.17616² − 2 × 0.9969 × 1.38411 × 0.17616 × ln(per-head income)) 268 / 288 Deaths due to pandemic over expected deaths (in %)

Age          Males    Females
0-4          542.19   958.94
5-9          597.81   853.86
10-14        311.88   563.81
15-19        173.94   356.26
20-24        224.41   600.13
25-29        238.47   465.73
30-34        169.51   453.33
35-39         68.67   122.13
40-44         36.15    51.44
45-49         16.64    26.88
50-54          8.49    17.74
55-59          3.44    10.25
60-64          3.78    10.49
65-69          3.36     6.42
over 70        0.95     1.99
Total          8.92     9.16
Total 20-65   25.81    45.59

269 / 288 Example I Consider a portfolio comprising n one-year term life insurance policies with unit benefit. I The death rate is M = (1 + I )µ with I = 1 if a pandemic occurs and 0 otherwise. I Let r = Pr[I = 1] = 1 − Pr[I = 0]. I The number of deaths recorded by the company is D = Σ_{i=1}^n I[Ti ≤ 1], where T1 , . . . , Tn denote the policyholders’ remaining lifetimes. I Given M, T1 , . . . , Tn are assumed to be independent. 270 / 288 Example (Ctd) For i ≠ j, C[I[Ti ≤ 1], I[Tj ≤ 1]] = E[C[I[Ti ≤ 1], I[Tj ≤ 1]|I ]] + C[E[I[Ti ≤ 1]|I ], E[I[Tj ≤ 1]|I ]] = 0 + C[1 − exp(−(1 + I )µx ), 1 − exp(−(1 + I )µx )] = V[exp(−(1 + I )µx )] = E[exp(−2(1 + I )µx )] − (E[exp(−(1 + I )µx )])² = r exp(−4µx ) + (1 − r ) exp(−2µx ) − (r exp(−2µx ) + (1 − r ) exp(−µx ))² = r (1 − r )(exp(−2µx ) − exp(−µx ))², with maximum when r = 1/2. 271 / 288 Example (Ctd) Pr[D = k] = (1 − r ) C(n, k)(1 − exp(−µ))^k exp(−µ)^(n−k) + r C(n, k)(1 − exp(−2µ))^k exp(−2µ)^(n−k), E[D] = nE[1 − exp(−M)] = (1 − r )n(1 − exp(−µ)) + rn(1 − exp(−2µ)), V[D] = nV[I[T1 ≤ 1]] + n(n − 1)C[I[T1 ≤ 1], I[T2 ≤ 1]] = n Pr[T1 ≤ 1] Pr[T1 > 1] + n(n − 1)r (1 − r )(exp(−2µx ) − exp(−µx ))², and D/n → 1 − exp(−M) with probability 1. 272 / 288 Mortality catastrophe bonds • Market-traded securities whose payments are linked to a mortality index, similar to catastrophe bonds. • First such bond issued: Swiss Re bond (Vita 1) in December 2003, securitizing Swiss Re’s own exposure to certain catastrophic mortality events (influenza, major terrorist attack, etc.)
• The $400 million principal was at risk if, during any single calendar year, the combined mortality index (weighted by age, sex and nationality: US, UK, France, Italy, Switzerland) exceeded 130% of the baseline 2002 level, and would be exhausted if the index exceeded 150%.
• The main investors were pension funds, to which the bond provided both an attractive return and a good hedge.
• Vita 2 by Swiss Re in 2005, Vita 3 by Swiss Re in 2007, Tartan by Scottish Re in 2006, Osiris by AXA in 2006, etc.

273 / 288

Outline

Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk

274 / 288

Managing longevity risk: Possible strategies

• Not selling life annuities anymore (risk avoidance).
• Holding enough capital to act as a buffer against longevity risk, accounting for model risk.
• Buying reinsurance treaties covering longevity risk, if available and not too expensive.
• Entering securitization.
• Favoring “natural hedging” inside the insurance company.
• Re-thinking product design:
- selling only temporary annuities so that biometric bases can be revised regularly;
- indexing annuity payments, deferred period, etc.

275 / 288

Life insurance securitization

• Life and Longevity Markets Association (LLMA).
• A q-forward targets qx: an agreement between two counterparties to exchange at a future date (the maturity of the contract) an amount equal to the realized mortality rate of a given population (the floating leg) at that future date, in return for a fixed mortality rate (the fixed leg) agreed upon at the inception of the contract.
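The q-forward settlement just described can be sketched as follows; the notional and the two mortality rates are illustrative assumptions, not market figures:

```python
def q_forward_settlement(notional, fixed_rate, realized_rate):
    """Net payment to the floating-leg receiver at maturity of a q-forward:
    notional times (realized mortality rate - fixed mortality rate).
    A party exposed to catastrophic mortality would receive the floating leg;
    a longevity hedger would take the opposite position."""
    return notional * (realized_rate - fixed_rate)

# Fixed rate of 1.2% agreed at inception; realized rate of 1.5% at maturity:
settlement = q_forward_settlement(10_000_000, 0.012, 0.015)  # about 30,000
```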
• An S-forward targets px: an agreement between two counterparties to exchange at a future date (the maturity of the contract) an amount equal to the realized survival rate of a given population (the floating leg) at that future date, in return for a fixed survival rate (the fixed leg) agreed upon at the inception of the contract.
• And many more...
• Also, the Xpect indices of the Deutsche Börse (xpect-index.com).

276 / 288

Life settlement securitization

• With life settlements, life policies are sold by their owner for more than the surrender value but less than the face value.
• They are then packaged and sold to investors.
• Senior life settlement (SLS) securitization began in 2004 to deal with the life policies of elderly high-net-worth individuals.
• Two medical doctors or underwriters independently assess each policyholder’s life expectancy.
• The Life Exchange was established in 2005, and the Institutional Life Markets Association started in 2007 in New York as the trade body for the life settlements industry.

277 / 288

Securitization

• Mortality derivatives are attractive to investors because of their (supposed) low correlation with existing assets.
• A crucial issue is thus the possible correlation of the index to financial markets.
• An index directly related to the demographic structure (like the proportion of the population aged 65 and over, for instance) should be avoided.
• An ideal index only reflects the increase in longevity.
• A good candidate is the general population period life expectancy at some preset age (the retirement age, for instance).
• It is transparent to investors and published yearly by national agencies.
• However, adverse selection is not accounted for and basis risk remains.

278 / 288

Example: Lee-Carter

• The time index κt appears as a natural candidate for the longevity index in the Lee-Carter framework.
• However, κt is not appropriate for that purpose, for (at least) two reasons:
(i) First and foremost, it is not transparent to investors and lacks intuitive meaning.
(ii) Second, it is not unique but depends on the identifiability constraint, as well as on re-estimation techniques.
• If βx+j ≥ 0 for all j, then

e↑x(t0 + k|κ) = 1/2 + Σ_{d≥1} exp( − Σ_{j=0}^{d−1} exp( αx+j + βx+j κ_{t0+k} ) )

is a decreasing function of the time index κ_{t0+k}.

279 / 288

Example: Lee-Carter

• Denote as F_{e↑x(t0+k|κ)} the distribution function of e↑x(t0 + k|κ).
• Recall that if g is decreasing, F⁻¹_{g(X)}(ε) = g(F⁻¹_X(1 − ε)).
• Let Φ be the distribution function of the standard Normal distribution Nor(0, 1).
• The quantile function of e↑x(t0 + k|κ) is then given by

F⁻¹_{e↑x(t0+k|κ)}(ε) = inf{ s ∈ R | F_{e↑x(t0+k|κ)}(s) ≥ ε }
= 1/2 + Σ_{d≥1} exp( − Σ_{j=0}^{d−1} exp( αx+j + βx+j ( E[κ_{t0+k}] + √(V[κ_{t0+k}]) Φ⁻¹(1 − ε) ) ) ).

280 / 288

Index-based payoffs

• In many countries, governmental agencies perform mortality projections and publish future life expectancies from these official forecasts.
• Denote as e↑,ref_x(t0 + k) the period life expectancy at age x in calendar year t0 + k, taken from the reference forecast.
• We could imagine elementary payoff structures such as

payoff = 0 if e↑65(t0 + k|κ) < e↑,ref_65(t0 + k),
       = d otherwise.

281 / 288

Index-based payoffs

• In some demographic projections, future mortality is assumed to follow either a High, Medium or Low scenario.
• Let e↑,high_65(t0 + k) and e↑,medium_65(t0 + k) be the life expectancies in the High and Medium scenarios.
• In such a case, the payoff could be

payoff = 0 if e↑65(t0 + k|κ) < e↑,medium_65(t0 + k),
       = d × ( e↑65(t0 + k|κ) − e↑,medium_65(t0 + k) ) / ( e↑,high_65(t0 + k) − e↑,medium_65(t0 + k) )
         if e↑,medium_65(t0 + k) < e↑65(t0 + k|κ) < e↑,high_65(t0 + k),
       = d if e↑65(t0 + k|κ) > e↑,high_65(t0 + k).

282 / 288

Index-based payoffs

• More elaborate payoffs could be based on several consecutive annual life expectancies, with a cash stream if they all exceed the corresponding reference forecasts.
• Longevity fan charts can be used to visualize the future evolution of the index.
• This can help in defining a short-term longevity cat bond (if several consecutive life expectancies move out of their prediction bands).
• A basket of European national indices could be defined, with payoffs involving life expectancies in several countries (and possibly several reference forecasts).
• A payoff depending on the maximum period life expectancy of a group of industrialized countries could also be of interest.

283 / 288

Natural hedging

• Exploiting the natural hedge provided by the term insurance portfolio against annuity business seems promising.
• Natural hedging uses the offsetting reactions of term life insurance and annuities to a change in mortality to stabilize aggregate cash flows.
• This offers a competitive advantage to insurers writing both term life insurance and annuities.
• Alternatively, this encourages life insurers and annuity writers (e.g., pension funds) to enter mortality swaps.
• However, contract terms and relevant ages typically differ.
• Moreover, computing the volume of term insurance business needed to counteract longevity risk on the annuity book is subject to model risk (for instance, Lee-Carter favors natural hedging because of its single time index).

284 / 288

Product design

• Benefits in case of death and in case of survival can be included in the same policy.
• Including death benefits in an annuity contract might be explained by bequest motives.
• This amounts to associating counter-monotonic risks in the same contract, which counteracts longevity and mortality risks.
• However, the additional death benefit may significantly decrease the annuity income and thus may not attract customers.
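The three-scenario payoff of slide 282 can be sketched as follows; the life-expectancy values in the usage example are illustrative assumptions:

```python
def scenario_payoff(e_obs, e_medium, e_high, d):
    """Index-based payoff: zero below the Medium-scenario life expectancy,
    the full amount d above the High scenario, and linear interpolation
    in between."""
    if e_obs < e_medium:
        return 0.0
    if e_obs > e_high:
        return d
    return d * (e_obs - e_medium) / (e_high - e_medium)

# Observed life expectancy at 65 halfway between Medium (20) and High (22):
pay = scenario_payoff(21.0, e_medium=20.0, e_high=22.0, d=100.0)
```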
• Indexing premiums and/or benefits on mortality dynamics might be a viable alternative.

285 / 288

Indexed life annuity

• Let us consider an individual buying an indexed life annuity contract at age x0 in year t0.
• Let p^ref_{x0+k}(t0 + k) be a forecast for the survival of some reference population to which the individual belongs.
• The annual payment of 1 due at time k is adjusted by

i_{t0+k} = Π_{j=1}^{k−1} ( p^ref_{x0+j}(t0 + j) / p^obs_{x0+j}(t0 + j) ).

• We can limit the impact of the index on the annuity payments, using

i_{t0+k}(imin, imax) = max{ min{ i_{t0+k}, imax }, imin }

for some imin < 1 < imax, e.g. imin = 0.8 and imax = 1.2.

286 / 288

Advanced-life delayed annuities (ALDA)

• Advanced-life delayed annuities (ALDAs) are deferred (inflation-adjusted) life annuity contracts.
• The deferment period can be seen as a deductible: the policyholder finances his consumption until some advanced age, 80, 85 or even 90, say, and the insurer starts paying the annuity at this age provided the annuitant is still alive.
• Hence, the ALDA transforms the consumer's choice and asset-allocation problem from one with a stochastic date of death to a deterministic one in which the terminal horizon becomes the annuity payment commencement date.
• The longevity risk involved in the ALDA is quite substantial for the insurance company.

287 / 288

Deferred life annuity with indexed deferment period

• Some indexing can be applied to make the pricing more competitive.
• If the index is publicly available, the annuitant is able to adjust his consumption level during the deferred period.
• Note that we could also think of alternative indexing mechanisms for ALDA.
• Considering a deferred life annuity bought at age 65 with payments starting at age 80, say, we could let the starting age vary according to actual longevity improvements.
• If longevity increases more than expected, then payments may start at age 82 instead of 80, for instance.

288 / 288
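The indexing mechanism of slide 286 can be sketched as follows; the forecast and observed survival probabilities are illustrative assumptions:

```python
def annuity_index_factor(p_ref, p_obs, i_min=0.8, i_max=1.2):
    """Adjustment i_{t0+k}: the product over past years of forecast over
    observed one-year survival probabilities for the reference population,
    capped at the interval [i_min, i_max]."""
    factor = 1.0
    for ref, obs in zip(p_ref, p_obs):
        factor *= ref / obs
    return max(min(factor, i_max), i_min)

# Observed survival slightly better than forecast, so payments shrink:
p_ref = [0.990, 0.988, 0.986]   # forecast p^ref_{x0+j}(t0+j), j = 1, ..., k-1
p_obs = [0.992, 0.991, 0.989]   # observed survival, same ages and years
factor = annuity_index_factor(p_ref, p_obs)   # a bit below 1
```

When the reference population survives better than forecast, the factor falls below 1 and the annuity payments are reduced, transferring part of the systematic longevity risk back to the annuitant within the [imin, imax] bounds.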