ACTUARIAL ANALYSIS OF LONGEVITY RISK
Michel DENUIT
michel.denuit@uclouvain.be
Louvain School of Statistics, Biostatistics and Actuarial Science (LSBA)
UCL, Belgium
Academic Year 2013-2014
1 / 288
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
2 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
3 / 288
Preamble
Life expectancy at birth:
Early humans   ≈ 20 years
Around 1850    ≈ 40 years
Around 1950    ≈ 60 years
Around 2000    ≈ 70 years
4 / 288
Preamble (Ctd)
• The average life span thus roughly tripled over the course of
human history, and much of this increase has happened in the
past 150 years.
• Two trends dominated the mortality decline between 1900
and 2000:
- The first half of the 20th century saw significant improvements
in the mortality of infants and children (and their mothers).
- Since the middle of the 20th century, gains in life expectancy
have been due more to medical factors that have reduced
mortality among older persons (reductions in deaths due to the
“big three” killers – cardiovascular disease, cancer, and
strokes).
5 / 288
Death rates
• Let T be the lifetime, or age at death of an individual from
some population.
• The force of mortality at age x, denoted as µx , is defined by
µx = lim∆↘0 Pr[x < T ≤ x + ∆ | T > x] / ∆.
• Henceforth, we assume that the forces of mortality are
piecewise constant, i.e.
µx+ξ = µx for 0 ≤ ξ < 1 and integer x.
• If the true force of mortality varies slowly over the year of age,
the constant force of mortality assumption is reasonable.
• Let x 7→ µx (t) be the forces of mortality for calendar year t.
6 / 288
Forces of mortality
[Figure: ln µx against age x (0–100) for calendar years 1850, 1900, 1950 and 2000.]
7 / 288
Trend in forces of mortality
[Figure: ln µ20(t) and ln µ40(t) — forces of mortality (on the log scale) for Belgian males at ages 20 and 40, period 1841-2009, against calendar year t.]
8 / 288
Trend in forces of mortality
[Figure: ln µ60(t) and ln µ80(t) — forces of mortality (on the log scale) for Belgian males at ages 60 and 80, period 1841-2009, against calendar year t.]
9 / 288
Mortality surface (x, t) ↦ µx(t)
[Figure: mortality surface, ln µx(t) by age x (0–100) and calendar year t (1850–2000).]
10 / 288
e0 = E[T]
[Figure: period life expectancy at birth e0(t) against calendar year t, 1850–2000.]
11 / 288
e65 = E[T − 65 | T > 65]
[Figure: e65(t) against calendar year t, 1850–2000.]
12 / 288
Standard deviation √V[T]
[Figure: standard deviation of T against calendar year t, 1920–2000.]
13 / 288
Interquartile range FT⁻¹(0.75) − FT⁻¹(0.25)
[Figure: IQR of T against calendar year t, 1920–2000.]
14 / 288
Standard deviation √V[T − 65 | T > 65]
[Figure: standard deviation of the remaining lifetime at 65 against calendar year t, 1920–2000.]
15 / 288
Interquartile range F⁻¹T|T>65(0.75) − F⁻¹T|T>65(0.25)
[Figure: IQR of the remaining lifetime at 65 against calendar year t, 1920–2000.]
16 / 288
Remaining lifetime
• The probability that an individual alive at age x survives to age x + ξ is denoted as
  ξpx = Pr[T > x + ξ | T > x] = 1 − ξqx, with ω−xpx = 0
  for some ultimate age ω such that Pr[T ≤ ω] = 1.
• We have
  µx+t = −(∂/∂t) ln tpx = −(1/tpx)(∂/∂t) tpx ⇔ tpx = exp(−∫₀ᵗ µx+τ dτ).
• The probability density function of T is
  (d/dx) Pr[T ≤ x] = (d/dx) xq0 = xp0 µx.
17 / 288
Rectangularization of the survival curve x ↦ S(x) = xp0
[Figure: S(x) against age x (0–100) for years 1850, 1900, 1950 and 2000.]
18 / 288
Density x ↦ xp0 µx of T
[Figure: density of T against age x (0–100) for years 1850, 1900, 1950 and 2000.]
19 / 288
x ↦ ex = E[T − x | T > x], with (d/dx) ex = −1 + µx ex
[Figure: remaining life expectancy ex against age x (0–100) for years 1850, 1900, 1950 and 2000; note the child mortality hump in the older tables.]
20 / 288
Conclusions
• Mortality is on the move.
• Long-term actuarial calculations based on historical life tables
(even the most recent one) are likely to be erroneous.
• The valuation of long-term life insurance liabilities requires life
tables incorporating the expected changes in life duration.
⇒ There is a need for “projected” life tables.
• First of all, we need a biometric model incorporating both
dimensions
- attained age x, and
- calendar time t.
21 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
22 / 288
Remaining lifetime
• Let T0 (t) be the time to death, or lifetime of an individual
from some population, born in year t.
• Let Tx (t) be the remaining lifetime of an individual aged x in
calendar year t.
• This individual will thus die at age x + Tx(t).
• Denote
  ξpx(t) = Pr[Tx(t) > ξ] = Pr[T0(t − x) > x + ξ | T0(t − x) > x].
• The distribution function of Tx(t) is
  ξqx(t) = 1 − ξpx(t) = Pr[Tx(t) ≤ ξ].
23 / 288
One-year survival/death probabilities
• The one-year death probability at age x in year t is defined as
qx (t) = Pr[Tx (t) ≤ 1]
= Pr[T0 (t − x) ≤ x + 1|T0 (t − x) > x].
• The one-year survival probability at age x is defined as
px (t) = Pr[Tx (t) > 1]
= Pr[T0 (t − x) > x + 1|T0 (t − x) > x].
• Clearly, for any integer k,
  kpx(t) = px(t) px+1(t + 1) · · · px+k−1(t + k − 1).
24 / 288
Forces of mortality
• The force of mortality at age x in calendar year t, denoted as
µx (t), is defined by
  µx(t) = lim∆→0 Pr[x < T0(t − x) ≤ x + ∆ | T0(t − x) > x] / ∆.
• The survival function of Tx(t) is
  τpx(t) = exp(−∫₀^τ µx+ξ(t + ξ) dξ).
• The probability density function of Tx(t) is
  τ ↦ −(∂/∂τ) τpx(t) = τpx(t) µx+τ(t + τ).
25 / 288
Period vs. cohort life tables
• The period life table {qx (t), x = 0, 1, . . . , ω} is obtained
using data collected during a given calendar year t (or a few
consecutive years, typically 3 to 5).
• The cohort life table {qx (t + x), x = 0, 1, . . . , ω} follows the
generation born in calendar year t and, thus, incorporates
mortality changes over time.
• Clearly, in a situation where longevity is increasing, we have
qx+k (t + k) < qx+k (t).
• Period life tables thus underestimate liabilities relating to
insurance contracts with benefits in case of survival.
• However, the period life table {qx (t), x = 0, 1, . . . , ω} is
known at time t + 1 whereas the cohort life table
{qx (t + x), x = 0, 1, . . . , ω} remains unknown, except for
q0 (t).
26 / 288
Piecewise constant force of mortality
• Assumption: µx+ξ(t + τ) = µx(t) for 0 ≤ ξ, τ < 1 and integer x and t.
• Hence, for integer age x and calendar year t,
  px(t) = exp(−∫₀¹ µx+ξ(t + ξ) dξ) = exp(−µx(t)).
• Let Lxt be the number of individuals aged x at the beginning of year t.
• The (central) exposure-to-risk
  ETRxt = ∫₀¹ Lx+ξ,t+ξ dξ
  measures the time during which these Lxt individuals are exposed to the risk (of dying) in year t.
27 / 288
Expected exposure-to-risk
E[ETRxt | Lxt = k] = E[∫₀¹ Lx+ξ,t+ξ dξ | Lxt = k]
  = ∫₀¹ E[Lx+ξ,t+ξ | Lxt = k] dξ   (with E[Lx+ξ,t+ξ | Lxt = k] = k ξpx(t))
  = k ∫₀¹ ξpx(t) dξ
  = k ∫₀¹ exp(−ξµx(t)) dξ
  = k (1 − px(t)) / µx(t)
⇒ ETRxt ≈ −Lxt qx(t) / ln(1 − qx(t))
provided Lxt is large enough.
28 / 288
Cohort life expectancy
ex↗(t) = E[Tx(t)] = ∫₀^{ω−x} ξ d(ξqx(t)) = ∫₀^{ω−x} ξpx(t) dξ
  = Σ_{k=0}^{ω−x−1} ∫_k^{k+1} ξpx(t) dξ
  = Σ_{k=0}^{ω−x−1} kpx(t) ∫_k^{k+1} ξ−kpx+k(t + k) dξ
  = Σ_{k=0}^{ω−x−1} kpx(t) ∫₀¹ ξpx+k(t + k) dξ
29 / 288
Cohort life expectancy
Now, assuming piecewise constant forces of mortality,
  ξpx+k(t + k) = exp(−ξ µx+k(t + k))
so that
  ex↗(t) = (1 − exp(−µx(t))) / µx(t)
    + Σ_{k=1}^{ω−x−1} exp(−Σ_{j=0}^{k−1} µx+j(t + j)) · (1 − exp(−µx+k(t + k))) / µx+k(t + k).
30 / 288
Period life expectancy
• Period calculations freeze the value of t and do not follow
cohorts.
• The period life expectancy for calendar year t is
  ex↑(t) = (1 − exp(−µx(t))) / µx(t)
    + Σ_{k=1}^{ω−x−1} exp(−Σ_{j=0}^{k−1} µx+j(t)) · (1 − exp(−µx+k(t))) / µx+k(t).
• Note that the calculation of ex↑(t) for past t does not require mortality projections and is therefore objective, not subject to model risk.
• Period life expectancies at birth e0↑(t) or at retirement age e65↑(t) are often used as synthetic mortality indicators.
31 / 288
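As a quick illustration, the two life-expectancy formulas above can be evaluated with a few lines of Python. This is a minimal sketch of my own (not part of the original course material): `mu` is assumed to hold forces of mortality indexed by age and calendar year, and the toy surface at the end is purely illustrative.

```python
import numpy as np

def period_life_expectancy(mu, x, t):
    """e_x^up(t): freeze calendar year t and read mu down the age dimension."""
    m = mu[x:, t]                                                # mu_{x+k}(t), k = 0, 1, ...
    kpx = np.concatenate(([1.0], np.exp(-np.cumsum(m))))[:-1]    # k p_x(t)
    return float(np.sum(kpx * (1.0 - np.exp(-m)) / m))

def cohort_life_expectancy(mu, x, t):
    """e_x^diag(t): follow the cohort along the diagonal of the surface."""
    kmax = min(mu.shape[0] - x, mu.shape[1] - t)
    m = np.array([mu[x + k, t + k] for k in range(kmax)])        # mu_{x+k}(t+k)
    kpx = np.concatenate(([1.0], np.exp(-np.cumsum(m))))[:-1]    # k p_x(t)
    return float(np.sum(kpx * (1.0 - np.exp(-m)) / m))

# toy surface: ages 0-110, 61 calendar years, with a mild improvement trend
ages, years = np.arange(111), np.arange(61)
mu = 0.0001 * np.exp(0.09 * ages)[:, None] * np.exp(-0.01 * years)[None, :]
print(period_life_expectancy(mu, 65, 0), cohort_life_expectancy(mu, 65, 0))
```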
Life annuity premium
• The present value of life annuity payments to a policyholder aged x in year t is
  Σ_{k=1}^{ω−x} I[Tx(t) ≥ k] v(0, k) = Σ_{k=1}^{⌊Tx(t)⌋} v(0, k)
  where
  - the discount factor v(s, t) is the value at time s of a unit payment made at time t, s ≤ t, and
  - the indicator I[A] = 1 if event A is realized and 0 otherwise.
• The expected present value of these payments is
  ax(t) = E[Σ_{k=1}^{⌊Tx(t)⌋} v(0, k)] = Σ_{k=0}^{ω−x−1} exp(−Σ_{j=0}^{k} µx+j(t + j)) v(0, k + 1),
  where exp(−Σ_{j=0}^{k} µx+j(t + j)) = k+1px(t) = E[I[Tx(t) ≥ k + 1]].
32 / 288
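The expected present value ax(t) is then the same diagonal survival probabilities combined with discount factors. A short sketch under the same assumptions as the previous code block (flat 3% rate and a toy surface, both illustrative):

```python
import numpy as np

def annuity_epv(mu, x, t, i=0.03):
    """a_x(t) = sum_k (k+1)p_x(t) * v(0, k+1), survival taken along the cohort diagonal."""
    kmax = min(mu.shape[0] - x, mu.shape[1] - t)
    m = np.array([mu[x + k, t + k] for k in range(kmax)])   # mu_{x+j}(t+j), j = 0, 1, ...
    surv = np.exp(-np.cumsum(m))                             # (k+1)p_x(t)
    v = (1.0 + i) ** -np.arange(1, kmax + 1)                 # v(0, k+1), flat discount curve
    return float(np.sum(surv * v))

ages = np.arange(111)
mu = np.tile((0.0001 * np.exp(0.09 * ages))[:, None], (1, 60))   # toy surface, constant in time
print(round(annuity_epv(mu, 65, 0), 3))
```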
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
33 / 288
Mortality surface
[Figure: mortality surface — death probabilities (log scale) by age (0–100) and time (1950–2000).]
34 / 288
Estimation of µx
• Given a set of mortality (or sickness, disability, etc.) rates, the
actuary is asked to adjust the observations so that the
graduated values of the series capture the main trend in the
data.
• Consider a given calendar year t ? and suppress the time index,
that is, denote µx = µx (t ? ), etc.
• Assume that we have observed a homogeneous group of Lx
individuals aged x.
• Individual i entered the group at time ai and left it at time bi .
• The exposure-to-risk for individual i, denoted as τi , is the
time spent in the group by individual i, that is,
τi = bi − ai .
35 / 288
Estimation of µx (Ctd)
• To each of these Lx individuals, we associate
  δi = 1 if individual i dies, and δi = 0 otherwise, i = 1, 2, . . . , Lx.
• We assume that we have at our disposal independent and
identically distributed observations (δi , τi ) for each of the Lx
individuals.
• These individuals are thus subject to the same (constant)
force of mortality µx and we aim to estimate this unknown
parameter by maximum likelihood (ML).
36 / 288
Estimation of µx (Ctd)
• The contribution of individual i to the likelihood is
  - if he survives (δi = 0):
    bi−aipx+ai = exp(−∫_{ai}^{bi} µx+ξ dξ) = exp(−(bi − ai)µx) = exp(−τi µx);
  - if he dies (δi = 1):
    bi−aipx+ai µx+bi = exp(−τi µx) µx.
• Therefore, the contribution of individual i to the likelihood is
  exp(−τi µx) µx^δi = exp(−τi µx) if δi = 0, and exp(−τi µx) µx if δi = 1.
37 / 288
Estimation of µx (Ctd)
• We then have
  L(µx) = Π_{i=1}^{Lx} exp(−τi µx) µx^δi = exp(−µx Σ_{i=1}^{Lx} τi) µx^{Σ_{i=1}^{Lx} δi}.
• Clearly, Σ_{i=1}^{Lx} τi = ETRx and Σ_{i=1}^{Lx} δi = Dx, so that
  L(µx) = exp(−ETRx µx) µx^{Dx}.
• The corresponding log-likelihood is
  ln L(µx) = −ETRx µx + Dx ln µx.
38 / 288
Estimation of µx (Ctd)
• The ML estimator for µx is then obtained from
  (d/dµx) ln L(µx) = −ETRx + Dx/µx = 0 ⇒ µ̂x = Dx / ETRx,
  with (d²/dµx²) ln L(µx) = −Dx/µx² < 0.
• Recall that the death rate is defined as
  mx = E[Dx] / E[ETRx] ⇒ m̂x = Dx / ETRx = µ̂x.
• Compare this to
  qx = E[Dx] / E[Lx] ⇒ q̂x = Dx / Lx.
39 / 288
Estimation of µx (Ctd)
• Assume that individual i has his birthday at ci , with
ai < ci < bi .
• Then, this individual is counted for an exposure ci − ai in the
group aged x and for an exposure bi − ci in the group aged
x + 1.
• If δi = 1 then the death is allocated to the group aged x + 1.
40 / 288
Estimation of µx (Ctd)
• Large sample properties of the ML estimators ensure that
  V[µ̂x] ≈ (−(d²/dµx²) ln L(µx))^{−1} = µx² / Dx.
• Moreover,
  µ̂x ≈d Nor(µx, µ̂x² / Dx).
• An approximate (1 − α) CI for µx is given by
  [µ̂x ± zα/2 µ̂x/√Dx],
  where zα/2 µ̂x/√Dx is the error margin.
41 / 288
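Numerically, the estimator and its confidence interval only require the death count and the exposure. A minimal sketch (the numbers below are arbitrary illustrations, not portfolio data):

```python
import math

def mu_hat_with_ci(deaths, exposure, z=1.96):
    """ML estimate of a constant force of mortality and its approximate 95% CI."""
    mu_hat = deaths / exposure                 # D_x / ETR_x
    margin = z * mu_hat / math.sqrt(deaths)    # z * mu_hat / sqrt(D_x)
    return mu_hat, (mu_hat - margin, mu_hat + margin)

print(mu_hat_with_ci(deaths=200, exposure=25000.0))
```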
Example
• The following table displays the mortality statistics for
policyholders in a given portfolio observed during calendar
year 2010:
Age    Exposure-to-risk    Number of deaths
50     3160.25             62
51     3094.50             71
• We assume piecewise constant forces of mortality (i.e. µx+ξ = µx for every integer x and 0 < ξ < 1).
42 / 288
Example
L(µ50, µ51) = exp(−3160.25 µ50) µ50^62 · exp(−3094.50 µ51) µ51^71
(∂/∂µ50) ln L(µ50, µ51) = (∂/∂µ50)(−3160.25 µ50 + 62 ln µ50) = 0 ⇒ µ̂50 = 62 / 3160.25
(∂/∂µ51) ln L(µ50, µ51) = (∂/∂µ51)(−3094.50 µ51 + 71 ln µ51) = 0 ⇒ µ̂51 = 71 / 3094.50
43 / 288
Example
V̂[µ̂50] = (−(∂²/∂µ50²) ln L(µ50, µ51)|_{µ50=µ̂50, µ51=µ̂51})^{−1} = µ̂50²/62 = 62/3160.25²
⇒ µ̂50 ≈d Nor(µ50, 62/3160.25²)
V̂[µ̂51] = (−(∂²/∂µ51²) ln L(µ50, µ51)|_{µ50=µ̂50, µ51=µ̂51})^{−1} = µ̂51²/71 = 71/3094.50²
⇒ µ̂51 ≈d Nor(µ51, 71/3094.50²)
44 / 288
Example
CI95%(µ50) = [µ̂50 ± 1.96 √62/3160.25]
CI95%(µ51) = [µ̂51 ± 1.96 √71/3094.50]
p̂50 = exp(−µ̂50) = exp(−62/3160.25)
p̂51 = exp(−µ̂51) = exp(−71/3094.50)
45 / 288
Example
CI95%(p50) = [exp(−µ̂50 − 1.96 √62/3160.25), exp(−µ̂50 + 1.96 √62/3160.25)]
CI95%(p51) = [exp(−µ̂51 − 1.96 √71/3094.50), exp(−µ̂51 + 1.96 √71/3094.50)]
2p̂50 = p̂50 p̂51 = exp(−µ̂50 − µ̂51) = exp(−62/3160.25 − 71/3094.50)
46 / 288
Example
V[µ̂50 + µ̂51] = V[µ̂50] + V[µ̂51]   (under mutually exclusive age classes)
µ̂50 + µ̂51 ≈d Nor(µ50 + µ51, 62/3160.25² + 71/3094.50²)
CI95%(2p50) = [exp(−µ̂50 − µ̂51 − 1.96 √(62/3160.25² + 71/3094.50²)),
               exp(−µ̂50 − µ̂51 + 1.96 √(62/3160.25² + 71/3094.50²))]
47 / 288
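All the quantities of this worked example can be reproduced in a few lines of Python; the sketch below simply re-evaluates the slide's formulas.

```python
import math

D = {50: 62, 51: 71}
ETR = {50: 3160.25, 51: 3094.50}

mu = {x: D[x] / ETR[x] for x in D}                                    # ML estimates
se = {x: math.sqrt(D[x]) / ETR[x] for x in D}                         # sqrt(D_x) / ETR_x
ci_mu = {x: (mu[x] - 1.96 * se[x], mu[x] + 1.96 * se[x]) for x in D}  # CI for mu_x
p = {x: math.exp(-mu[x]) for x in D}                                  # one-year survival
p2_50 = p[50] * p[51]                                                 # 2p_50
se_sum = math.sqrt(D[50] / ETR[50] ** 2 + D[51] / ETR[51] ** 2)
ci_2p50 = (math.exp(-mu[50] - mu[51] - 1.96 * se_sum),
           math.exp(-mu[50] - mu[51] + 1.96 * se_sum))
print(mu, ci_mu, p, p2_50, ci_2p50)
```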
x ↦ ln µ̂x
[Figure: crude ln µ̂x against age x (0–100), 2009 life table for Belgian males, general population.]
48 / 288
Yearly variations
[Figure: ln µx against age x for the 2007 (red), 2008 (green) and 2009 life tables.]
49 / 288
Local regression model
• Even if the general shape of the mortality curve is stable over
time, year-specific erratic variations remain.
• As long as these erratic variations do not reveal anything
about the underlying mortality pattern, they should be
removed before entering actuarial calculations.
• Graduation is used to remove the random departures from the
underlying smooth curve x 7→ µx .
• In order to smooth the crude estimates µ̂x, we can use
  ln µ̂x = f(x) + εx with independent εx ∼ Nor(0, σ²),
  where f is unspecified, but assumed to be smooth.
50 / 288
Local polynomial regression
• Let us consider the model
  Yi = f(xi) + εi, i = 1, 2, . . . , n,
  where the independent εi ∼ Nor(0, σ²) represent random departures from f(·) in the observations, or variability from sources not included in the xi's.
• No strong assumptions are made about f , except that it is a
smooth function.
• Smoothness here means continuity and (higher order)
differentiability.
• Invoking Taylor’s theorem,
- any differentiable function can be approximated locally by a
straight line, and
- any twice differentiable function can be approximated locally by a quadratic polynomial.
51 / 288
Example
• Consider f(x) = 2 + (x − 4)² and xi = (i + 4)/5, i = 1, 2, . . . , 31.
• Simulate data from
  yi = 2 + (xi − 4)² + εi, εi ∼ Nor(0, 1), i = 1, 2, . . . , 31.
• The standard way to get a good fit in such a case is to regress the yi's on the xi's and on their squares xi²'s, i.e. to fit
  yi = β0 + β1 xi + β2 xi² + εi with εi ∼ Nor(0, σ²).
• This is because we can clearly see the quadratic shape of the
data which indicates that we face a linear regression model
with x and x 2 as explanatory variables.
• But how can we proceed if we cannot guess the transformed
explanatory variables from graphing the data?
52 / 288
[Figure: simulated data and global linear fit yi = β0 + β1 xi + εi.]
53 / 288
[Figure: how to estimate f at x5?]
54 / 288
[Figure: selection of a neighborhood {x2, x3, x4, x5, x6, x7, x8}.]
55 / 288
[Figure: local linear fit at x5 based on {x2, x3, x4, x5, x6, x7, x8}, giving f̂(x5) = β̂0(x5) + β̂1(x5) x5.]
56 / 288
[Figure: local linear fit at x15, giving f̂(x15) = β̂0(x15) + β̂1(x15) x15.]
57 / 288
[Figure: local linear fit at x27, giving f̂(x27) = β̂0(x27) + β̂1(x27) x27.]
58 / 288
[Figure: results with a neighborhood of size 7.]
59 / 288
[Figure: fits with neighborhood sizes 1 (black), 3 (red), 7 (blue), and 11 (green).]
60 / 288
Key ingredients of the approach
• Simple linear regression model, fitted by least squares,
but. . . locally!
• For each observation, a neighborhood has to be defined: increasing its size produces a smoother fit.
• However, increasing the size of the neighborhood excludes
more observation points near both ends of the data set with
our basic strategy.
• This could be improved by assigning weights to each
observation of the data set.
• These weights depend on the point where the model is fitted.
61 / 288
Local polynomial regression by WLS
• To estimate f at some point x, the observations are weighted
in such a way that
- larger weights are assigned to observations close to x and
- smaller weights to those that are further.
• The λ nearest neighbors of x are gathered in the set V(x).
• The value of λ is often expressed as a proportion α of the
data set (α represents the percentage of the observations
comprised in every smoothing window).
• The fitted value of f (x) is obtained from weighted
least-squares, on the basis of the observations in V(x).
62 / 288
Local polynomial regression by WLS
• Given a weight function K (·), a weight wi (x) is assigned to
(xi , yi ) in V(x) to estimate f (x).
• Zero weights are assigned to observations outside V(x).
• Provided the sample size is large enough, the choice of the
weight function is not too critical and the tricube weight
function appears as a convenient choice.
• Weights are then given by
  wi(x) = (1 − (|x − xi| / max_{i∈V(x)} |x − xi|)³)³ for (xi, yi) ∈ V(x), and wi(x) = 0 otherwise.
63 / 288
Local polynomial regression by WLS
• Within the smoothing window V(x), f is approximated by a
polynomial with coefficients specific to x, that is,
f(t) ≈ β0(x) + β1(x)t + β2(x)t² + . . . + βp(x)t^p
for t ∈ V(x) (think of Taylor expansion).
• Usually, p ≤ 2 so that only
- local constant regression (p = 0),
- local linear regression (p = 1), and
- local quadratic regression (p = 2)
are considered.
• The βj (x)’s are estimated by WLS, with weights specific to x.
64 / 288
Local linear fit (p = 1)
f(t) ≈ β0(x) + β1(x) t for t ∈ V(x)
yi = β0(x) + β1(x) xi + εi, i = 1, . . . , n,
with weights wi(x) assigned to (xi, yi) in V(x):
(β̂0(x), β̂1(x)) = arg min Σ_{i=1}^{n} wi(x) (yi − β0(x) − β1(x) xi)²
f̂(x) = β̂0(x) + β̂1(x) x
      = Σ_{i=1}^{n} wi(x) yi / Σ_{i=1}^{n} wi(x) + (x − x̄w) Σ_{i=1}^{n} wi(x)(xi − x̄w) yi / Σ_{i=1}^{n} wi(x)(xi − x̄w)²
where x̄w = Σ_{i=1}^{n} wi(x) xi / Σ_{i=1}^{n} wi(x).
65 / 288
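The estimator above can be coded directly. The sketch below is my own illustration (production work would rather use an existing loess/locfit implementation): it fits a weighted straight line in a nearest-neighbour window with tricube weights, the window containing a proportion alpha of the data.

```python
import numpy as np

def local_linear(x_grid, x, y, alpha=0.3):
    """Local linear fit at each point of x_grid, tricube weights on the alpha*n nearest neighbours."""
    n = len(x)
    k = max(2, int(np.ceil(alpha * n)))            # window size (lambda nearest neighbours)
    fitted = np.empty(len(x_grid))
    for j, x0 in enumerate(x_grid):
        d = np.abs(x - x0)
        idx = np.argsort(d)[:k]                    # the neighbourhood V(x0)
        h = d[idx].max()
        w = (1.0 - (d[idx] / h) ** 3) ** 3         # tricube weights
        X = np.column_stack([np.ones(k), x[idx]])  # local design matrix (1, x_i)
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
        fitted[j] = beta[0] + beta[1] * x0         # f_hat(x0) = b0(x0) + b1(x0) x0
    return fitted

# toy data from the quadratic example of the previous slides
rng = np.random.default_rng(1)
x = (np.arange(1, 32) + 4) / 5
y = 2 + (x - 4) ** 2 + rng.normal(size=x.size)
print(local_linear(x, x, y, alpha=0.3)[:5])
```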
Link with Nadaraya-Watson estimate (p = 0)
• If we approximate f locally by a constant β0(x),
  β̂0(x) = arg min Σ_{i=1}^{n} wi(x) (yi − β0(x))²
  f̂(x) = β̂0(x) = Σ_{i=1}^{n} wi(x) yi / Σ_{i=1}^{n} wi(x)
  = the Nadaraya-Watson estimate.
• Many early actuarial graduation formulas are of this form.
• For instance, Finlaison suggested in 1829 to smooth the qx by
  (1/25)(q̂x−4 + 2q̂x−3 + 3q̂x−2 + 4q̂x−1 + 5q̂x + 4q̂x+1 + 3q̂x+2 + 2q̂x+3 + q̂x+4),
  and other similar formulas followed, by Woolhouse in 1866, Karup in 1899, Spencer in 1907, Gréville in 1967, etc.
• The local linear fit often reduces to the Nadaraya-Watson estimate in the center of the data (when x̄w = x).
66 / 288
Data
[Figure: crude ln µ̂x against age x, 0–100.]
67 / 288
[Figure: local regression fits of ln µx with α = 5% (black), 25% (green), and 50% (red).]
68 / 288
Selecting α by cross-validation
{(x2, y2), (x3, y3), . . . , (xn, yn)} → f̂(−1) → prediction ŷ1 = f̂(−1)(x1)
{(x1, y1), (x3, y3), . . . , (xn, yn)} → f̂(−2) → prediction ŷ2 = f̂(−2)(x2)
...
{(x1, y1), (x2, y2), . . . , (xn−1, yn−1)} → f̂(−n) → prediction ŷn = f̂(−n)(xn)
⇒ CV(f̂) = (1/n) Σ_{i=1}^{n} (yi − f̂(−i)(xi))²
69 / 288
(G)CV plot
[Figure: (generalized) cross-validation score against fitted degrees of freedom, two panels.]
70 / 288
Crude death rates and loess fit
[Figure: crude ln µx(t) (circles) and loess fit against age x, 0–100.]
71 / 288
Standardized residuals
[Figure: Pearson residuals against age x, 0–100.]
72 / 288
Alternative local regression model
• In order to smooth the crude estimates q̂x, we could also use the model
  ln(q̂x / p̂x) = f(x) + εx with independent εx ∼ Nor(0, σ²),
  where f is left unspecified.
• The εx's represent the random departures from f(·) in the observations.
• No strong assumptions are made about f, except that it is a smooth function that can be approximated locally by a straight line or by a quadratic polynomial.
• The same approach applies to estimate f , and the same
problem is faced with residuals.
73 / 288
Binomial regression for death counts
• If the initial size Lx of the closed group of individuals aged x
is available, we can also use the model
  Dx ∼ Bin(Lx, qx) where qx = exp(f(x)) / (1 + exp(f(x)))
  for some smooth, unspecified function f.
• The estimation proceeds as above, substituting the Binomial likelihood for the Poisson one.
• This approach is suitable for closed homogeneous populations,
but should be applied to open insurance portfolios with great
care (Poisson regression is preferred in this case).
• At the general population level, the Binomial regression model
can often be considered as reasonable.
74 / 288
Data
[Figure: exposures Lx and death counts Dx by age, 0–102.]
75 / 288
(Likelihood) cross-validation plot
[Figure: LCV against fitted degrees of freedom.]
76 / 288
Binomial model, resulting fit
[Figure: fitted ln(qx) against age x, 0–100.]
77 / 288
Binomial model, deviance residuals
[Figure: deviance residuals against age x, 0–100.]
78 / 288
Poisson regression for death counts
• As explained above, the likelihood to be maximized to derive the ML estimator of µx is given by
  L(µx) = exp(−ETRx µx) µx^{Dx}.
• The likelihood L(µx) is proportional to the likelihood based on
  Dx ∼ Poisson(ETRx µx),
  that is,
  L(µx) ∝ exp(−ETRx µx) (ETRx µx)^{Dx} / Dx!.
• Therefore, it is equivalent to work on the basis of the “true” likelihood or on the basis of the Poisson likelihood to estimate µx = exp(f(x)).
79 / 288
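Since the Poisson likelihood is equivalent to the true one, the graduation can also be carried out as a Poisson regression with ln ETRx as an offset. Below is a hedged sketch of my own: it uses a plain polynomial basis and a Newton/IRLS loop instead of the local-likelihood smoother of the slides, and the data at the end are synthetic.

```python
import numpy as np

def poisson_graduation(ages, deaths, exposure, degree=3, n_iter=25):
    """Graduate mu_x by Poisson ML with ln mu_x = polynomial(x) and offset ln(ETR_x)."""
    z = (ages - ages.mean()) / ages.std()            # standardised age, for stability
    X = np.vander(z, degree + 1, increasing=True)    # 1, z, z^2, ...
    beta = np.zeros(degree + 1)
    beta[0] = np.log(deaths.sum() / exposure.sum())  # start from the constant-force fit
    offset = np.log(exposure)
    for _ in range(n_iter):                          # Newton-Raphson (a GLM library is more robust)
        fitted = np.exp(X @ beta + offset)           # expected death counts
        score = X.T @ (deaths - fitted)
        info = X.T @ (fitted[:, None] * X)           # Fisher information
        beta = beta + np.linalg.solve(info, score)
    return np.exp(X @ beta)                          # graduated mu_x

ages = np.arange(40, 91)
exposure = np.full(ages.size, 5000.0)
deaths = np.random.default_rng(0).poisson(exposure * 0.0001 * np.exp(0.095 * (ages - 40)))
print(poisson_graduation(ages, deaths, exposure)[:5])
```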
Closure of the annual life tables
• Data at old ages produce suspect results (because of small
risk exposures).
• According to standard actuarial practice, the series x ↦ q̂x should be extrapolated to some ultimate age ω before being used for actuarial computations.
• Several procedures have been proposed by actuaries and demographers.
• These procedures often produce very different qx at older ages x, but with relatively little effect on present values of liabilities.
• Here, we follow a simple technique consisting in closing the q̂x's by means of a quadratic regression of ln q̂x on age x, fitted to the (x, ln q̂x) data points with x above some high threshold.
80 / 288
A simple and powerful ad-hoc method
• Specifically, we use a log-quadratic regression model of the form
  ln q̂x = a + bx + cx² + ζx with independent ζx ∼ Nor(0, σζ²)
  fitted to the oldest ages (the starting age is selected so as to maximize goodness-of-fit).
• The ultimate age ω can be chosen from the data; here, we take ω = 130.
• We then impose
  (i) a closure constraint q130 = 1 ⇔ ω = 130,
  (ii) an inflexion constraint q′130 = 0,
  yielding a + bx + cx² = c(130² − 260x + x²).
81 / 288
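Under the two constraints the quadratic collapses to ln qx = c (130 − x)², so only the single coefficient c has to be estimated. A possible Python sketch of the constrained least-squares fit (my own illustration, with a hypothetical starting age of 80 and toy crude rates):

```python
import numpy as np

def close_life_table(ages, log_qx, start_age=80, omega=130):
    """Fit ln q_x = c * (omega - x)^2 on ages >= start_age and extrapolate up to omega."""
    sel = ages >= start_age
    z = (omega - ages[sel]) ** 2
    c = np.sum(z * log_qx[sel]) / np.sum(z ** 2)          # least-squares slope through the origin
    old_ages = np.arange(start_age, omega + 1)
    return old_ages, np.exp(c * (omega - old_ages) ** 2)  # closed q_x, equal to 1 at omega

# toy crude rates up to age 98
ages = np.arange(60, 99)
qx = 1 - np.exp(-0.00005 * np.exp(0.11 * ages))
old_ages, closed_qx = close_life_table(ages, np.log(qx))
print(closed_qx[-1])   # 1.0 at age 130 by construction
```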
R² according to starting age
[Figure: R² of the log-quadratic fit against the initial age, 60–90.]
82 / 288
Closed qx
[Figure: closed life table, ln qx against age x, 0–120.]
83 / 288
Closed qx
[Figure: closed life table, ln qx against age x, 0–120.]
84 / 288
Smoothing the mortality surface
• Instead of smoothing the yearly mortality experience, we could
also smooth the entire mortality surface.
• To this end, we use the local polynomial regression model
  ln µ̂x(t) = f(x, t) + εx(t) with independent εx(t) ∼ Nor(0, σ²).
• The smooth f (·, ·) is approximated locally by a quadratic
polynomial in age and time, that is,
f (ξ, τ ) ≈ β0 (x, t) + β1 (x, t)ξ + β2 (x, t)τ
+β3 (x, t)ξ 2 + β4 (x, t)ξτ + β5 (x, t)τ 2
for (ξ, τ ) sufficiently close to (x, t).
• Weights are defined so that observations close to (x, t) receive
the largest weights and weighted least-squares, Poisson or
Binomial ML techniques are used on a local scale.
85 / 288
Unsmoothed mortality surface
[Figure: unsmoothed mortality surface — death probabilities (log scale) by age (0–100) and time (1950–2000).]
86 / 288
GCV
[Figure: GCV score against fitted degrees of freedom for the surface smoother.]
87 / 288
Smoothed mortality surface
[Figure: smoothed mortality surface — death probabilities (log scale) by age (0–100) and time (1950–2000).]
88 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
89 / 288
Data structure
• Data structure 1: only q̂x(t) or µ̂x(t) are available for a set of ages x and calendar years t = tmin, . . . , tmax.
• In Belgium for instance, the q̂x(t) used for official mortality projections are given for t = 1948, 1949, . . . and
  x = 0, . . . , 100 for t = 1948, . . . , 1993
  x = 0, . . . , 101 for t = 1994, . . . , 1998
  x = 0, . . . , 105 for t = 1999, . . .
• The numerator and denominator of these demographic
indicators may be available separately:
- Data structure 2: Dxt and Lxt are available,
- Data structure 3: Dxt and ETRxt are available.
90 / 288
Links
Under a piecewise constant force of mortality,
  µ̂x(t) = m̂x(t) = Dxt / ETRxt
  q̂x(t) = Dxt / Lxt
  µ̂x(t) = −ln(1 − q̂x(t))
  q̂x(t) = 1 − exp(−µ̂x(t))
  ETRxt ≈ −Lxt q̂x(t) / ln(1 − q̂x(t))
provided Lxt is large enough.
91 / 288
Classical parametric approach
• Classically, some parametric model (e.g. Makeham or Heligman-Pollard) is fitted to the data of each calendar year.
• Under the Makeham law, we have
  µx(t) = at + bt ct^x
  for calendar year t.
• Parameters (at, bt, ct) are estimated on the basis of mortality statistics gathered in year t, for t = tmin, . . . , tmax.
• Then, {ât, t = tmin, . . . , tmax}, {b̂t, t = tmin, . . . , tmax} and {ĉt, t = tmin, . . . , tmax} are treated as independent time series and extrapolated to the future, providing the actuary with projected life tables.
92 / 288
Makeham model
µ̂xmin(tmin)     µ̂xmin(tmin+1)     · · ·   µ̂xmin(tmax)     →   µ̂xmin(tmax+k)
µ̂xmin+1(tmin)   µ̂xmin+1(tmin+1)   · · ·   µ̂xmin+1(tmax)   →   µ̂xmin+1(tmax+k)
     ...              ...           · · ·        ...                 ...
µ̂xmax(tmin)     µ̂xmax(tmin+1)     · · ·   µ̂xmax(tmax)     →   µ̂xmax(tmax+k)
     ↓                 ↓                          ↓                    ↑
(âtmin, b̂tmin, ĉtmin)   (âtmin+1, b̂tmin+1, ĉtmin+1)   · · ·   (âtmax, b̂tmax, ĉtmax)   →   (âtmax+k, b̂tmax+k, ĉtmax+k)
93 / 288
Weaknesses
• First and foremost, this approach strongly relies on the
appropriateness of the underlying parametric model.
• For instance, Makeham model generally over-estimates µx (t)
for oldest ages, because of its convex shape.
• Secondly, the time series {ât, t = tmin, . . . , tmax}, {b̂t, t = tmin, . . . , tmax} and {ĉt, t = tmin, . . . , tmax} are often strongly dependent.
• Using large sample properties of ML estimators, some correlations are typically close to 1.
• Multivariate time series models are thus needed for (ât, b̂t, ĉt).
• Last but not least, the parameter time series often do not
reveal any easy-to-extrapolate time trends.
94 / 288
Age-Period-Cohort models
ln µx(t) or logit qx(t) = ln(qx(t)/px(t))
  = αx                              (static reference life table)
  + Σ_{j=1}^{p} βx^(j) κt^(j)        (time trends depending on age)
  + γx λt−x                          (cohort effect)
with suitable identifiability constraints.
95 / 288
Examples
Clayton-Schifflers model:   ln µx(t) = αx + κt + λt−x
Lee-Carter model:           ln µx(t) = αx + βx κt
Renshaw-Haberman model:     ln µx(t) = αx + βx κt + γx λt−x
Debonneuil model:           ln µx(t) = αx + βx κt + λt−x
Cairns-Blake-Dowd model:    logit(qx(t)) = κt^(1) + κt^(2)(x − x̄)   (+ quadratic effects of age)
Plat model:                 ln µx(t) = αx + κt^(1) + κt^(2)(x − x̄) + κt^(3)(x − x̄)+ + λt−x
+ identifiability constraints
96 / 288
Mortality improvement rates
• Other models target the mortality improvement rates
µx (t)/µx (t − 1).
• Specifically, the specification is
  ln µx(t) − ln µx(t − 1) = αx + Σ_{j=1}^{p} βx^(j) κt^(j) + γx λt−x + Ext
  with iid error terms Ext.
• Both approaches are related since, under the Lee-Carter model for instance,
  ln µx(t) − ln µx(t − 1) = (αx + βx κt) − (αx + βx κt−1) = βx (κt − κt−1),
  so that trends are removed.
97 / 288
Lee-Carter model
ln µx(t) = αx + βx κt with Σx βx = 1 and Σt κt = 0

µ̂xmin(tmin)     µ̂xmin(tmin+1)     · · ·   µ̂xmin(tmax)     →   µ̂xmin(tmax+k)
µ̂xmin+1(tmin)   µ̂xmin+1(tmin+1)   · · ·   µ̂xmin+1(tmax)   →   µ̂xmin+1(tmax+k)
     ...              ...           · · ·        ...                 ...
µ̂xmax(tmin)     µ̂xmax(tmin+1)     · · ·   µ̂xmax(tmax)     →   µ̂xmax(tmax+k)
     ↓                 ↓                          ↓                    ↑
   κ̂tmin            κ̂tmin+1        · · ·      κ̂tmax        →       κ̂tmax+k
98 / 288
Estimation methods
• Parameters αx , βx and κt can be estimated by least-squares
from the empirical forces of mortality µ
bx (t).
• In case more information is available, more efficient estimation
methods can be used:
- Poisson or Negative Binomial regression in case both ETRxt ’s
and Dxt ’s are available (data structure 3),
- and Binomial regression in case both Lxt ’s and Dxt ’s are
available (data structure 2).
• The choice of the estimation method for the αx ’s, βx ’s and
κt ’s thus depends on the data structure.
• Smoothing by penalized least-squares or penalized maximum
likelihood (or previous smoothing of the mortality surface).
99 / 288
GLM-fit of Lee-Carter model
• It is possible to fit the Lee-Carter model using a statistical
software that can perform GLM analyses based on Poisson or
(Negative) Binomial distributions.
• In the Lee-Carter model, age x and time t are treated as
factors αx , βx , and κt .
• The Lee-Carter model is not a GLM (because of the product
βx κt , making the model not linear in the parameters).
• It is nevertheless possible to fit the Lee-Carter model as a
series of GLM’s: if either βx or κt is known then the
Lee-Carter model is in the GLM framework.
100 / 288
GLM-fit of Lee-Carter model (Ctd)
• Select starting values for the βx ’s (take for instance βx = 1
for all x).
• Updating cycle:
1. given βx , estimate the αx ’s and κt ’s in the appropriate GLM;
2. given κt , estimate the αx ’s and βx ’s in the appropriate GLM;
3. compute the deviance
• Repeat the updating cycle until the deviance stabilizes.
• Revert to the Lee-Carter constraints Σx βx = 1 and Σt κt = 0 once convergence is attained.
101 / 288
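The updating cycle can be written down compactly with the classical one-parameter-at-a-time Newton steps for the Poisson log-likelihood (the Brouhns-Denuit-Vermunt approach). The sketch below is my own illustration under data structure 3, with D and E the matrices of death counts and central exposures; the synthetic data at the end are purely illustrative.

```python
import numpy as np

def fit_lee_carter_poisson(D, E, n_iter=200):
    """Poisson Lee-Carter: ln mu_x(t) = alpha_x + beta_x * kappa_t, D_xt ~ Poi(E_xt mu_x(t))."""
    n_age, n_year = D.shape
    alpha = np.log((D.sum(axis=1) + 0.5) / E.sum(axis=1))   # crude starting values
    beta = np.full(n_age, 1.0 / n_age)
    kappa = np.log(D.sum(axis=0) / E.sum(axis=0))
    kappa -= kappa.mean()
    for _ in range(n_iter):
        Dhat = E * np.exp(alpha[:, None] + np.outer(beta, kappa))
        alpha += (D - Dhat).sum(axis=1) / Dhat.sum(axis=1)                # Newton step for alpha_x
        Dhat = E * np.exp(alpha[:, None] + np.outer(beta, kappa))
        kappa += ((beta[:, None] * (D - Dhat)).sum(axis=0)
                  / ((beta ** 2)[:, None] * Dhat).sum(axis=0))            # Newton step for kappa_t
        Dhat = E * np.exp(alpha[:, None] + np.outer(beta, kappa))
        beta += ((kappa[None, :] * (D - Dhat)).sum(axis=1)
                 / ((kappa ** 2)[None, :] * Dhat).sum(axis=1))            # Newton step for beta_x
    # revert to the Lee-Carter constraints: sum(beta) = 1, sum(kappa) = 0
    alpha, kappa = alpha + beta * kappa.mean(), kappa - kappa.mean()
    beta, kappa = beta / beta.sum(), kappa * beta.sum()
    return alpha, beta, kappa

rng = np.random.default_rng(0)
ages, years = np.arange(60, 100), np.arange(30)
true_mu = 0.005 * np.exp(0.1 * (ages - 60))[:, None] * np.exp(-0.02 * years)[None, :]
E = np.full(true_mu.shape, 10000.0)
D = rng.poisson(E * true_mu)
alpha, beta, kappa = fit_lee_carter_poisson(D, E)
print(kappa[:5])
```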
Smoothed closed mortality surface
[Figure: smoothed and closed mortality surface — death probabilities (log scale) by age and time, 1950–2000.]
102 / 288
Estimated αx
[Figure: α̂x against age, 0–120.]
103 / 288
Estimated βx
[Figure: β̂x against age, 0–120.]
104 / 288
Estimated κt
[Figure: κ̂t against calendar time, 1950–2010.]
105 / 288
Correcting the OLS κt s
The OLS κ̂t's are adjusted (taking the α̂x and β̂x estimates as given) by imposing either
  Σ_{x=xmin}^{xmax} Dxt = Σ_{x=xmin}^{xmax} ETRxt exp(α̂x + β̂x κ̂t)
or
  êx*↑(t) = (1 − exp(−exp(α̂x* + β̂x* κ̂t))) / exp(α̂x* + β̂x* κ̂t)
    + Σ_{k≥1} exp(−Σ_{j=0}^{k−1} exp(α̂x*+j + β̂x*+j κ̂t)) · (1 − exp(−exp(α̂x*+k + β̂x*+k κ̂t))) / exp(α̂x*+k + β̂x*+k κ̂t)
106 / 288
Fitting period
• Most actuarial studies base the projections on data relating to
the years 1950 to present.
• There are several justifications for that:
- The quality of mortality data, particularly at older ages, is
questionable for the pre-1950 period.
- Infectious disease was an uncommon cause of death by 1950,
while heart disease and cancer were the two most common
causes, as they are today.
• Routinely using the post-1950 period for mortality forecasting
is nevertheless dangerous because of a visible break in
mortality during the 1970s.
• Procedures for selecting an optimal calibration period identify
the longest period for which the estimated κt is linear.
107 / 288
R² as a function of the fitting period
[Figure: R² against the starting year of the fitting period, 1950–1990.]
108 / 288
Optimal fitting period
• The period on which the estimated κt are “the most linear”
starts at tstart = 1979.
• Instead of taking all of the available data 1948-2009, we
discard here observations for the years 1948-1978.
• Here, short-term trends are preferred because past long-term
trends are not expected to be relevant to the long-term future
(under a simple RWD structure for the time index).
• When restricted to the optimal fitting period, the Lee-Carter
model explains 95.17% of the total variance.
• The presence of a break in mortality statistics, between 1970
and 1980, is well documented for most industrialized countries
and taken into account in various ways.
109 / 288
Modelling the time index
• Any time series model is candidate for modelling the dynamics
of the estimated κt s.
• With ARIMA modelling, no sudden large shock is expected to
happen.
• In case the insurer is exposed to the risk of a brutal increase in
the qx (t)s, jumps can be added to the ARIMA model (no
brutal decrease is expected).
• These jumps can occur according to a compound Binomial process, and account for extreme events (like a flu epidemic, for instance).
• As long as κt obeys an ARIMA dynamics, the mortality
projection model only accounts for long-term improvements in
longevity driven by κt .
110 / 288
Random walk with drift
• In the majority of applications, the κt's obey
  κt = κt−1 + δ + ξt with independent ξt ∼ Nor(0, σξ²).
• The κtmax+k's, k = 1, 2, . . ., are given by
  κtmax+k = κtmax + kδ + Σ_{j=1}^{k} ξtmax+j
  κ̂tmax+k = E[κtmax+k | κ] = κtmax + kδ
  V[κtmax+k | κ] = kσξ²
  RF(x, tmax + k) = µx(tmax + k) / µx(tmax) = exp(βx kδ).
111 / 288
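The drift and innovation variance are estimated from the first differences of the fitted κ̂t, and projection is then immediate. A short Python sketch (illustrative; the toy `kappa` series at the end stands in for the fitted time index):

```python
import numpy as np

def project_kappa(kappa, horizon, n_sim=10000, seed=0):
    """Random walk with drift: estimate (delta, sigma_xi) and simulate future kappa paths."""
    diffs = np.diff(kappa)
    delta = diffs.mean()                              # drift estimate
    sigma = diffs.std(ddof=1)                         # innovation standard deviation
    rng = np.random.default_rng(seed)
    shocks = rng.normal(delta, sigma, size=(n_sim, horizon))
    paths = kappa[-1] + np.cumsum(shocks, axis=1)     # kappa_{tmax + k}, k = 1..horizon
    return delta, sigma, paths

# central projection and a 90% fan from a toy kappa series
kappa = np.linspace(20, -15, 30) + np.random.default_rng(1).normal(0, 1, 30)
delta, sigma, paths = project_kappa(kappa, horizon=25)
print(delta, np.percentile(paths[:, -1], [5, 50, 95]))
```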
Deterministic vs. stochastic time trend
• The estimated mortality reduction factor is given by
  RF̂(x, tmax + k) = exp(β̂x k δ̂).
• Note that very similar reduction factors would have been obtained with a model of the form
  µx(t) = exp(αx + βx(t − t̄)),
  once the optimal fitting period has been selected.
• The difference centres on the prediction intervals, which become incredibly narrow after having replaced κt with t − t̄.
• Such a model has been used
- by some professional actuarial associations to produce
projected life tables (the so-called Nolfi approach), and
- by several governmental agencies to produce national forecasts.
112 / 288
Official projections by the Belgian Federal Planning Bureau
(FPB)
• The FPB model specifies qx (t) = exp(αx + βx t) where βx is
the rate of decrease of qx (t) over time.
• Each age-specific death probability is assumed to decline at its
own exponential rate.
• The empirical q̂x(t)'s are first smoothed over t for each fixed x by moving geometric averaging.
• The αx's and βx's are estimated by least-squares for x ≤ 89, i.e. minimizing
  Σ_{x=0}^{89} Σ_{t≥1970} (ln q̂x(t) − αx − βx t)².
• Further smoothing and extrapolation to older ages are
performed.
113 / 288
Forecast of e0↑(t) with official values
[Figure: forecast of e0↑(t), 1960–2060, together with the official values.]
114 / 288
Ages 60 and over
• Let us now fit the Lee-Carter model to ages 60 and over.
• Restricting the age range appears to be useful if the actuary
does not need projected µx (t) for all ages.
• Think about pricing immediate life annuities, a product
generally sold to individuals above 60.
• The optimal fitting period starts at t*start = 1983 when only ages over 60 are considered.
• When restricted to the optimal fitting period, the Lee-Carter
model explains 98.05% of the total variance.
115 / 288
Estimated αx's
[Figure: α̂x against age, 60–120.]
116 / 288
Estimated βx's
[Figure: β̂x against age, 60–120.]
117 / 288
Estimated κt's, t = 1983, . . . , 2009
[Figure: κ̂t against calendar time, 1985–2010.]
118 / 288
Inspection of residuals
• The Lee-Carter model is in essence a regression model with age x and calendar time t entering the model as covariates to explain the observed death rates m̂x(t).
• Since we work in a regression model, it is important to inspect
residuals rxt to detect a possible structure.
• If the residuals rxt exhibit some regular pattern, this means
that the model is not able to describe all the phenomena
appropriately.
• In practice, plotting t 7→ rxt at different ages x, or looking at
(x, t) 7→ rxt , and discovering no structure in those graphs
ensures that the time trends have been correctly captured by
the model.
119 / 288
Standardized residuals
[Figure: standardized residuals rxt by age (60–120) and calendar time (1985–2005).]
120 / 288
Forecast of e65↑(t) with official values
[Figure: forecast of e65↑(t), 1960–2060, together with the official values.]
121 / 288
Prediction intervals
• Point predictions are obtained by replacing the κt0+k's with their mathematical expectations (κt0 + kδ under the RWD) so that point predictions correspond to median death rates.
• It is often impossible to derive prediction intervals analytically
because
- two very different sources of uncertainty have to be combined:
sampling errors in the parameters of the model and
forecast errors in the projected time index
- the measures of interest are complicated non-linear functions
of the parameters αx , βx , and κt and the ARIMA parameters.
• Bootstrap procedures help to overcome these problems.
122 / 288
Parametric bootstrap
• Parametric bootstrap consists in generating αxb , βxb , and κbt ,
b = 1, . . . , B, from the appropriate multivariate Normal
distribution.
• The bth sample, b = 1, . . . , B, in the Monte Carlo simulation
is obtained by the following 4 steps:
1. Generate αxb , βxb , and κbt from the appropriate multivariate
Normal distribution.
2. Estimate the ARIMA model using the κbt as data points.
3. Generate a projection of κbt ’s using these ARIMA parameters.
4. Compute the quantities of interest on the basis of the αxb , βxb ,
and κbt .
123 / 288
Semiparametric bootstrap
• Starting from the observations (ETRxt, dxt), we create bootstrap samples
  (ETRxt, dxt^b), b = 1, . . . , B,
  where the dxt^b's are realizations from the Poisson distribution with mean ETRxt µ̂x(t).
• For each bootstrap sample, the αx ’s, βx ’s and κt ’s are
estimated and the κt ’s are then projected on the basis of the
reestimated ARIMA model.
• This yields B realizations αxb , βxb , κbt and projected κbt on the
basis of which we compute the quantity of interest.
124 / 288
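A compact sketch of the resampling step (my own illustration). For brevity, the quantity recomputed on each bootstrap sample here is a crude period life expectancy rather than a full Lee-Carter refit and κ projection; the refitting step of the slide would be inserted where indicated in the comment.

```python
import numpy as np

def period_e65(mu_col):
    """Period life expectancy at 65 from a column of forces of mortality (ages 65, 66, ...)."""
    surv = np.concatenate(([1.0], np.exp(-np.cumsum(mu_col))))[:-1]
    return float(np.sum(surv * (1.0 - np.exp(-mu_col)) / mu_col))

def semiparametric_bootstrap(D, E, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    mu_hat = D / E
    stats = []
    for _ in range(n_boot):
        D_b = rng.poisson(E * mu_hat)            # d_xt^b ~ Poisson(ETR_xt * mu_hat_x(t))
        mu_b = np.maximum(D_b, 0.5) / E          # crude rates of the bootstrap sample
        # full procedure: refit the Lee-Carter model and project kappa here
        stats.append(period_e65(mu_b[:, -1]))    # quantity of interest, last observed year
    return np.array(stats)

ages = np.arange(65, 111)
mu = 0.01 * np.exp(0.1 * (ages - 65))
E = np.full((ages.size, 20), 20000.0)
D = np.random.default_rng(2).poisson(E * mu[:, None])
b = semiparametric_bootstrap(D, E)
print(b.mean(), np.percentile(b, [2.5, 97.5]))
```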
Residuals bootstrap
• Another possibility is to bootstrap from the residuals of the
fitted model.
• Specifically, we create the matrix R of residuals rxt .
• If the model is appropriate then these residuals are
approximately independent and identically distributed and,
hence, exchangeable.
• Then, we generate B replications Rb , b = 1, . . . , B, by
sampling with replacement the elements of the matrix R.
• The inverse formula for the residuals is then used to obtain
the corresponding matrix of death rates µ
bbx (t), or of death
b
counts Dxt .
125 / 288
Bootstrapped values of e65↗(2015)
[Figure: histogram of the bootstrapped values of e65↗(2015), roughly between 20.0 and 21.0.]
126 / 288
Longevity fan chart for e65↗(t)
[Figure: fan chart of e65↗(t), 2010–2030.]
127 / 288
Back testing
                      Observation period
                      1950-1980   1950-1990   1950-2000
Opt. fitting period   1960-1980   1970-1990   1980-2000
% var. explained      79.18       91.71       98.48
δ̂                    -0.2314     -0.6310     -0.7233
σ̂²                   0.7363      0.0946      0.0498
128 / 288
Back testing for e65↑(t) with 90% prediction intervals
[Figure: observed versus predicted e65↑(t), 1980–2010, with 90% prediction intervals.]
129 / 288
Forecast of e65↗(t): Lee-Carter versus Oeppen-Vaupel
[Figure: forecasts of e65↗(t), 2010–2030, under the Lee-Carter and Oeppen-Vaupel models.]
130 / 288
Conclusions
• Back testing indicates that the Lee-Carter model seems to be
unable to forecast future mortality.
• This is generally the case with mortality projection models.
• Moreover, the comparison between the forecast obtained from
Lee-Carter and from Oeppen-Vaupel models suggests
substantial model risk.
• Therefore, appropriate actuarial strategies need to be defined
to counteract longevity risk.
131 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
132 / 288
Model risk
• In a life insurance portfolio (or a pension plan), deviations
from expected mortality in future years may arise from the
following risk sources:
1. the stochastic nature of a given model (the actual number of
deaths in each calendar year is a random variable); this is
called the “process risk”.
2. uncertainty in the values of the parameters, giving rise to the “parameter risk”;
3. uncertainty in the model underlying what we can observe, whence the “model risk” arises.
• The longevity risk arises from parameter or model risk.
• It is therefore sensible to consider simultaneously different
mortality projection models and to combine their results using
weighted averages (corresponding to probabilistic mixtures).
133 / 288
Example 1
• Consider 500 annuitants aged 65 subject to a piecewise constant force of mortality and ultimate age ω = 105.
• Under Scenario 1,
  µx = 0.005 for x = 65, . . . , 85 and µx = 0.125 for x = 86, . . . , 105.
• Under Scenario 2,
  µx = 0.009 for x = 65, . . . , 85 and µx = 0.2 for x = 86, . . . , 105.
• The respective weights associated to these two scenarios are 0.6 for Scenario 1 and 0.4 for Scenario 2.
134 / 288
Example 1: computation of a65 with 3% technical interest
rate
a65 = E[a_{T1|}] where a_{T1|} = Σ_{j=1}^{ω−65} I[T1 > j] (1.03)^{−j}
    = 0.6 E[a_{T1|} | Scenario 1] + 0.4 E[a_{T1|} | Scenario 2]
Under Scenario 1,
  jp65 = exp(−j · 0.005) for j = 1, . . . , 20, and
  jp65 = exp(−20 × 0.005) exp(−(j − 20) · 0.125) for j = 21, . . . , 40.
Under Scenario 2,
  jp65 = exp(−j · 0.009) for j = 1, . . . , 20, and
  jp65 = exp(−20 × 0.009) exp(−(j − 20) · 0.2) for j = 21, . . . , 40.
135 / 288
Example 1: computation of a65 with 3% technical interest
rate
a65 = E[a_{T1|}]
    = 0.6 E[a_{T1|} | Scenario 1] + 0.4 E[a_{T1|} | Scenario 2]
    = 0.6 [ Σ_{j=1}^{20} exp(−j · 0.005)(1.03)^{−j}
            + exp(−20 × 0.005)(1.03)^{−20} Σ_{j=1}^{20} exp(−j · 0.125)(1.03)^{−j} ]
    + 0.4 [ Σ_{j=1}^{20} exp(−j · 0.009)(1.03)^{−j}
            + exp(−20 × 0.009)(1.03)^{−20} Σ_{j=1}^{20} exp(−j · 0.2)(1.03)^{−j} ]
136 / 288
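The two scenario values and the resulting mixture price can be checked numerically (my own verification of the slide's arithmetic, with the 3% discount rate):

```python
import numpy as np

def scenario_annuity(mu_young, mu_old, i=0.03, n_young=20, n_old=20):
    """EPV of the annuity payments under a two-step piecewise constant force of mortality."""
    j = np.arange(1, n_young + 1)
    first = np.sum(np.exp(-j * mu_young) * (1 + i) ** -j)
    second = (np.exp(-n_young * mu_young) * (1 + i) ** -n_young
              * np.sum(np.exp(-np.arange(1, n_old + 1) * mu_old) * (1 + i) ** -np.arange(1, n_old + 1)))
    return first + second

a1 = scenario_annuity(0.005, 0.125)   # Scenario 1
a2 = scenario_annuity(0.009, 0.2)     # Scenario 2
print(a1, a2, 0.6 * a1 + 0.4 * a2)    # a_65 as the probability mixture of the two scenarios
```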
Example 1
The probability that there remain j annuitants alive at age 85, j = 0, 1, . . . , 500, is
Pr[j annuitants alive at age 85]
  = 0.6 Pr[j annuitants alive at age 85 | Scenario 1] + 0.4 Pr[j annuitants alive at age 85 | Scenario 2]
  = 0.6 (500 choose j) (exp(−20 × 0.005))^j (1 − exp(−20 × 0.005))^{500−j}
  + 0.4 (500 choose j) (exp(−20 × 0.009))^j (1 − exp(−20 × 0.009))^{500−j}
137 / 288
Example 2
• Consider a portfolio comprising 1,000 pure endowment
contracts sold to 30-year-old individuals, with unit benefit and
35 years deferred period.
• Assume that the survival probability to age 65 is
  35P30 = 0.95 with probability 0.2, 0.9 with probability 0.75, 0.8 with probability 0.05.
• Discounting is made at 3%.
• The probability of getting the unit capital is
  Pr[T1 > 35] = E[35P30] = 0.95 × 0.2 + 0.9 × 0.75 + 0.8 × 0.05 = 0.905.
138 / 288
Example 2
Ij = I[Tj > 35] ∼ Ber(Pr[Tj > 35]),
V[ Σ_{i=1}^{1000} (1.03)^{−35} Ii ] = 1000 (1.03)^{−70} V[I1] + 999000 (1.03)^{−70} C[I1, I2]
V[I1] = Pr[T1 > 35] × Pr[T1 ≤ 35] = 0.905 × 0.095
C[I1, I2] = E[ C[I1, I2 | 35P30] ] + C[ E[I1 | 35P30], E[I2 | 35P30] ]
          = 0 + V[35P30]
          = 0.95² × 0.2 + 0.9² × 0.75 + 0.8² × 0.05 − 0.905²
139 / 288
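A quick numerical check of this variance decomposition (illustration only):

```python
probs = {0.95: 0.2, 0.9: 0.75, 0.8: 0.05}        # distribution of 35_P_30
v = 1.03 ** -35                                   # discount factor to the benefit date

p_bar = sum(p * w for p, w in probs.items())                       # E[35_P_30] = 0.905
var_I1 = p_bar * (1 - p_bar)                                       # V[I_1]
cov_I = sum(p ** 2 * w for p, w in probs.items()) - p_bar ** 2     # C[I_1, I_2] = V[35_P_30]
n = 1000
portfolio_var = n * v ** 2 * var_I1 + n * (n - 1) * v ** 2 * cov_I
print(p_bar, var_I1, cov_I, portfolio_var)
```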
A formal approach based on model averaging
• Let ax0 (t0 |m) be the price of an immediate annuity sold to an
x0 -aged individual in calendar year t0 , viewed as a function of
the life table applying to the annuitant (represented by means
of a vector m of death rates).
• The available data set is denoted as D.
• To predict the cost of this annuity, we have at our disposal a
set of K mortality projection models M1 , M2 , . . . , MK , say,
producing different sets m from D.
• A priori, each model receives the same weight
  Pr[Mk] = 1/K for all k.
• As Σ_{k=1}^{K} Pr[Mk] = 1, one of the models considered is the true one.
140 / 288
Model risk
• The mean of ax0(t0 | m) given the available observations is
  E[ax0(t0 | m) | D] = Σ_{k=1}^{K} E[ax0(t0 | m) | D, Mk] Pr[Mk | D]
  where E[ax0(t0 | m) | D, Mk] is the prediction of the annuity price using model k.
• Similarly,
  V[ax0(t0 | m) | D] = Σ_{k=1}^{K} E[(ax0(t0 | m))² | D, Mk] Pr[Mk | D] − (E[ax0(t0 | m) | D])².
141 / 288
Model risk
• The weights Pr[Mk |D], k = 1, 2, . . . , K , assigned to each
model should reflect their appropriateness given the data so
that model selection criteria are good candidates in that
respect.
• Consider information criteria of the form
I = −2 ln L + π
where L is the likelihood function, evaluated by substituting
the maximum likelihood estimates of the parameters and π is
a penalty that is a function of the number of parameters p
and/or the number of observations n.
• Standard penalties are π = 2p (AIC) and π = p ln n (BIC).
142 / 288
Model risk
• Let Ik = −2 ln Lk + πk be the value of the information
criterion for model Mk .
• The comparison of model j with model k can be based on
  exp(−Ij/2) / exp(−Ik/2) = (Lj exp(−πj/2)) / (Lk exp(−πk/2)).
• If the penalties are equal for the two models, that is, πj = πk, then this is just the ratio of the respective likelihoods, also called the Bayes factor.
• A plausible choice for defining the weight to be assigned to model k is
  Pr[Mk | D] = exp(−Ik/2) / Σ_{j=1}^{K} exp(−Ij/2).
143 / 288
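In practice these weights are just normalized exponentials of the information criteria (Akaike-type weights). A tiny Python sketch with hypothetical criterion values for three candidate models:

```python
import numpy as np

def model_weights(info_criteria):
    """Pr[M_k | D] proportional to exp(-I_k / 2); subtract the minimum for numerical stability."""
    ic = np.asarray(info_criteria, dtype=float)
    w = np.exp(-(ic - ic.min()) / 2.0)
    return w / w.sum()

print(model_weights([10234.1, 10237.8, 10251.2]))   # hypothetical AIC values of three models
```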
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
144 / 288
Individual policies, survival benefits, crude q̂x with CI (grey) and general population (- - -)
[Figure: ln q̂x against age x — crude rates with confidence intervals and the general population life table.]
145 / 288
Group policies, survival benefits, crude q̂x with CI (grey) and general population (- - -)
[Figure: ln q̂x against age x — crude rates with confidence intervals and the general population life table.]
146 / 288
Different approaches
• Policyholders’ specific mortality can be accounted for by one
of the following three approaches:
1. Applying the general population reduction factors to a
market/internal period life table.
2. Applying market/internal adverse selection coefficients to the
general population projected life tables.
3. Age shifts.
• Let us apply these three approaches to males, individual
policies with survival benefits.
• Actuaries need to be aware that adverse selection
- strongly depends on the characteristics of the market.
- is subject to change over time as the composition of the
insured population is modified.
147 / 288
Adverse selection coefficients
• We use the following model:
  ln(q̂x^market / qx^pop) = f(x) + εx,
  where f(·) is an unknown smooth function of age x, and εx is an error term, assumed to be Nor(0, σ²) distributed.
• Specifically, the one-year death probability for the market at age x is obtained by multiplying qx^pop by exp(f̂(x)), that is, exp(f̂(x)) qx^pop.
• Adverse selection coefficients are the correction factors exp(f̂(x)).
148 / 288
Adverse selection coefficients
[Figure: crude and smoothed adverse selection coefficients (log scale) against age x.]
149 / 288
Extrapolation to younger and older ages
[Figure: adverse selection coefficients (log scale and natural scale) extrapolated over ages 0–120.]
150 / 288
Age shifts
• A conventional approach to quantify adverse selection consists in determining optimal age shifts ∆*, also called Rueff's adjustments.
• Actuarial calculations for a policyholder aged x are then based on the life table {q^pop_{x−∆*+k}, k = 0, 1, . . . , ω − x}.
• The age shift ∆ can be determined by maximizing the log-likelihood L(∆) obtained by considering mutually independent
  D^market_xt ∼ Poi(ETR^market_xt m̂^pop_{x−∆}(t)).
• The optimal age shift ∆* can also be obtained by minimizing
  O(∆) = Σ_{x,t} (ê↑,market_x(t) − ê↑,pop_{x−∆}(t))²
  by a grid search.
151 / 288
Log-likelihood L(∆): ∆* = 7
[Figure: L(∆) against ∆, 0–20, maximized at ∆* = 7.]
152 / 288
Comparison
• Let us now compare the three approaches on the basis of
death probabilities or life expectancies.
• Henceforth,
1. Crude death probabilities: circles
2. Market life table derived from local regression: continuous line
—
3. Market life table obtained by applying adverse selection
coefficients to the general population life table: broken line - - 4. Market life table obtained by applying age shift to the general
population life table: dotted line · · ·
153 / 288
−7
−6
−5
−4
ln qx
−3
−2
−1
0
Death probabilities
40
60
80
100
120
x
154 / 288
Remaining life expectancies
[Figure: ex against age x, comparing the approaches.]
155 / 288
Conclusion
• All of the three approaches produce almost identical results,
close to observed values.
• Therefore, we determine adverse selection coefficients for each type of product: males/females, group/individual, benefits in case of death/survival, etc.
• These coefficients can be applied to population qx to account
for adverse selection.
• The impact on e65↗ is about +1 year for group policies and +6 years for individual policies, survival benefits, male policyholders.
156 / 288
Portfolio-specific mortality
• In principle, the analysis conducted at market level can be
applied to any specific portfolio.
• If the exposures are limited, a regression model (such as
Binomial or Poisson regression models) can be helpful.
• Also, banding (i.e. grouping ages by classes) can be
considered.
• For smaller portfolios, relational models evaluate the specific
mortality with respect to some reference life tables.
• Specifically, relational models allow the actuary to connect
portfolio mortality to some reference mortality (market or
general population).
157 / 288
Relational Models
• Consider a reference life table given by a set of µx^ref's.
• The relational model is
  ln µ̂x = f(ln µx^ref) + εx with εx ∼ Nor(0, σ²).
• If necessary, age can also enter the model:
  ln µ̂x = f1(x) + f2(ln µx^ref) + εx with εx ∼ Nor(0, σ²).
• For ages x with sufficient exposure, we get f̂2 ≈ 0 (only smoothing) whereas f̂1 ≈ 0 for ages with too limited exposure.
• No explicit expression is postulated for the unknown, smooth
functions f , f1 , and f2 .
158 / 288
Exposures are available
• If exposures-to-risk are available then we can use
Poisson-based regression models as
  Dx ∼ Poi(ETRx exp(f(ln µx^ref))).
• If necessary, we can also use age and resort to the model
  Dx ∼ Poi(ETRx exp(f1(x) + f2(ln µx^ref))).
• As before, f1 targets ages for which enough exposure is available whereas f2 borrows strength from the reference life table when exposure is too limited.
159 / 288
Relational model, males, individual policies, survival benefits
[Figure: crude individual-insurance-market ln µ̂x against general population ln µx^ref, with the additive-model fit.]
160 / 288
Relational model, males, individual policies, survival benefits
[Figure: fitted relational model — individual insurance market against general population mortality (log scale).]
161 / 288
Typical values for SMR in France
Category                                   SMR    Impact on e65↗
Executive managers                         0.6    ≈ +4
Middle managers                            0.9    ≈ +1
Farmers                                    0.8    ≈ +2
Craftsmen, shopkeepers & self-employed     0.9    ≈ +1
Employees                                  1.0
Workers                                    1.2
Out of the labour force                    1.9    ≈ -1 to -2
Source: INSEE, computations carried out over males aged 45-64
over calendar years 1982-2001, SMRs with respect to the general
population.
162 / 288
Projection of cash flows
• Let us consider a portfolio of n immediate life annuities, sold
to 65-year-old individuals in year t0 and providing them with a
payment of 1 monetary unit at the end of each year provided
they are still alive.
• The random number of contracts at time k (calendar year
t0 + k) is Lk , starting with L0 = n.
• The sequence of cash flows is L1 , L2 , . . .
• Having generated a sequence of κ^b_{t0+k} from the appropriate time series model, we compute the corresponding µ^b_{65+k}(t0 + k | κ^b), p^b_{65+k}(t0 + k | κ^b) and q^b_{65+k}(t0 + k | κ^b), b = 1, . . . , B, and we generate future cash flows.
163 / 288
Simulating a realization of T65 (t0 )
• We generate u from the Uniform(0,1) distribution.
• If
  u ≥ exp(−µ65(t0 | κ))
  then the annuitant dies at age 65 and the insurer does not have to pay anything.
• Else, if
  Π_{k=0}^{j} exp(−µ65+k(t0 + k | κ)) ≥ u ≥ Π_{k=0}^{j+1} exp(−µ65+k(t0 + k | κ))
  then the annuitant dies between ages 65 + j and 65 + j + 1.
• If needed, the exact age at death between 65 + j and
65 + j + 1 can be determined explicitly.
164 / 288
Simulating the number of survivors
• Since the portfolio is homogeneous with respect to sums
insured, faster simulation procedures are available.
• Here, we can simulate the annual number of deaths from the
Bin(Lk , q65+k (t0 + k|κ)) distribution (assuming a closed
group).
• Another possibility is to resort to the Poisson approximation
for the Binomial distribution.
• In practice, these three approaches provide very similar results.
• In the numerical illustrations, we use the Binomial modelling.
• For each of the B simulated projected life tables, we simulate
the sequence of cash flows L1 , L2 , . . . , Lω−65 starting from
L0 = n.
165 / 288
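Putting the pieces together, the cash-flow simulation of the closed annuity portfolio reduces to repeated Binomial thinning along each simulated life table. A minimal Python sketch (illustrative; `q_path` stands for the projected one-year death probabilities q_{65+k}(t0 + k | κ^b) of one simulated trajectory, here replaced by a toy vector):

```python
import numpy as np

def simulate_cashflows(q_path, n_policies, rng):
    """Simulate L_1, L_2, ... for a closed group: deaths_k ~ Bin(L_k, q_{65+k}(t0+k))."""
    survivors = []
    alive = n_policies
    for q in q_path:
        deaths = rng.binomial(alive, q)
        alive -= deaths
        survivors.append(alive)          # cash flow of year k+1: one unit per survivor
    return np.array(survivors)

rng = np.random.default_rng(0)
ages = np.arange(65, 106)
q_path = 1 - np.exp(-0.01 * np.exp(0.1 * (ages - 65)))   # toy projected death probabilities
cf = simulate_cashflows(q_path, n_policies=1000, rng=rng)
print(cf[:5], cf.sum())   # yearly payments and total undiscounted outgo for this trajectory
```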
Performances of a life annuity portfolio
SMR               Ultimate ruin probability   Mean time to ruin (in years)   i*
0.6               ≈100%                       23.26                          4.3%
0.9               12.64%                      30.56                          3.3%
1                 0.28%                       31.61                          NA
1.2               ≈0%                         NA                             NA
Group life ins.   29.60%                      31.55                          3.4%

Net single premium with FPB-2 and 3%: 13.59 € (NB: a_{17|} = 13.17 €)
Portfolio of size n = 1,000
Simulation of portfolio extinction according to B = 10,000 life tables
Yearly interest rate earned on the reserves: 3%
i* is the interest rate needed to be earned on the reserves to reduce the ruin probability to 1%
166 / 288
Risk classification
• The higher the annuity the stronger the adverse selection, cf.
Aujoux & Carbonel (Bulletin français d’Actuariat, 1996)
⇒ it seems worthwhile to include the amount paid by the company in mortality modelling.
• A recent study performed on the database of public pension records maintained by the Research Data Centre of the German
German Pension Insurance shows a difference close to 6 years
in life expectancy at age 65 between the lowest and highest
earning groups.
• Many other risk factors disappeared after controlling for
earnings.
• Let us discuss the case study in Gschlossl et al. (European
Actuarial Journal, 2011).
167 / 288
Presentation of the data
• We consider the mortality experience of an insurance company
operating on the German market.
• The portfolio has been observed during a period of five years.
• Specifically, we use a sample consisting of male insured lives,
having an individual policy.
• We consider the attained ages 18 up to and including 85.
• We have at our disposal death counts as well as the
corresponding (central) exposure-to-risk together with the
following set of covariates.
168 / 288
Numerical illustration
Covariate          Categorisation   Description              Marginal exposure
Product type       EN               endowment                52%
                   TL               term life                 9%
                   UL               unit-linked              39%
Curtate duration   0                policy year 1             7%
                   1                policy year 2             7%
                   2                policy year 3             7%
                   3                policy year 4             7%
                   4                policy year 5             7%
                   5                policy year 6             6%
                   6                policy year 7             5%
                   7                policy year 8             4%
                   8                policy year 9             4%
                   9                policy year 10            4%
                   10+              policy year 11 onwards   42%
169 / 288
Numerical illustration
Covariate          Categorisation   Description              Marginal exposure
Underwritten       Y                underwriting             96%
                   N                no underwriting           4%
Extra mortality    0                no extra mortality       95%
                   25               25% extra mortality       2%
                   37.5             37.5% extra mortality    <1%
                   50               50% extra mortality       2%
                   75               75% extra mortality      <1%
                   100              100% extra mortality     <1%
Amount insured     b1               (0; 10,000]              43%
in EUR             b2               (10,000; 50,000]         50%
                   b3               (50,000; 100,000]         6%
                   b4               (100,000; 200,000]       <1%
                   b5               (200,000; ∞)             <1%
170 / 288
Standardized mortality ratio (SMR)
• Let us start with an exploratory analysis based on SMR.
• SMR’s are useful to compare mortality experiences: actual
deaths in a particular population are compared with those
which would be expected if “standard” age-specific rates
applied.
• Precisely, the SMR is defined as
  SMR = Σ_{(x,t)∈D} D_{xt} / Σ_{(x,t)∈D} ETR_{xt} m̂^{stand}_x(t)
  where D is the set of ages and calendar years of interest.
• Here, m̂^{stand}_x(t) comes from a market life table built for the entire
  German market.
171 / 288
Descriptive analysis
• The global SMRs for all ages are 99.88% for endowment,
107.22% for term life and 98.76% for unit linked insurance
respectively.
• Global SMRs by Curtate durations (displayed next) vary
between 85% and 110%.
• For many durations, SMRs are close together so that these
levels will presumably be grouped together in the regression
analysis, resulting in a limited number of durations.
• Further, the plot indicates the presence of a selection effect,
i.e. mortality seems to be lower in the first few years after
policy inception and tends to increase after about five years.
172 / 288
[Figure: SMR by curtate duration; x-axis Duration 0–10, y-axis SMR roughly between 0.90 and 1.10]
173 / 288
Descriptive analysis
• The SMRs over all ages are 99.36% for Underwriting = Yes and 118.05% for
  Underwriting = No.
• This suggests higher mortality in the absence of underwriting.
• The variable Extra Mortality indicates the extra mortality as a percentage
  of standard mortality: either 0, 25, 37.5, 50, 75 or 100.
• The individual extra mortality ratios are usually computed on the basis of
  the health questionnaire filled in by the applicants, by summing specific
  percentages according to internal guidelines.
• Global SMRs (displayed next) reveal higher mortality for levels
37.5 and 100, but the SMRs for the other levels are close to 1.
174 / 288
[Figure: SMR by extra-mortality level (0, 25, 37.5, 50, 75, 100); y-axis SMR roughly between 1.0 and 4.0]
175 / 288
[Figure: SMR by amount-insured band b1–b5; y-axis SMR roughly between 0.70 and 1.00]
176 / 288
Poisson regression
• Recall that it is equivalent for maximum likelihood statistical
inference to work on the basis of the “true” likelihood or on
the basis of the Poisson likelihood.
• The heterogeneity present in the portfolio can be accounted
for by means of a Poisson regression model.
• Specifically, we record the number of deaths Di from an exposure er_i
  according to the value of a vector of covariates
  x_i = (1, x_{i1}, x_{i2}, . . . , x_{ik})^T including an intercept term and
  baseline mortality, i = 1, 2, . . . , n.
• Covariates are linked to the death rates by ln µ_i = x_i^T β.
• The vector of unknown regression parameters β gives the
effect of the covariates.
177 / 288
Likelihood equations
• The unknown parameters β can easily be estimated by Poisson maximum
  likelihood: Di ∼ Poi(er_i µ_i).
• The logarithm of the exposure ln er_i is treated as a covariate with known
  regression coefficient fixed to 1 (offset).
• If the covariates are categorical and have been coded by means of binary
  covariates x_{ij}, then equating to 0 the partial derivative of the
  log-likelihood with respect to β_j gives
  Σ_{i | x_{ij}=1} d_i = Σ_{i | x_{ij}=1} er_i µ̂_i.
• If an intercept is included then also the total number of
deaths is exactly fitted by the model.
178 / 288
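As a hedged sketch of how such a Poisson regression with a log-exposure offset can be fitted in practice, one could use statsmodels in Python. The small data frame below is invented for illustration only; it is not the study's portfolio, and the covariate names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical cell-level data: deaths, central exposure and two binary covariates
df = pd.DataFrame({
    "deaths":   [12, 30, 7, 45],
    "exposure": [10_000, 25_000, 4_000, 38_000],
    "tl":       [1, 0, 1, 0],        # e.g. product type = term life
    "no_uw":    [0, 0, 1, 1],        # e.g. no underwriting
})

X = sm.add_constant(df[["tl", "no_uw"]])           # intercept + binary covariates
model = sm.GLM(df["deaths"], X,
               family=sm.families.Poisson(),
               offset=np.log(df["exposure"]))      # ln(exposure) with coefficient 1
res = model.fit()
print(res.params)      # estimated beta in ln mu_i = x_i' beta
# With categorical covariates and an intercept, the fitted deaths reproduce the
# observed totals in each covariate level, as stated in the likelihood equations.
```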
Results for the final model
The variables in the final model include
- x: attained age of the insured life (in years)
- a: amount insured band
- d: curtate duration of the policy (in years)
- m: percentage extra mortality at which the mortality risk was
accepted
- p: product type
- u: underwritten
179 / 288
Results for the final model
ln µ_{x,a,d,m,p,u} = β0 + β1 ln µ̂_x + β2 I(d ∈ {0,1,2}) + β3 I(d ∈ {3,4})
  + β4 I(p = "TL") + β5 I(m = 25) + β6 I(m = 37.5)
  + β7 I(m = 100) + β8 I(a = b2) + β9 I(a ∈ {b3, b4, b5})
  + β10 I(u = "N") + β11 x I(p = "TL") + β12 x I(u = "N")
  + β13 x I(m = 25) + β14 x I(a = b2) + β15 I(p = "TL" ∧ m = 100)
  + β16 I(p = "TL" ∧ a = b2) + β17 I(p = "TL" ∧ a ∈ {b3, b4, b5})
  + β18 I(u = "N" ∧ a = b2) + β19 I(u = "N" ∧ d ∈ {0,1,2,3,4})
180 / 288
Results for the final model
Regression    Parameter   Standard   z-value    p-value
coefficient   Estimate    Error
β0            0.020       0.041      0.472      0.637
β1            1.001       0.008      125.184    < 2·10^-16    ***
β2            -0.198      0.029      -6.799     1.82·10^-11   ***
β3            -0.089      0.030      -3.003     0.003         **
β4            -0.509      0.153      -3.319     9.00·10^-4    ***
β5            0.623       0.216      2.884      0.004         **
β6            0.407       0.139      2.927      0.003         **
β7            0.290       0.072      4.036      5.43·10^-5    ***
β8            0.248       0.070      3.574      3.51·10^-4    ***
β9            -0.183      0.038      -4.799     1.59·10^-6    ***
β10           1.295       0.156      8.306      < 2·10^-16    ***
181 / 288
Results for the final model
Regression    Parameter   Standard   z-value    p-value
coefficient   Estimate    Error
β11           0.010       0.003      3.744      1.81·10^-4    ***
β12           -0.009      0.002      -3.501     4.63·10^-4    ***
β13           -0.010      0.004      -2.656     0.008         **
β14           -0.005      0.001      -4.055     5.01·10^-5    ***
β15           0.407       0.173      2.349      0.019         *
β16           0.182       0.060      3.004      0.002         **
β17           0.262       0.102      2.577      0.009         **
β18           -1.275      0.171      -7.421     1.17·10^-13   ***
β19           -0.460      0.104      -4.420     9.89·10^-6    ***
182 / 288
Resulting life tables
• As an illustration, let us now consider three risk profiles: a low
one, a medium one and a high one.
• The low risk profile corresponds to duration less than 2, sum
insured in b3-b5.
• The medium risk profile corresponds to duration 5+, sum
insured in b1, product type TL.
• The high risk profile corresponds to duration 5+, no
underwriting, extra mortality 100, sum insured in b1.
• The corresponding logarithmic mortality curves are displayed
next.
183 / 288
Death rates according to risk profile
[Figure: fitted logarithmic death rates by attained age (20–80) for the low, medium and high risk profiles]
184 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
185 / 288
Death counts
• Consider a given life insurance portfolio observed for T
consecutive periods of time.
• Let Dit be the number of deaths recorded in cell i during
period t, i = 1, . . . , n, t = 1, . . . , T .
• Let ETRit be the corresponding (central) exposure-to-risk.
• By cell, we mean a given combination of risk factors, including
gender, age, smoking habit, sum insured, etc.
• Let
  D_{i•} = Σ_{t=1}^{T} D_{it}, i = 1, . . . , n,
  be the total number of deaths in cell i during the observation period.
186 / 288
Death counts
• Let µ^{ref}_i be the reference force of mortality, or best estimate, for
  cell i (derived from market statistics, say).
• The a priori expected number of deaths for cell i in period t is
  δ_{it} = E[D_{it}] = ETR_{it} µ^{ref}_i.
• If appropriate, the reference force of mortality may depend on calendar
  time (i.e., µ^{ref}_{it} instead of µ^{ref}_i) to allow for longevity
  improvements.
• To include past mortality experience in the predictive
distribution of Di,T +1 , we use standard nonlife credibility
models.
187 / 288
Company-specific random effect
• Let Θ represent the company relative risk level, with respect
to the reference life table.
• Specifically, we assume that given Θ = θ, the random variables D_{it},
  t = 1, 2, . . ., are independent and obey the Poi(δ_{it} θ) distribution, i.e.
  Pr[D_{it} = k | Θ = θ] = exp(−θδ_{it}) (θδ_{it})^k / k!,  k = 0, 1, 2, . . . ,
  with δ_{it} = ETR_{it} µ^{ref}_i.
• A priori, we assume that E[Θ] = 1 so that the reference life table produces
  the a priori expected number of deaths:
  E[D_{it}] = E[δ_{it} Θ] = δ_{it}.
188 / 288
A priori distributions for Θ and Dit
• Assume that Θ ∼ Gam(a, a) with probability density function
  f_Θ(θ) = a^a θ^{a−1} exp(−aθ) / Γ(a),  θ > 0.
• Then, for non-negative integer k,
  Pr[D_{it} = k] = ∫_0^{+∞} exp(−θδ_{it}) (θδ_{it})^k / k!  f_Θ(θ) dθ
                 = (a+k−1 choose k) (δ_{it}/(a+δ_{it}))^k (a/(a+δ_{it}))^a
  so that D_{it} obeys the Negative Binomial distribution.
189 / 288
A posteriori distribution for Θ given Dit = dit , i = 1, . . . , n,
t = 1, 2, . . . , T
f_{Θ | data}(θ)
  = [∏_{i=1}^{n} ∏_{t=1}^{T} exp(−θδ_{it}) (θδ_{it})^{d_{it}} / d_{it}!] f_Θ(θ)
    / ∫_0^{+∞} [∏_{i=1}^{n} ∏_{t=1}^{T} exp(−ξδ_{it}) (ξδ_{it})^{d_{it}} / d_{it}!] f_Θ(ξ) dξ
  = exp(−θ(a + Σ_{i=1}^{n} Σ_{t=1}^{T} δ_{it})) θ^{a + Σ_i Σ_t d_{it} − 1}
    / ∫_0^{+∞} exp(−ξ(a + Σ_i Σ_t δ_{it})) ξ^{a + Σ_i Σ_t d_{it} − 1} dξ
  = exp(−θ(a + Σ_i Σ_t δ_{it})) θ^{a + Σ_i Σ_t d_{it} − 1}
    × (a + Σ_i Σ_t δ_{it})^{a + Σ_i Σ_t d_{it}} / Γ(a + Σ_i Σ_t d_{it}).
190 / 288
A posteriori distribution for Θ
• A posteriori, we thus have
  [Θ | D_{it}, i = 1, . . . , n, t = 1, . . . , T] ∼ Gam(a + D_{••}, a + δ_{••}).
• Hence
  E[Θ | D_{it}, i = 1, . . . , n, t = 1, . . . , T]
    = (a + D_{••}) / (a + δ_{••})
    = a/(a + δ_{••}) × 1 + δ_{••}/(a + δ_{••}) × D_{••}/δ_{••}
    = a/(a + δ_{••}) × E[Θ] + δ_{••}/(a + δ_{••}) × Θ̂_T,
  where δ_{••}/(a + δ_{••}) is the credibility factor and Θ̂_T = D_{••}/δ_{••}.
191 / 288
Credibility predictor
• Given past mortality experience, the future number of deaths D_{i,T+1} has a
  Negative Binomial distribution with updated parameters:
  Pr[D_{i,T+1} = k | D_{jt} for j = 1, . . . , n and t = 1, . . . , T]
    = (a + D_{••} + k − 1 choose k) (δ_{i,T+1}/(a + δ_{••} + δ_{i,T+1}))^k
      ((a + δ_{••})/(a + δ_{••} + δ_{i,T+1}))^{a + D_{••}}.
• The expected number of deaths in cell i for next year T+1 is therefore
  E[D_{i,T+1} | D_{jt} for j = 1, . . . , n and t = 1, . . . , T]
    = δ_{i,T+1} (a + D_{••}) / (a + δ_{••}).
• Note that the method still applies if the portfolio has been observed for a
  single year (T = 1).
192 / 288
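A minimal numerical sketch of this credibility update, assuming a hypothetical Gamma parameter a and invented death counts (none of these numbers come from the course):

```python
import numpy as np

def credibility_expected_deaths(a, D_past, delta_past, delta_next):
    """Gamma-Poisson credibility update with a company-wide Theta ~ Gam(a, a).
    D_past, delta_past: observed and a priori expected deaths over all cells
    and past years; delta_next: a priori expected deaths for cell i in T+1."""
    D_tot = np.sum(D_past)          # D_..
    d_tot = np.sum(delta_past)      # delta_..
    z = d_tot / (a + d_tot)         # credibility factor
    theta_hat = D_tot / d_tot       # observed-over-expected ratio
    posterior_mean = (1 - z) * 1.0 + z * theta_hat   # = (a + D_..)/(a + delta_..)
    return delta_next * posterior_mean

# Hypothetical numbers: 3 cells observed for 2 years
D = np.array([[40, 45], [10, 12], [25, 20]])
delta = np.array([[38, 39], [14, 14], [22, 23]])
print(credibility_expected_deaths(a=5.0, D_past=D, delta_past=delta, delta_next=24.0))
```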
Cell-specific random effect
• Instead of using a single company-specific relative risk level Θ,
we could associate a specific relative risk level Θi to each cell
i = 1, . . . , n.
• Using a Θi specific to cell i offers more flexibility: the relative
risk level is allowed to vary between cells, which is more
realistic.
• A priori, we assume that the Θi are independent and
identically distributed, with E[Θi ] = 1 for all i.
• The independence of the random effects does not induce any
smoothing effect: each cell is revised based on its own
experience.
• Some structure can be imposed on the Θi by using a
hierarchical credibility model.
193 / 288
AE ratio
• The key observation is the actual over expected (AE) ratio defined for cell i
  and year t as
  X_{it} = D_{it} / E[D_{it}] = D_{it} / δ_{it}.
• Thus, cell i is represented by a random sequence (Θ_i, X_{i1}, X_{i2}, . . .).
• The random sequences (Θ_i, X_{i1}, X_{i2}, . . .), i = 1, . . . , n, are
  mutually independent.
• We retain the conditional Poisson specification for the death counts D_{it},
  so that
  E[D_{it} | Θ_i] = V[D_{it} | Θ_i] = δ_{it} Θ_i.
194 / 288
Buhlmann-Straub credibility model
• Given Θ_i, the random variables X_{it}, t = 1, 2, . . ., are mutually
  independent.
• Further,
  E[X_{it} | Θ_i] = µ(Θ_i) = Θ_i
  V[X_{it} | Θ_i] = σ²(Θ_i)/w_{it} = Θ_i/w_{it}
  where w_{it} = δ_{it} = E[D_{it}] measures the "weight" of cell i in year t
  in the portfolio.
• The conditional mean and variance of the AE ratio follow from the conditional
  Poisson distribution for the death counts, which ensures that
  µ(Θ_i) = σ²(Θ_i) = Θ_i.
195 / 288
Linear credibility predictor
• The linear credibility predictor X̂_{i,T+1} for X_{i,T+1} is of the form
  c_{i0} + Σ_{t=1}^{T} c_{it} X_{it}.
• The coefficients c_{it} minimize the objective function
  Ψ(c) = E[(Θ_i − c_{i0} − c_{i1} X_{i1} − c_{i2} X_{i2} − . . . − c_{iT} X_{iT})²].
• Setting ∂Ψ(c)/∂c_{i0} and ∂Ψ(c)/∂c_{it}, t = 1, 2, . . . , T, equal to 0, we get
  c_{it} = σ² w_{it} / (1 + σ² w_{i•})  and  c_{i0} = 1 − σ² w_{i•} / (1 + σ² w_{i•})
  where σ² = V[Θ_i] measures residual heterogeneity.
196 / 288
Linear credibility predictor (Ctd)
• The linear credibility predictor is given by
  X̂_{i,T+1} = 1/(1 + σ² w_{i•}) + σ² w_{i•}/(1 + σ² w_{i•}) × (1/w_{i•}) Σ_{t=1}^{T} w_{it} X_{it},
  where α_i = σ² w_{i•}/(1 + σ² w_{i•}) is the credibility factor.
• Behaviour of the credibility factor α_i: (i) α_i → 1 if w_{i•} → ∞;
  (ii) α_i increases if σ² increases.
• The expected number of deaths for next year T+1 in cell i is
  δ_{i,T+1} X̂_{i,T+1} = δ_{i,T+1} (1 + σ² D_{i•}) / (1 + σ² δ_{i•}).
• This is in line with the Gamma assumption for Θ_i, as σ² = 1/a in that case.
197 / 288
Example: Di1 and δi1
              Risk class 1   Risk class 2   Risk class 3
Male          64 (108.1)     44 (50.9)      54 (72)
Female        15 (32.8)      15 (16.1)      9 (8.5)

Between brackets: δ_{i1}, i.e. company expected claims under market mortality.
198 / 288
Estimation of σ² = V[Θ_i]
As D_{i•} ∼ MPoi(δ_{i•}, Θ_i),
  V[D_{i•}] = E[V[D_{i•} | Θ_i]] + V[E[D_{i•} | Θ_i]]
            = E[δ_{i•} Θ_i] + V[δ_{i•} Θ_i]
            = δ_{i•} + σ² δ_{i•}²
so that
  Σ_{i=1}^{n} V[D_{i•}] = Σ_{i=1}^{n} δ_{i•} + σ² Σ_{i=1}^{n} δ_{i•}²
  ⇒ σ² = (Σ_{i=1}^{n} (V[D_{i•}] − δ_{i•})) / Σ_{i=1}^{n} δ_{i•}²
  ⇒ σ̂² = (Σ_{i=1}^{n} ((D_{i•} − δ_{i•})² − D_{i•})) / Σ_{i=1}^{n} δ_{i•}² = 0.1168.
199 / 288
AE ratio X̂_{i2}
• The predicted AE ratios for year 2,
  X̂_{i2} = (1 + σ̂² D_{i1}) / (1 + σ̂² δ_{i1})  with σ̂² = 0.1168,
  are displayed in the next table:
                 Male     Female
  Risk class 1   0.6220   0.5697
  Risk class 2   0.8840   0.9554
  Risk class 3   0.7766   1.0293
• The corresponding expected number of deaths is D̂_{i2} = δ_{i2} X̂_{i2}.
• Since the estimated AE ratios vary quite a lot between cells, cell-specific
  random effects Θ_i should be preferred over a single company-specific Θ.
200 / 288
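The σ̂² estimate and the predicted AE ratios of this example can be reproduced with a few lines of Python; small differences in the last digit may arise from rounding of the inputs shown above.

```python
import numpy as np

# Observed deaths D_i. and expected deaths delta_i. for the 6 cells
# (risk classes 1-3, males then females), as in the table above
D = np.array([64, 44, 54, 15, 15, 9])
delta = np.array([108.1, 50.9, 72.0, 32.8, 16.1, 8.5])

# Moment estimator of sigma^2 = V[Theta_i]
sigma2 = np.sum((D - delta) ** 2 - D) / np.sum(delta ** 2)
print(round(sigma2, 4))      # about 0.117 (0.1168 above, up to rounding)

# Predicted AE ratios for year 2; expected deaths are delta_i2 * X_hat_i2
X_hat = (1 + sigma2 * D) / (1 + sigma2 * delta)
print(np.round(X_hat, 4))    # males: 0.6220 0.8840 0.7766, females: 0.5697 0.9554 1.0293
```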
Weighting past experience
• Sometimes, data show significant changes over time.
• In such a case, the predictive ability of past experience
decreases with the lag between the period of risk prediction
and the period of occurrence.
• Letting the random effects depend on time (i.e., Θit instead
of Θi ) accounts for this effect by appropriately discounting
past experience.
• Of course, fitting such credibility models requires more
extensive data.
201 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
202 / 288
Conditional independence
• Assume that the forces of mortality are piecewise constant, i.e.
  µ_{x+ξ}(t+τ) = µ_x(t) for every integer x and t, and ξ, τ ∈ [0, 1).
• Consider a group of n annuitants aged x1 , . . . , xn in year t0 .
• Their respective remaining lifetimes are T1 , . . . , Tn .
• Given a set of factors κ, T1 , . . . , Tn are independent with
Pr[Ti > ξ|κ] = ξ pxi (κ).
• We assume that the functions
κ 7→ ξ pxi (κ)
are monotonic in κ so that all the Ti s “move in the same
direction” when κ increases.
203 / 288
Example: Lee-Carter
If β_x ≥ 0 for all x and κ_{t0+k} ≤ κ'_{t0+k} for all k,
  Pr[T_i > ξ | κ] = exp(−(ξ − ⌊ξ⌋) exp(α_{x_i+⌊ξ⌋} + β_{x_i+⌊ξ⌋} κ_{t0+⌊ξ⌋}))
                    × ∏_{k=0}^{⌊ξ⌋−1} exp(−exp(α_{x_i+k} + β_{x_i+k} κ_{t0+k}))
                  ≥ exp(−(ξ − ⌊ξ⌋) exp(α_{x_i+⌊ξ⌋} + β_{x_i+⌊ξ⌋} κ'_{t0+⌊ξ⌋}))
                    × ∏_{k=0}^{⌊ξ⌋−1} exp(−exp(α_{x_i+k} + β_{x_i+k} κ'_{t0+k}))
                  = Pr[T_i > ξ | κ']
so that κ ↦ ξp_{x_i}(κ) is non-increasing.
204 / 288
Monotonicity in κ, some consequences
• Assume that κ ↦ ξp_{x_i}(κ) is non-increasing.
• For any non-decreasing h, if κ ≤ κ' then
  E[h(T_1) | κ] = h(0) + ∫_0^{ω−x} Pr[T_1 > ξ | κ] dh(ξ)
               ≥ h(0) + ∫_0^{ω−x} Pr[T_1 > ξ | κ'] dh(ξ)
               = E[h(T_1) | κ'].
• Similarly, the function
  κ ↦ E[Ψ(T_1, . . . , T_n) | κ]
  is non-increasing for any non-decreasing Ψ.
205 / 288
Exchangeable lifetimes
• Assume that the n policyholders have the same age
x0 = x1 = . . . = xn .
• For any t_1, . . . , t_n ≥ 0, we have
  Pr[T_1 ≤ t_1, . . . , T_n ≤ t_n] = E[∏_{i=1}^{n} Pr[T_i ≤ t_i | κ]]
                                   = E[∏_{i=1}^{n} (1 − t_i p_{x_0}(κ))]
                                   = Pr[T_1 ≤ t_{i_1}, . . . , T_n ≤ t_{i_n}]
  for any permutation (i_1, . . . , i_n) of (1, . . . , n).
• The T_i's are exchangeable random variables, that is, their joint distribution
  function F is invariant under permutation:
  F(t_1, . . . , t_n) = F(t_{i_1}, . . . , t_{i_n}).
206 / 288
Pure endowments
• The insurer pays a capital K provided that the policyholder
aged x0 at time t0 is still alive at age x0 + d (at time t0 + d).
• Consider a homogeneous portfolio of n such contracts and denote as
  T_1, T_2, . . . , T_n the remaining lifetimes of these x_0-aged policyholders.
• The payout at maturity for the company is
  L_n = K Σ_{i=1}^{n} I_i  where I_i = I[T_i > d],
  with
  Pr[I_i = 1] = E[Pr[I_i = 1 | κ]] = E[d p_{x_0}(κ)] = E[∏_{j=0}^{d−1} p_{x_0+j}(t_0+j | κ)].
207 / 288
Positive dependence
C[I_i, I_j] = E[C[I_i, I_j | κ]] + C[E[I_i | κ], E[I_j | κ]]
            = 0 + C[d p_{x_0}(κ), d p_{x_0}(κ)]
            = V[d p_{x_0}(κ)] > 0

Pr[I_i = 1, I_j = 1] = Pr[I_i = 1] Pr[I_j = 1] + V[d p_{x_0}(κ)]
                       (first term: probability under independence)

Pr[I_i = 1 | I_j = 1] = Pr[I_i = 1] + V[d p_{x_0}(κ)] / Pr[I_j = 1] > Pr[I_i = 1]
208 / 288
Value-at-Risk (VaR), Tail-VaR and CTE
• Given a risk X and a probability level p ∈ (0, 1), the corresponding VaR is
  defined as
  VaR[X; p] = F_X^{-1}(p) = inf{x ∈ ℝ | F_X(x) ≥ p}.
• The corresponding Tail-VaR is defined as
  TVaR[X; p] = 1/(1−p) ∫_p^1 VaR[X; ξ] dξ.
• The Conditional Tail Expectation is defined as
  CTE[X; p] = E[X | X > VaR[X; p]].
209 / 288
VaR and CTE for large portfolios
• In large portfolios, the characteristics of the insurer's payout
  L_n = K Σ_{i=1}^{n} I[T_i > d]
  are essentially determined by those of d p_{x_0}(κ).
• Specifically, for n large enough,
  L_n ≈ nK d p_{x_0}(κ)
  VaR[L_n; ε] ≈ nK F^{-1}_{d p_{x_0}(κ)}(ε)
  CTE[L_n; ε] ≈ nK E[d p_{x_0}(κ) | d p_{x_0}(κ) > F^{-1}_{d p_{x_0}(κ)}(ε)].
210 / 288
Example 1
• Assume that K = 1 and
  d p_{x_0}(κ) = p + κ  where  κ = +∆ with probability 1/2, −∆ with probability 1/2.
• Then,
  E[L_n] = E[Σ_{i=1}^{n} I[T_i > d]]
         = E[E[Σ_{i=1}^{n} I[T_i > d] | κ]]
         = E[n(p + κ)]   (as, given κ, Σ_{i=1}^{n} I[T_i > d] ∼ Bin(n, p + κ))
         = np
         = E[Bin(n, E[d p_{x_0}(κ)])].
211 / 288
Example 1 (Ctd)
V[L_n] = E[V[Σ_{i=1}^{n} I[T_i > d] | κ]] + V[E[Σ_{i=1}^{n} I[T_i > d] | κ]]
       = E[n(p + κ)(1 − (p + κ))] + V[n(p + κ)]
         (as, given κ, Σ_{i=1}^{n} I[T_i > d] ∼ Bin(n, p + κ))
       = n(p − (p² + E[κ²])) + n² V[κ]
       = np(1 − p) + n(n − 1)∆²
       = V[Bin(n, E[d p_{x_0}(κ)])] + n(n − 1)∆².
212 / 288
Example 1 (Ctd)
lim_{n→+∞} √V[L_n] / n = lim_{n→+∞} √(n(p − p² − ∆² + n∆²)) / n = ∆ ≠ 0

Pr[L_n ≤ k] = (1/2) Pr[Bin(n, p − ∆) ≤ k] + (1/2) Pr[Bin(n, p + ∆) ≤ k]
            = (1/2) Σ_{j=0}^{k} (n choose j) (p − ∆)^j (1 − p + ∆)^{n−j}
              + (1/2) Σ_{j=0}^{k} (n choose j) (p + ∆)^j (1 − p − ∆)^{n−j}.
213 / 288
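To make Example 1 concrete, the sketch below evaluates the mixture-of-Binomials distribution of L_n and checks the variance formula; n, p and ∆ are arbitrary illustrative values, not taken from the course.

```python
import numpy as np
from scipy.stats import binom

# Example 1: K = 1, d p_x0(kappa) = p + kappa, kappa = +/-Delta with prob. 1/2
n, p, Delta = 1000, 0.6, 0.05     # hypothetical values

k = np.arange(n + 1)
# Unconditional distribution of L_n: equal-weight mixture of two Binomials
cdf = 0.5 * binom.cdf(k, n, p - Delta) + 0.5 * binom.cdf(k, n, p + Delta)
pmf = np.diff(np.concatenate(([0.0], cdf)))

mean = n * p                                        # E[L_n] = np
var = n * p * (1 - p) + n * (n - 1) * Delta**2      # V[L_n], derivation above
var_check = np.sum(k**2 * pmf) - mean**2            # same value, from the mixture
print(mean, var, var_check)

# 99.5% VaR of L_n read from the mixture cdf
print(k[np.searchsorted(cdf, 0.995)])
```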
Example 2
• Given κ, the piecewise constant force of mortality applying to a homogeneous
  group of 1,000 policyholders aged 80 in calendar year t_0 has the form
  µ_80(t_0 | κ) = exp(−2.21 + 0.03κ).
• Here, κ = −8.9 + ε with ε ∼ Nor(0, σ² = 0.3).
• The prediction of 1E_80 with 3% technical interest rate is
  1E_80 = (1.03)^{−1} 1p_80(t_0)
        = (1.03)^{−1} exp(−exp(−2.21 + 0.03 E[κ]))
        = (1.03)^{−1} exp(−exp(−2.21 + 0.03 × (−8.9))).
214 / 288
Example 2 (Ctd)
At the portfolio level,
  S = Σ_{i=1}^{1000} I[T_i > 1] = total payment of this portfolio
  F_S^{-1}(99.5%) ≈ 1000 F^{-1}_{1p_80(t_0|κ)}(99.5%)
  1p_80(t_0 | κ) = exp(−exp(−2.21 + 0.03(−8.9 + ε))) = g(ε) with g decreasing
  F^{-1}_{1p_80(t_0|κ)}(99.5%) = F^{-1}_{g(ε)}(99.5%) = g(F_ε^{-1}(1 − 99.5%))
                               = exp(−exp(−2.21 + 0.03(−8.9 + √0.3 Φ^{-1}(0.005)))).
215 / 288
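A short numerical sketch of Example 2; the parameters are those quoted above and the code only evaluates the closed-form expressions.

```python
import numpy as np
from scipy.stats import norm

# mu_80(t0|kappa) = exp(-2.21 + 0.03*kappa), kappa = -8.9 + eps, eps ~ Nor(0, 0.3)
sigma2 = 0.3

# Best-estimate one-year pure endowment 1E80 at 3% technical interest
E80 = np.exp(-np.exp(-2.21 + 0.03 * (-8.9))) / 1.03

# 99.5% quantile of 1p80(t0|kappa): g is decreasing in eps, so plug the
# 0.5% quantile of eps into g
eps_q = np.sqrt(sigma2) * norm.ppf(0.005)
p80_q = np.exp(-np.exp(-2.21 + 0.03 * (-8.9 + eps_q)))

print(round(E80, 4), round(p80_q, 4))
print(round(1000 * p80_q, 1))   # approximate 99.5% quantile of the total payment S
```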
Example 3
• Assume that µ_x(t | κ) = exp(α_x + β_x κ_t) with
  κ_t = κ_{t−1} − 0.5 + ε_t, starting from κ_{t_0} = −7, and independent
  ε_t = −4 with probability 0.25, 0 with probability 0.5, +4 with probability 0.25.
• Given κ, lifetimes are independent.
• We consider a portfolio comprising 1000 pure endowments 2E_80 with 3% interest
  rate and
  α_80 = −2.2, α_81 = −1.9, β_80 = 0.1, β_81 = 0.08.
216 / 288
2p_80(t_0 | κ) = exp(−exp(−2.2 + 0.1 κ_{t_0+1}) − exp(−1.9 + 0.08 κ_{t_0+2}))
with κ_{t_0+1} = −7.5 + ε_{t_0+1} and κ_{t_0+2} = −8 + ε_{t_0+1} + ε_{t_0+2}:
  ε_{t_0+1} = −4, ε_{t_0+2} = −4 (probability 1/16):
    exp(−exp(−2.2 + 0.1×(−7.5−4)) − exp(−1.9 + 0.08×(−8−4−4)))
  ε_{t_0+1} = −4, ε_{t_0+2} = 0 (probability 1/8):
    exp(−exp(−2.2 + 0.1×(−7.5−4)) − exp(−1.9 + 0.08×(−8−4)))
  ε_{t_0+1} = −4, ε_{t_0+2} = +4 (probability 1/16):
    exp(−exp(−2.2 + 0.1×(−7.5−4)) − exp(−1.9 + 0.08×(−8−4+4)))
  ε_{t_0+1} = 0, ε_{t_0+2} = −4 (probability 1/8):
    exp(−exp(−2.2 + 0.1×(−7.5)) − exp(−1.9 + 0.08×(−8−4)))
  ε_{t_0+1} = 0, ε_{t_0+2} = 0 (probability 1/4):
    exp(−exp(−2.2 + 0.1×(−7.5)) − exp(−1.9 + 0.08×(−8)))
  ε_{t_0+1} = 0, ε_{t_0+2} = +4 (probability 1/8):
    exp(−exp(−2.2 + 0.1×(−7.5)) − exp(−1.9 + 0.08×(−8+4)))
217 / 288
2p_80(t_0 | κ) (Ctd):
  ε_{t_0+1} = +4, ε_{t_0+2} = −4 (probability 1/16):
    exp(−exp(−2.2 + 0.1×(−7.5+4)) − exp(−1.9 + 0.08×(−8+4−4)))
  ε_{t_0+1} = +4, ε_{t_0+2} = 0 (probability 1/8):
    exp(−exp(−2.2 + 0.1×(−7.5+4)) − exp(−1.9 + 0.08×(−8+4)))
  ε_{t_0+1} = +4, ε_{t_0+2} = +4 (probability 1/16):
    exp(−exp(−2.2 + 0.1×(−7.5+4)) − exp(−1.9 + 0.08×(−8+4+4)))
218 / 288
Example 3 (Ctd)
2p_80(t_0 | κ) takes the values
  0.796403 with probability 0.0625
  0.829700 with probability 0.125
  0.851336 with probability 0.125
  0.854748 with probability 0.0625
  0.877037 with probability 0.25
  0.892302 with probability 0.0625
  0.896185 with probability 0.125
  0.911783 with probability 0.125
  0.926195 with probability 0.0625
219 / 288
Example 3 (Ctd)
At the portfolio level,
  S = Σ_{i=1}^{1000} I[T_i > 2]
  VaR[S; 0.9] ≈ 1000 × VaR[2p_80(t_0|κ); 0.9],  VaR[2p_80(t_0|κ); 0.9] = 0.911783
  CTE[S; 0.9] ≈ 1000 × CTE[2p_80(t_0|κ); 0.9],
  CTE[2p_80(t_0|κ); 0.9] = E[2p_80(t_0|κ) | 2p_80(t_0|κ) > 0.911783] = 0.926195
220 / 288
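The VaR and CTE of Example 3 follow directly from the nine-point distribution listed above; a small Python check:

```python
import numpy as np

# Nine possible values of 2p80(t0|kappa) and their probabilities (slide above)
vals = np.array([0.796403, 0.829700, 0.851336, 0.854748, 0.877037,
                 0.892302, 0.896185, 0.911783, 0.926195])
probs = np.array([0.0625, 0.125, 0.125, 0.0625, 0.25,
                  0.0625, 0.125, 0.125, 0.0625])

def var_cte(values, p_mass, level):
    order = np.argsort(values)
    v, cum = values[order], np.cumsum(p_mass[order])
    var = v[np.searchsorted(cum, level)]                  # F^{-1}(level)
    tail = values > var
    cte = np.sum(values[tail] * p_mass[tail]) / np.sum(p_mass[tail])
    return var, cte

var90, cte90 = var_cte(vals, probs, 0.90)
print(var90, cte90)                   # 0.911783 and 0.926195
print(1000 * var90, 1000 * cte90)     # approximations for the portfolio payout S
```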
Example 3 (Ctd)
• With a 14.5% safety margin, the premium income at time t = 0 is
  1000 × E[2p_80(t_0|κ)] × 1.03^{−2} × 114.5% = 941.38.
• Assuming a realized interest rate equal to 3%, the available capital at time
  t = 2 amounts to
  941.38 × 1.03² = 998.71.
• The probability that the insurer cannot pay the benefits is
  Pr[S ≥ 999] = 0.0625 (1000 × 0.796403^999 (1 − 0.796403) + 0.796403^1000)
              + . . .
              + 0.0625 (1000 × 0.926195^999 (1 − 0.926195) + 0.926195^1000).
221 / 288
Comparison with independence
• Let T_1^⊥, . . . , T_n^⊥ be independent lifetimes, each T_i^⊥ being distributed
  as T_i, that is, for all t_1, . . . , t_n,
  Pr[T_1^⊥ ≤ t_1, . . . , T_n^⊥ ≤ t_n] = Pr[T_1 ≤ t_1] · · · Pr[T_n ≤ t_n].
• Let a_{t|} = Σ_{k=1}^{⌊t⌋} v(0, k) denote an annuity certain, where ⌊t⌋ denotes
  the integer part of t, with the convention that the empty sum is zero.
• On average, there is no effect of positive dependence since
  E[Σ_{i=1}^{n} a_{T_i^⊥|}] = E[Σ_{i=1}^{n} a_{T_i|}].
222 / 288
Comparison with independence
• As all the Ti s “move in the same direction” when κ increases,
we expect some positive dependence between the lifetimes.
• Clearly, t 7→ at| is a non-decreasing function.
• Therefore, we also expect some positive dependence between
the random variables aT1 | , . . . , aTn | .
• We would like to compare
  TVaR[Σ_{i=1}^{n} a_{T_i^⊥|}; p]  and  TVaR[Σ_{i=1}^{n} a_{T_i|}; p]
  as well as the stop-loss premiums
  E[(Σ_{i=1}^{n} a_{T_i^⊥|} − d)_+]  and  E[(Σ_{i=1}^{n} a_{T_i|} − d)_+].
223 / 288
Supermodularity
• The function φ : R2 → R is supermodular if
φ(b1 , b2 ) − φ(a1 , b2 ) − φ(b1 , a2 ) + φ(a1 , a2 ) ≥ 0
for all a1 ≤ b1 , a2 ≤ b2 .
• These functions are important since they put more weight
- on (b1 , b2 ) and (a1 , a2 ) expressing positive dependence because
both components are simultaneously large or small.
- than on (a1 , b2 ) and (b1 , a2 ) expressing negative dependence
because they both mix one large component with one small
component.
• If φ has a derivative φ^{(1,1)} = ∂²φ/∂x_1∂x_2 then
  φ supermodular ⇔ φ^{(1,1)} ≥ 0
  as φ(b_1, b_2) − φ(a_1, b_2) − φ(b_1, a_2) + φ(a_1, a_2)
     = ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} φ^{(1,1)}(x_1, x_2) dx_1 dx_2.
224 / 288
Positive Quadrant Dependence (PQD)
• Random variables X1 and X2 are positively quadrant
dependent (PQD in short) if for all x1 and x2
Pr[X1 > x1 , X2 > x2 ] ≥ Pr[X1 > x1 ] Pr[X2 > x2 ]
⇔ Pr[X1 ≤ x1 , X2 ≤ x2 ] ≥ Pr[X1 ≤ x1 ] Pr[X2 ≤ x2 ].
• Let (X1⊥ , X2⊥ ) be an independent version of (X1 , X2 ), i.e.
Pr[X1⊥ ≤ x1 , X2⊥ ≤ x2 ] = Pr[X1 ≤ x1 ] Pr[X2 ≤ x2 ] for all x1 , x2 .
• If (X1 , X2 ) is PQD then
TVaR[Ψ(X1⊥ , X2⊥ ); p] ≤ TVaR[Ψ(X1 , X2 ); p] for all p
for any non-decreasing and supermodular function Ψ.
• To establish this result, we need Tchen’s inequality.
225 / 288
Lemma: Tchen’s inequality
• Let φ be such that φ^{(1,1)} ≥ 0.
• Integration by parts shows that for any (X_1, X_2) and (Y_1, Y_2) valued in
  ℝ_+ × ℝ_+ with identical marginals,
  E[φ(Y_1, Y_2)] − E[φ(X_1, X_2)]
    = ∫_0^{+∞} ∫_0^{+∞} (F_Y(x_1, x_2) − F_X(x_1, x_2)) φ^{(1,1)}(x_1, x_2) dx_1 dx_2.
• Therefore, F_X(x_1, x_2) ≤ F_Y(x_1, x_2) for all x_1, x_2
  ⇒ E[φ(Y_1, Y_2)] ≥ E[φ(X_1, X_2)].
• If X is PQD then E[φ(X_1, X_2)] ≥ E[φ(X_1^⊥, X_2^⊥)].
226 / 288
Proof that E[(Ψ(X_1^⊥, X_2^⊥) − d)_+] ≤ E[(Ψ(X_1, X_2) − d)_+]
• For h such that h' ≥ 0 and h'' ≥ 0,
  ∂²/∂x_1∂x_2 h(Ψ(x_1, x_2)) = h''(Ψ(x_1, x_2)) ∂Ψ/∂x_1 ∂Ψ/∂x_2
                               + h'(Ψ(x_1, x_2)) ∂²Ψ/∂x_1∂x_2 ≥ 0.
• Thus, (x_1, x_2) ↦ h(Ψ(x_1, x_2)) is supermodular and
  E[h(Ψ(X_1^⊥, X_2^⊥))] ≤ E[h(Ψ(X_1, X_2))]
  holds for every h such that h' ≥ 0 and h'' ≥ 0.
• With h(x) = (x − d)_+, we see that
  E[(Ψ(X_1^⊥, X_2^⊥) − d)_+] ≤ E[(Ψ(X_1, X_2) − d)_+].
227 / 288
Lemma: TVaR[Z; p] = inf_{a∈ℝ} { a + (1/(1−p)) E[(Z − a)_+] }
Let ε = 1 − p and define C[Z, t] = E[(Z − t)_+] + tε. Then
  dC[Z, t]/dt = −F̄_Z(t) + ε
  t ↦ C[Z, t] decreasing ⇔ t ≤ VaR[Z; 1 − ε]
  t ↦ C[Z, t] increasing ⇔ t ≥ VaR[Z; 1 − ε]
  ⇒ C[Z, t] is minimum at t = VaR[Z; 1 − ε]
  C[Z, VaR[Z; 1 − ε]] = E[(Z − VaR[Z; 1 − ε])_+] + VaR[Z; 1 − ε] ε
                      = ε TVaR[Z; 1 − ε]
  ⇒ TVaR[Z; p] = inf_{a∈ℝ} { a + (1/(1−p)) E[(Z − a)_+] }.
228 / 288
Proof that TVaR[Ψ(X_1^⊥, X_2^⊥); p] ≤ TVaR[Ψ(X_1, X_2); p]
  TVaR[Ψ(X_1^⊥, X_2^⊥); p]
    ≤ VaR[Ψ(X_1, X_2); p] + (1/(1−p)) E[(Ψ(X_1^⊥, X_2^⊥) − VaR[Ψ(X_1, X_2); p])_+]
    ≤ VaR[Ψ(X_1, X_2); p] + (1/(1−p)) E[(Ψ(X_1, X_2) − VaR[Ψ(X_1, X_2); p])_+]
    = TVaR[Ψ(X_1, X_2); p].
229 / 288
Characterizations of PQD
(X1 , X2 ) PQD
⇔ Pr[X2 > x2 |X1 > x1 ] ≥ Pr[X2 > x2 ] for all x1 , x2
⇔ Pr[X1 > x1 |X2 > x2 ] ≥ Pr[X1 > x1 ] for all x1 , x2
⇔ (h1 (X1 ), h2 (X2 )) PQD for any ↑ h1 and h2
⇔ C[h1 (X1 ), h2 (X2 )] ≥ 0 for any ↑ h1 and h2 such that
the covariance exists
⇔ E[h1 (X1 )h2 (X2 )] ≥ E[h1 (X1 )]E[h2 (X2 )] for any ↑ h1 and h2
such that the expectations exist
⇔ E[h(X1 , X2 )] ≥ E[h(X1⊥ , X2⊥ )] for any supermodular h
such that the expectations exist.
230 / 288
Positive cumulative dependence (PCD)
• The bivariate notion of PQD has been generalized to higher
dimensions in several ways.
• The random variables X1 , X2 , . . . , Xn are PCD if for any
i = 1, . . . , n − 1
X1 + . . . + Xi and Xi+1 are PQD.
• Then, X_1, X_2, . . . , X_n PCD
  ⇒ TVaR[Σ_{i=1}^{n} X_i^⊥; p] ≤ TVaR[Σ_{i=1}^{n} X_i; p] for all p, and
    E[(Σ_{i=1}^{n} X_i^⊥ − d)_+] ≤ E[(Σ_{i=1}^{n} X_i − d)_+] for all d.
231 / 288
PCD present values aT1 | , . . . , aTn |
For any non-decreasing h_1 and h_2,
  C[h_1(a_{T_1|} + . . . + a_{T_i|}), h_2(a_{T_{i+1}|})]
    = E[C[h_1(a_{T_1|} + . . . + a_{T_i|}), h_2(a_{T_{i+1}|}) | κ]]
      + C[E[h_1(a_{T_1|} + . . . + a_{T_i|}) | κ], E[h_2(a_{T_{i+1}|}) | κ]]
    = 0 + C[E[h_1(a_{T_1|} + . . . + a_{T_i|}) | κ], E[h_2(a_{T_{i+1}|}) | κ]]
    = C[ψ_1(κ), ψ_2(κ)]
with
  ψ_1(κ) = E[h_1(a_{T_1|} + . . . + a_{T_i|}) | κ]  and  ψ_2(κ) = E[h_2(a_{T_{i+1}|}) | κ]
both non-increasing in κ.
232 / 288
PCD present values aT1 | , . . . , aTn |
• Random variables X1 , . . . , Xn are associated if
C [Ψ1 (X1 , · · · , Xn ), Ψ2 (X1 , · · · , Xn )] ≥ 0
for all non-decreasing functions Ψ1 and Ψ2 for which the
covariances exist.
• If κ is associated then aT1 | , . . . , aTn | are PCD.
• When κ is multivariate Normal, κ is associated if, and only if,
all the elements of the covariance matrix are non-negative,
that is,
C[κs , κt ] ≥ 0 for all s and t.
233 / 288
Example: Lee-Carter
• If βx ≥ 0 for all the ages x and κ obeys an ARIMA model and
is such that
C[κs , κt ] ≥ 0 for all s and t
then aT1 | , . . . , aTn | are PCD.
• In particular, if the κt ’s obey a random walk with drift model,
then they are associated as for s < t,
C[κs , κt ] = C[κs , κs + (t − s)δ + ξs+1 + . . . + ξt ]
= V[κs ] ≥ 0.
234 / 288
Comparison with independence
• Assume that κ is associated.
• Then, a_{T_1|}, . . . , a_{T_n|} are PCD,
  TVaR[Σ_{i=1}^{n} a_{T_i^⊥|}; p] ≤ TVaR[Σ_{i=1}^{n} a_{T_i|}; p] for all p
  and
  E[(Σ_{i=1}^{n} a_{T_i^⊥|} − d)_+] ≤ E[(Σ_{i=1}^{n} a_{T_i|} − d)_+] for all d.
• Treating T_1, . . . , T_n as independent thus amounts to underestimating the
  risk borne by the annuity provider.
235 / 288
VaR of present value of life annuity payments
• However, there is in general no such relationship for the VaRs of
  V = Σ_{i=1}^{n} a_{T_i|} and V^⊥ = Σ_{i=1}^{n} a_{T_i^⊥|}.
• Since E[V] = E[V^⊥],
  ∫_0^∞ (Pr[V > t] − Pr[V^⊥ > t]) dt = 0,
  so that the graphs of the distribution functions of V and V^⊥ must cross at
  least once.
• Hence, there must exist probability levels p_0 and p_1 such that
  VaR[V; p_0] < VaR[V^⊥; p_0]  and  VaR[V; p_1] > VaR[V^⊥; p_1].
236 / 288
Example: Lee-Carter
Let us compute the distribution function of V = Σ_{i=1}^{n} a_{T_i|} under 3
scenarios in the Lee-Carter framework:
(1) lifetimes T1 , . . . , Tn being conditionally independent given κ
with common ξ-year conditional survival probability ξ px0 (κ);
this corresponds to V .
(2) lifetimes T1⊥ , . . . , Tn⊥ being independent with common ξ-year
survival probability E[ξ px0 (κ)]; this corresponds to V ⊥ .
(3) lifetimes T_1^⊥, . . . , T_n^⊥ being independent with deterministic death rates
    m^{det}_{x_0+k}(t_0 + k) = exp(α_{x_0+k} + β_{x_0+k} E[κ_{t_0+k}]);
    this is what is generally done in practice.
237 / 288
[Figure: distribution function F_V with n = 100 under scenarios (1) solid, (2) dashed, (3) dotted; present values roughly between 1050 and 1250]
238 / 288
[Figure: distribution function F_V with n = 1,000 under scenarios (1) solid, (2) dashed, (3) dotted; present values roughly between 11200 and 12200]
239 / 288
Conclusion
• In practice, actuaries often use deterministic projected life tables: arrays
  of numeric q_x(t) indexed by age and calendar time, or a reference life table
  q_x^⋆ to which cohort-specific age shifts are applied.
• This amounts to using scenario 2 or scenario 3.
• The preceding graphs show that, for sufficiently large portfolios, this
  approach may seriously underestimate the VaR at the usual probability levels.
240 / 288
Large homogeneous portfolios
• For lifetimes conditionally independent and identically distributed given κ,
  the diversification effect is apparent from
  TVaR[1/(n+1) Σ_{i=1}^{n+1} a_{T_i|}; p] ≤ TVaR[1/n Σ_{i=1}^{n} a_{T_i|}; p].
• In this case,
  lim_{n→+∞} (1/n) Σ_{i=1}^{n} a_{T_i|} = a_{x_0}(t_0 | κ) = E[a_{T_1|} | κ]
                                        = Σ_{k≥1} kp_{x_0}(κ) v(0, k)
  quantifies the systematic risk.
• Moreover,
  TVaR[a_{x_0}(t_0 | κ); p] ≤ TVaR[(1/n) Σ_{i=1}^{n} a_{T_i|}; p] for all n and p.
241 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
242 / 288
First-order versus second-order basis
• By first-order technical basis, actuaries mean a set of
conservative assumptions used for pricing and reserving:
interest rates, death rates, lapse rates, etc.
• The experience basis is called the second-order basis by actuaries.
• In contrast to the first-order mortality basis, the second-order mortality
  basis consists of the best estimate of the future death rates applying to the
  insured population.
• We aim to provide a method for designing a first-order mortality basis in the
  context of life annuities.
• By appropriately reducing the death rates, we can perform the VaR computations
  as if the lifetimes were independent, at a corrected probability level.
243 / 288
Conservative life table
• Let us consider the cohort reaching age x0 in year t0 .
• For this cohort, we determine the conservative first-order life table
  m^{[1]}_{x_0+k}, k = 1, 2, . . ., in order to satisfy
  Pr[m_{x_0+k}(t_0+k | κ) ≤ m^{[1]}_{x_0+k} for some k = 1, 2, . . .] ≤ ε_mort
  for some probability level ε_mort small enough.
• This is equivalent to requiring that
  Pr[m_{x_0+k}(t_0+k | κ) ≥ m^{[1]}_{x_0+k} for all k] ≥ 1 − ε_mort.
• For convenience we express m^{[1]}_{x_0+k} as a percentage π of a set of
  reference forces of mortality m^{ref}_{x_0+k}, i.e.
  m^{[1]}_{x_0+k} = π m^{ref}_{x_0+k}.
244 / 288
Example: Lee-Carter
• In the Lee-Carter case, we impose
  Pr[exp(α_{x_0+k} + β_{x_0+k} κ_{t_0+k}) ≤ m^{[1]}_{x_0+k} for some k = 1, 2, . . .] ≤ ε_mort
  or, equivalently,
  Pr[exp(α_{x_0+k} + β_{x_0+k} κ_{t_0+k}) ≥ m^{[1]}_{x_0+k} for all k] ≥ 1 − ε_mort.
• With m^{[1]}_{x_0+k} = π m^{ref}_{x_0+k}, the value of π comes from
  Pr[κ_{t_0+k} ≥ (ln(π m^{ref}_{x_0+k}) − α_{x_0+k}) / β_{x_0+k} for all k] = 1 − ε_mort.
245 / 288
Example: Lee-Carter
With m^{ref}_{x_0+k} = exp(α_{x_0+k} + β_{x_0+k} E[κ_{t_0+k}]),
  Pr[exp(α_{x_0+k} + β_{x_0+k} κ_{t_0+k}) ≥ π exp(α_{x_0+k} + β_{x_0+k} E[κ_{t_0+k}]) for all k]
    = Pr[β_{x_0+k}(κ_{t_0+k} − E[κ_{t_0+k}]) ≥ ln π for all k] ≥ 1 − ε_mort
⇒ ln π is the ε_mort quantile of the multivariate Normal random vector
  (β_{x_0+1}(κ_{t_0+1} − E[κ_{t_0+1}]), . . . , β_ω(κ_{t_0+ω−x_0} − E[κ_{t_0+ω−x_0}])).
246 / 288
Computation under first-order basis
• Under the first-order mortality basis P_1, the lifetimes T_1, . . . , T_n are
  independent with death rates m^{[1]}_{x_0+k}, k = 1, 2, . . .
• Under the second-order mortality basis P_2, the lifetimes are conditionally
  independent given κ and have common death rates m_{x_0+k}(t_0+k | κ).
• For any time horizon k and z ≥ 0, we have
  P_2[Σ_{i=1}^{n} a_{min{T_i,k}|} ≤ z] ≥ (1 − ε_mort) P_1[Σ_{i=1}^{n} a_{min{T_i,k}|} ≤ z].
247 / 288
Proof
• Define
  A = {κ | m_{x_0+k}(t_0+k | κ) ≥ m^{[1]}_{x_0+k} for all k = 1, 2, . . .}.
• In words, A is the set of all the "safe" trajectories of κ_{t_0+k}, i.e. the
  trajectories such that the future death rates always remain above the
  conservative first-order ones.
• By construction,
  P_2[A] ≥ 1 − ε_mort.
• Denote as E_i the mathematical expectation taken under P_i, for i = 1, 2.
248 / 288
Proof
P_2[Σ_{i=1}^{n} a_{min{T_i,k}|} ≤ z]
  = P_2[Σ_{i=1}^{n} a_{min{T_i,k}|} ≤ z | A] P_2[A]
    + P_2[Σ_{i=1}^{n} a_{min{T_i,k}|} ≤ z | A^c] P_2[A^c]
  ≥ P_2[Σ_{i=1}^{n} a_{min{T_i,k}|} ≤ z | A] P_2[A]
  ≥ P_2[Σ_{i=1}^{n} a_{min{T_i,k}|} ≤ z | A] (1 − ε_mort)
  ≥ P_1[Σ_{i=1}^{n} a_{min{T_i,k}|} ≤ z] (1 − ε_mort)
249 / 288
Example: Lee-Carter with RWD
If
  κ_t = κ_{t−1} + δ + ξ_t with independent ξ_t ∼ Nor(0, σ²)
then (κ_{t_0+1}, . . . , κ_{t_0+ω−x_0}) is multivariate Normal with mean vector
  m = (κ_{t_0} + δ, κ_{t_0} + 2δ, . . . , κ_{t_0} + (ω − x_0)δ)^T
and variance-covariance matrix Σ with entries Σ_{jk} = min(j, k) σ², i.e.
  Σ = [ σ²   σ²    · · ·  σ²
        σ²   2σ²   · · ·  2σ²
        ...
        σ²   2σ²   · · ·  (ω − x_0)σ² ].
250 / 288
Example: Lee-Carter with RWD
The value of ln π can then be determined as a quantile of the random vector
  (β_{x_0+1}(κ_{t_0+1} − (κ_{t_0} + δ)), . . . , β_ω(κ_{t_0+ω−x_0} − (κ_{t_0} + (ω − x_0)δ)))^T
that is multivariate Normal with zero mean and variance-covariance matrix Σ̃
with entries Σ̃_{jk} = min(j, k) σ² β_{x_0+j} β_{x_0+k}, i.e.
  Σ̃ = [ σ²β²_{x_0+1}          σ²β_{x_0+1}β_{x_0+2}   · · ·  σ²β_{x_0+1}β_ω
         σ²β_{x_0+1}β_{x_0+2}  2σ²β²_{x_0+2}          · · ·  2σ²β_{x_0+2}β_ω
         ...
         σ²β_{x_0+1}β_ω        2σ²β_{x_0+2}β_ω        · · ·  (ω − x_0)σ²β²_ω ].
251 / 288
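Instead of computing the multivariate Normal quantile analytically, ln π can be approximated by Monte Carlo under the random walk with drift, reading it as the ε_mort quantile of the smallest margin β_{x_0+k}(κ_{t_0+k} − E[κ_{t_0+k}]) over the horizon. The β profile and innovation standard deviation below are hypothetical placeholders, not the fitted Belgian values.

```python
import numpy as np

rng = np.random.default_rng(0)

def reduction_factor(beta, sigma, eps_mort=0.01, n_sim=100_000):
    """Monte-Carlo sketch: ln(pi) approximated by the eps_mort quantile of
    min_k beta_{x0+k} * (kappa_{t0+k} - E[kappa_{t0+k}]); under the RWD the
    deviations from the mean are cumulative N(0, sigma^2) shocks."""
    horizon = len(beta)
    shocks = rng.normal(0.0, sigma, size=(n_sim, horizon))
    dev = np.cumsum(shocks, axis=1)          # kappa_{t0+k} - E[kappa_{t0+k}]
    worst = np.min(beta * dev, axis=1)       # smallest margin over the horizon
    return np.exp(np.quantile(worst, eps_mort))

beta = np.linspace(0.10, 0.03, 45)           # hypothetical beta_{x0+1..omega}
print(reduction_factor(beta, sigma=0.05))    # a reduction factor pi below 1
```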
Ruin probabilities
• Computing the ruin probability over (0, j) amounts to evaluating the
  distribution function of
  W_j = Σ_{i=1}^{n} a_{min{T_i,j}|}.
• Under P_1, this can easily be done with the Panjer recursion formula in the
  compound Binomial case, whereas under P_2 we must account for the positive
  dependence existing between the lifetimes.
• Define for k = 0, 1, . . .
  q^{[1]}_{x_0+k} = 1 − exp(−m^{[1]}_{x_0+k}) = 1 − exp(−π m^{ref}_{x_0+k}).
252 / 288
Ruin probabilities
Under P_1, W_j is a sum of independent random variables
a_{min{T_1,j}|}, . . . , a_{min{T_n,j}|}, where each a_{min{T_i,j}|} is valued in
{0, a_{1|}, . . . , a_{j|}} and has probability distribution
  P_1[a_{min{T_i,j}|} = 0] = P_1[T_i < 1] = q^{[1]}_{x_0}
  P_1[a_{min{T_i,j}|} = a_{ℓ|}] = P_1[ℓ ≤ T_i < ℓ + 1]
                                = p^{[1]}_{x_0} · · · p^{[1]}_{x_0+ℓ−1} q^{[1]}_{x_0+ℓ}  for ℓ = 1, . . . , j − 1
  P_1[a_{min{T_i,j}|} = a_{j|}] = P_1[T_i ≥ j] = p^{[1]}_{x_0} · · · p^{[1]}_{x_0+j−1}.
253 / 288
Ruin probabilities
• Let X_i be a_{min{T_i,j}|} appropriately discretized.
• Here, we keep the original probability mass at the origin and round the other
  values in the support of a_{min{T_i,j}|} up to the next integer (after having
  selected an appropriate monetary unit).
• The probability mass function p_X(k) = Pr[X_i = k] of the X_i's has support
  {0, 1, . . . , ⌈a_{j|}⌉}, with
  p_X(0) = q^{[1]}_{x_0} > 0.
254 / 288
Ruin probabilities
• The probability mass function of the sum S = Σ_{i=1}^{n} X_i can be computed
  from the following recursive formula:
  p_S(s) = (1/p_X(0)) Σ_{η=1}^{s} ((n+1)η/s − 1) p_X(η) p_S(s − η),  s = 1, 2, . . . ,
  starting from p_S(0) = (p_X(0))^n.
• This recurrence relation is a particular case of Panjer recursion
formula in the compound Binomial case.
• It is known to be numerically unstable so that particular care
is needed when performing the computations.
255 / 288
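A sketch of this recursion, checked against brute-force convolution on a small hypothetical example; as noted above, plain floating-point evaluation can become unstable for large portfolios.

```python
import numpy as np

def nfold_pmf(pX, n, smax):
    """Recursive pmf of S = X_1 + ... + X_n for i.i.d. non-negative integer X_i
    with pmf pX (pX[0] > 0), using the recursion quoted above."""
    pX = np.asarray(pX, dtype=float)
    pS = np.zeros(smax + 1)
    pS[0] = pX[0] ** n
    for s in range(1, smax + 1):
        eta = np.arange(1, min(s, len(pX) - 1) + 1)
        coeff = (n + 1) * eta / s - 1.0
        pS[s] = np.sum(coeff * pX[eta] * pS[s - eta]) / pX[0]
    return pS

# Small hypothetical check against direct convolution
pX = np.array([0.3, 0.2, 0.25, 0.15, 0.1])
n = 4
pS = nfold_pmf(pX, n, smax=n * (len(pX) - 1))
conv = pX.copy()
for _ in range(n - 1):
    conv = np.convolve(conv, pX)
print(np.allclose(pS, conv))   # True
```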
Initial capital
• With an appropriate term structure of interest rates, we determine the initial
  capital u so that the ultimate ruin probability is at most ε_solv under P_1 for
  a portfolio of n annuitants (with independent lifetimes subject to death rates
  π m^{ref}_{x_0+k}).
• Specifically, we consider the random variable W_{ω−x_0}, and we determine its
  1 − ε_solv quantile, w_{1−ε_solv}, say, under P_1.
• The initial capital needed to reach a solvency probability of 1 − ε_solv is then
  u(n, ε_mort, ε_solv) = w_{1−ε_solv}.
256 / 288
Trade-off between mort and solv
• Let
  φ^{[1]}_k(u) = non-ruin probability over (0, k) under the first-order basis
  φ^{[2]}_k(u) = non-ruin probability over (0, k) under the second-order basis.
• We then have
  φ^{[2]}_k(u) ≥ (1 − ε_mort) φ^{[1]}_k(u)
  ⇒ φ^{[2]}_k(w_{1−ε_solv}) ≥ (1 − ε_mort)(1 − ε_solv).
• Taking ε_mort = ε_solv = 1% gives a ruin probability of at most 1.99%.
257 / 288
Example: Lee-Carter with RWD
• We consider the cohort reaching age x_0 = 65 in year t_0 = 2005, male Belgian
  population.
• We use the 2005 zero-coupon yield curve published by Eurostat.
• With ε_mort = 1%, we get π = 93.2%.
• This gives a pure life annuity premium amount of 11.63 EUR, to be compared
  with 11.37 EUR obtained with the projected life table.
• The loaded premium is then obtained by dividing the required capital
  w_{1−ε_solv} by the number of policies.
258 / 288
Distribution function of a_{T_i|}
[Figure: cumulative distribution function; present values between 0 and 20]
259 / 288
Distribution function of a_{T_i|} discretized
[Figure: cumulative distribution function of the discretized present values, between 0 and 20]
260 / 288
Distribution of Wω−x0 with 10 policies
w_{1−ε_solv}/10 = 15.20 EUR with ε_solv = 1%
[Figure: probability mass function (left) and cumulative distribution function (right) of W_{ω−x_0} for n = 10]
261 / 288
Distribution of Wω−x0 with 20 policies
w_{1−ε_solv}/20 = 14.35 EUR with ε_solv = 1%
[Figure: probability mass function (left) and cumulative distribution function (right) of W_{ω−x_0} for n = 20]
262 / 288
Distribution of Wω−x0 with 30 policies
w_{1−ε_solv}/30 = 13.93 EUR with ε_solv = 1%
[Figure: probability mass function (left) and cumulative distribution function (right) of W_{ω−x_0} for n = 30]
263 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
264 / 288
The Spanish flu 1918-1919
• The three pandemics of the 20th century (1918-1920, 1957-1958, and 1968-1970)
  are the main source of empirical evidence on the potential impact of the next
  pandemic.
• The 1918-1920 Spanish flu pandemic caused the highest mortality by far and is
  often used to set the upper bound on the number of deaths caused by a future
  pandemic.
• Aside from the high mortality rate, 99% of those who died from the disease
  were under 65.
• Murray et al. (Lancet, 2006) developed a statistical model relating annual
  pandemic mortality to per-head gross domestic product in international
  dollars and used this model to estimate the effect on mortality of an
  influenza pandemic in 2004.
265 / 288
Estimation of pandemic mortality
• Because influenza pandemics might increase mortality not only in the year of
  peak incidence, but also in the following year or two, death rates in a 3-year
  pandemic window are compared with those in surrounding years:
  m_t = all-age all-cause mortality rate in year t
  PM = pandemic mortality
     = Σ_{t=1918}^{1920} [ m_t − (Σ_{t=1915}^{1917} m_t + Σ_{t=1921}^{1923} m_t) / 6 ].
• The regression model is as follows:
  ln PM = β_0 + β_1 × ln(1918 per-head income).
• Note that pandemic mortality should be negatively related to per-head income,
  so we expect β_1 < 0.
266 / 288
Results for Belgium
• The regression model is used to deduce PM in calendar year t, replacing the
  1918 per-head income with the one of that year.
• Multiplying the result by the total population gives the estimated number of
  deaths caused by the emergence of a pandemic influenza in year t.
• The total number of deaths is distributed among age groups using the
  proportions observed during the 1918 pandemic.
• These extra deaths can be added to the "regular" ones to get pandemic q_x.
• The analysis has been conducted by Benit and Coulon (2011, UCL Master thesis)
  for calendar year 2008.
267 / 288
Results for Belgium
β̂_0 = 2.93012 with standard error 1.38411 and p-value 4.7%
β̂_1 = −0.98915 with standard error 0.17616 and p-value < 10^{-4}
r̂[β̂_0, β̂_1] = −0.9969
R² = 0.6119

ln PM ≈_d Nor(2.93012 − 0.98915 ln(per-head income), σ_⋆²)
with
σ_⋆² = 1.38411² + (ln(per-head income))² × 0.17616²
       − 2 × 0.9969 × 1.38411 × 0.17616 × ln(per-head income).
268 / 288
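Under the Normal approximation above, the distribution of ln PM for a given income level can be evaluated as follows; the income figure used in the example is a hypothetical input, not taken from the Belgian study.

```python
import numpy as np

# Point estimates, standard errors and correlation from the regression above
b0, se0 = 2.93012, 1.38411
b1, se1 = -0.98915, 0.17616
rho = -0.9969

def ln_pm_distribution(income):
    """Approximate Normal distribution of ln PM for a given per-head income,
    propagating the sampling variability of (b0, b1)."""
    x = np.log(income)
    mean = b0 + b1 * x
    var = se0**2 + (x * se1)**2 + 2 * rho * se0 * se1 * x
    return mean, np.sqrt(var)

# Hypothetical income level (international dollars), for illustration only
mean, sd = ln_pm_distribution(30_000)
print(np.exp(mean), np.exp(mean + 1.96 * sd))   # central and high estimate of PM
```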
Deaths due to pandemic over expected deaths (in %)
Age          Males     Females
0-4          542.19    958.94
5-9          597.81    853.86
10-14        311.88    563.81
15-19        173.94    356.26
20-24        224.41    600.13
25-29        238.47    465.73
30-34        169.51    453.33
35-39        68.67     122.13
40-44        36.15     51.44
45-49        16.64     26.88
50-54        8.49      17.74
55-59        3.44      10.25
60-64        3.78      10.49
65-69        3.36      6.42
over 70      0.95      1.99
Total        8.92      9.16
Total 20-65  25.81     45.59
269 / 288
Example
• Consider a portfolio comprising n one-year term life insurance policies with
  unit benefit.
• The death rate is M = (1 + I)µ with I = 1 if a pandemic occurs and 0 otherwise.
• Let r = Pr[I = 1] = 1 − Pr[I = 0].
• The number of deaths recorded by the company is
  D = Σ_{i=1}^{n} I[T_i ≤ 1]
  where T_1, . . . , T_n denote the policyholders' remaining lifetimes.
• Given M, T_1, . . . , T_n are assumed to be independent.
270 / 288
Example (Ctd)
C[I[T_i ≤ 1], I[T_j ≤ 1]] for i ≠ j
  = E[C[I[T_i ≤ 1], I[T_j ≤ 1] | I]] + C[E[I[T_i ≤ 1] | I], E[I[T_j ≤ 1] | I]]
  = 0 + C[1 − exp(−(1 + I)µ_x), 1 − exp(−(1 + I)µ_x)]
  = V[exp(−(1 + I)µ_x)]
  = E[exp(−2(1 + I)µ_x)] − (E[exp(−(1 + I)µ_x)])²
  = r exp(−4µ_x) + (1 − r) exp(−2µ_x) − (r exp(−2µ_x) + (1 − r) exp(−µ_x))²
  = r(1 − r)(exp(−2µ_x) − exp(−µ_x))²
with maximum when r = 1/2.
271 / 288
Example (Ctd)
Pr[D = k] = (1 − r) (n choose k) (1 − exp(−µ))^k (exp(−µ))^{n−k}
            + r (n choose k) (1 − exp(−2µ))^k (exp(−2µ))^{n−k}

E[D] = n E[1 − exp(−M)] = (1 − r) n (1 − exp(−µ)) + r n (1 − exp(−2µ))

V[D] = n V[I[T_1 ≤ 1]] + n(n − 1) C[I[T_1 ≤ 1], I[T_2 ≤ 1]]
     = n Pr[T_1 ≤ 1] Pr[T_1 > 1] + n(n − 1) r(1 − r)(exp(−2µ_x) − exp(−µ_x))²

D/n → 1 − exp(−M) with probability 1.
272 / 288
Mortality catastrophe bonds
• Market-traded securities whose payments are linked to a
mortality index, similar to catastrophe bonds.
• First such bond issued: the Swiss Re bond (Vita 1) in December 2003,
  securitizing Swiss Re's own exposure to certain catastrophic mortality events
  (influenza, major terrorist attack, etc.).
• The $400 million principal was at risk if, during any single calendar year,
  the combined mortality index (weighted by age, sex and nationality
  US-UK-France-Italy-Switzerland) exceeded 130% of the baseline 2002 level, and
  would be exhausted if the index exceeded 150%.
• The main investors were pension funds to which the bond
provided both an attractive return and a good hedge.
• Vita 2 by Swiss Re in 2005, Vita 3 by Swiss Re in 2007,
Tartan by Scottish Re in 2006, Osiris by AXA in 2006, etc.
273 / 288
Outline
Observed mortality trends (Source: HMD, www.mortality.org)
Stochastic modelling for survival analysis
Graduation and smoothing via local regression (Source: Statistics
Belgium)
Cohort life tables and mortality projection models
Model risk
Adverse selection and risk classification
Credibility for death counts
Systematic mortality risk in life insurance
First-order life tables
Pandemics
Managing longevity risk
274 / 288
Managing longevity risk: Possible strategies
• Not selling life annuities anymore (risk avoidance).
• Holding enough capital to act as a buffer against longevity
risk, accounting for model risk.
• Buying reinsurance treaties covering longevity risk, if available
and not too expensive.
• Entering securitization.
• Favoring “natural hedging” inside the insurance company.
• Re-thinking product design:
- selling only temporary annuities so that biometric bases can be
revised regularly;
- indexing annuity payments, deferred period, etc.
275 / 288
Life insurance securitization
• Life and Longevity Markets Association (LLMA).
• q-forward targets qx : agreement between two counterparties
to exchange at a future date (the maturity of the contract) an
amount equal to the realized mortality rate of a given
population (the floating leg) at that future date, in return for
a fixed mortality rate (the fixed leg) agreed upon at the
inception of the contract.
• S-forward targets px : agreement between two counterparties
to exchange at a future date (the maturity of the contract) an
amount equal to the realized survival rate of a given
population (the floating leg) at that future date, in return for
a fixed survival rate (the fixed leg) agreed upon at the
inception of the contract.
• And many more...
• Also, Xpect indices of the Deutsche Börse (xpect-index.com).
276 / 288
Life settlement securitization
• With life settlements, life policies are sold by their owner for
more than the surrender value but less than the face value.
• They are then packaged and sold to investors.
• Senior life settlement (SLS) securitization began in 2004 to
deal with the life policies of elderly high net worth individuals.
• Two medical doctors or underwriters independently assess
each policyholder’s life expectancy.
• The Life Exchange was established in 2005, the Institutional
Life Markets Association started in 2007 in New York as the
trade body for the life settlements industry.
277 / 288
Securitization
• Mortality derivatives are attractive to investors because of
(supposed) low correlation with existing assets.
• A crucial issue is thus the possible correlation of the index to
financial markets.
• An index directly related to the demographic structure (like
the proportion of the population aged 65 and over, for
instance) should be avoided.
• An ideal index only reflects the increase in longevity.
• A good candidate is the general population period life
expectancy at some preset age (the retirement age for
instance).
• It is transparent to investors and published yearly by national
agencies.
• However, adverse selection is not accounted for and basis risk
remains.
278 / 288
Example: Lee-Carter
• The time index κt appears as a natural candidate for being
the longevity index in the Lee-Carter framework.
• However, κ_t is not appropriate for that purpose, for (at least) two reasons:
  (i) First and foremost, it is not transparent to investors and lacks intuitive
      meaning.
  (ii) Second, it is not unique but depends on the identifiability constraint,
       as well as on re-estimation techniques.
• If β_{x+j} ≥ 0 for all j, then the period life expectancy
  e↑_x(t_0+k | κ) = 1/2 + Σ_{d≥1} exp(−Σ_{j=0}^{d−1} exp(α_{x+j} + β_{x+j} κ_{t_0+k}))
  is a decreasing function of the time index κ_{t_0+k}.
279 / 288
Example: Lee-Carter
• Denote as F_{e↑_x(t_0+k|κ)} the distribution function of e↑_x(t_0+k | κ).
• Recall that if g is decreasing, F^{-1}_{g(X)}(ε) = g(F_X^{-1}(1 − ε)).
• Let Φ be the distribution function of the standard Normal distribution Nor(0, 1).
• The quantile function of e↑_x(t_0+k | κ) is then given by
  F^{-1}_{e↑_x(t_0+k|κ)}(ε) = inf{s ∈ ℝ | F_{e↑_x(t_0+k|κ)}(s) ≥ ε}
    = 1/2 + Σ_{d≥1} exp(−Σ_{j=0}^{d−1} exp(α_{x+j} + β_{x+j}(E[κ_{t_0+k}] + √V[κ_{t_0+k}] Φ^{-1}(1 − ε)))).
280 / 288
Index-based payoffs
• In many countries, governmental agencies perform mortality
projections and publish future life expectancies from these
official forecasts.
• Denote as ex↑,ref (t0 + k) the period life expectancy at age x in
calendar year t0 + k, taken from the reference forecast.
• We could imagine elementary payoff structures such as
  payoff = 0 if e↑_65(t_0+k | κ) < e↑,ref_65(t_0+k),  and  d otherwise.
281 / 288
Index-based payoffs
• In some demographic projections, future mortality is assumed
to follow either a High, Medium or Low scenario.
• Let e↑,high_65(t_0+k) and e↑,medium_65(t_0+k) be the life expectancies in the
  High and Medium scenarios.
• In such a case, the payoff could be
  0  if e↑_65(t_0+k | κ) < e↑,medium_65(t_0+k)
  d (e↑_65(t_0+k | κ) − e↑,medium_65(t_0+k)) / (e↑,high_65(t_0+k) − e↑,medium_65(t_0+k))
     if e↑,medium_65(t_0+k) ≤ e↑_65(t_0+k | κ) ≤ e↑,high_65(t_0+k)
  d  if e↑_65(t_0+k | κ) > e↑,high_65(t_0+k).
282 / 288
Index-based payoffs
• More elaborate payoffs could be based on several consecutive
annual life expectancies, with a cash stream if they all exceed
the corresponding reference forecasts.
• Longevity fan charts can be used to visualize the future
evolution of the index.
• This can help defining a short-term longevity cat bond (if
several consecutive life expectancies get out of their prediction
bands).
• A basket of European national indices could be defined, with
payoffs involving life expectancies in several countries (and
possibly several reference forecasts).
• A payoff depending on the maximum period life expectancy of
a group of industrialized countries could also be of interest.
283 / 288
Natural hedging
• Exploiting the natural hedge provided by the term insurance
portfolio against annuity business seems promising.
• Natural hedging uses the opposite reactions of term life insurance and
  annuities to a change in mortality to stabilize aggregate cash flows.
• This offers a competitive advantage to insurers writing both
term life insurance and annuities.
• Alternatively, this encourages life insurers and annuity writers
(e.g., pension funds) to enter mortality swaps.
• However, contract terms and relevant ages typically differ.
• Moreover, computing the volume of term insurance business
needed to counteract longevity risk on the annuity book is
subject to model risk (for instance, Lee-Carter favors natural
hedging because of the single time index).
284 / 288
Product design
• Benefits in case of death and in case of survival can be
included in the same policy.
• Including death benefits into an annuity contract might be
explained by bequest motives.
• This amounts to associating counter-monotonic risks in the same contract,
  which counteracts longevity and mortality risks.
• However, the additional death benefit may significantly
decrease the annuity incomes and thus may not attract
customers.
• Indexing premium and/or benefits on mortality dynamics
might be a viable alternative.
285 / 288
Indexed life annuity
• Let us consider an individual buying an indexed life annuity
contract at age x0 in year t0 .
• Let p^{ref}_{x_0+k}(t_0+k) be a forecast of the survival probability of some
  reference population to which the individual belongs.
• The annual payment of 1 due at time k is adjusted by
  i_{t_0+k} = ∏_{j=1}^{k−1} ( p^{ref}_{x_0+j}(t_0+j) / p^{obs}_{x_0+j}(t_0+j) ).
• We can limit the impact of the index on the annuity payments, using
  i_{t_0+k}(i_min, i_max) = max{min{i_{t_0+k}, i_max}, i_min}
  for some i_min < 1 < i_max, e.g. i_min = 0.8 and i_max = 1.2.
286 / 288
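A minimal sketch of this indexing mechanism with caps; the survival probabilities below are invented for illustration.

```python
import numpy as np

def indexed_payments(p_ref, p_obs, i_min=0.8, i_max=1.2):
    """Adjustment factors i_{t0+k}: cumulative ratio of forecast to observed
    survival probabilities of the reference population, capped between
    i_min and i_max."""
    ratio = np.asarray(p_ref, dtype=float) / np.asarray(p_obs, dtype=float)
    # the payment due at time k uses the ratios of years 1, ..., k-1
    idx = np.concatenate(([1.0], np.cumprod(ratio)[:-1]))
    return np.clip(idx, i_min, i_max)

# Hypothetical case: longevity improves faster than forecast, so observed
# survival exceeds the reference one and the payments are scaled down
p_ref = np.array([0.985, 0.983, 0.981, 0.978, 0.975])
p_obs = np.array([0.987, 0.986, 0.984, 0.982, 0.980])
print(indexed_payments(p_ref, p_obs))
```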
Advanced-life delayed annuities (ALDA)
• The advanced-life delayed annuities (ALDA) are deferred
(inflation-adjusted) life annuity contracts.
• The deferment period can be seen as a deductible: the
policyholder finances his consumption until some advanced
age, 80, 85 or even 90, say, and the insurer starts paying the
annuity at this age provided the annuitant is still alive.
• Hence, the ALDA transforms the consumer choice and
asset-allocation problem from a stochastic date of death to a
deterministic one in which the terminal horizon becomes the
annuity payment commencement date.
• The longevity risk involved in the ALDA is quite substantial
for the insurance company.
287 / 288
Deferred life annuity with indexed deferment period
• Some indexing can be applied to make the pricing more
competitive.
• If the index is publicly available, the annuitant is able to
adjust his consumption level during the deferred period.
• Note that we could also think of alternative indexing
mechanisms for ALDA.
• Considering a deferred life annuity bought at age 65 with
payments starting at age 80, say, we could let the starting age
vary according to actual longevity improvements.
• If longevity increases more than expected, then payments may
start at age 82 instead of 80, for instance.
288 / 288