Supplementary notes for Exam C
Overview

1.1 Introduction
In July and August 2013, the SoA added a number of questions to the sample exam questions
document for Exam C on the Be-an-Actuary website. These were to cover syllabus items recently
added to Exam C. The attached note covers the additional material needed for these syllabus
items.
There are three sections in this note. The first looks at the idea of an extreme value distribution.
The second describes an alternative approach to dealing with large data sets. The final section
introduces a number of additional simulation techniques in various situations.
As you read this material, you should keep in mind the material in Chapter 2 for the first section, Chapter 7 for the second section, and Chapter 12 for the final section.
We have given the syllabus items themselves in an appendix to this note.
1.2 Extreme value distributions
The following section should be read in conjunction with Chapter 2 of the textbook.
There are some areas of insurance work where it is useful to model quantities using distributions
with particularly heavy tails. One example of a situation like this would be when constructing a
model for the largest value in a set of independent and identically distributed random variables.
If we are trying to model the maximum value from a random sample, intuitively this maximum
value is likely to be in some sense large. We may therefore want to model it with a distribution
which is heavy-tailed.
Distributions of this type are known as extreme value distributions. In the situation outlined above, the inverse Weibull distribution is often used as a model. This is related to the Weibull distribution studied earlier as follows.
Inverse Weibull distribution (Fréchet distribution)
If a random variable X has a Weibull distribution, then the random variable $Y = 1/X$ is said to have an inverse Weibull distribution. The inverse Weibull distribution has the following attributes:

PDF: $f(x) = \dfrac{\tau (\theta/x)^\tau e^{-(\theta/x)^\tau}}{x}$

CDF: $F(x) = e^{-(\theta/x)^\tau}$

Moments: $E[X^k] = \theta^k \Gamma(1 - k/\tau)$, $k < \tau$
This distribution is sometimes known as the Fréchet distribution. More details for the inverse Weibull distribution are given in the Tables for Exam C.
The process of finding the distribution of the random variable $Y = 1/X$ can be applied to other distributions. Examples of the inverse exponential distribution and the inverse gamma distribution are given in the Tables for Exam C.
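For illustration, here is a short numerical check of this reciprocal relationship (the parameter values, seed and variable names below are purely illustrative): we simulate Weibull values by inversion, take reciprocals, and compare the empirical distribution of the reciprocals with the inverse Weibull CDF given above.

```python
# Illustrative check: if X is Weibull with CDF 1 - exp(-(x/theta)^tau),
# then Y = 1/X should have the inverse Weibull CDF exp(-(theta'/y)^tau)
# with scale theta' = 1/theta and the same tau.
import numpy as np

rng = np.random.default_rng(seed=42)
theta, tau = 2.0, 3.0                         # illustrative Weibull parameters

u = rng.uniform(size=100_000)
x = theta * (-np.log(1 - u)) ** (1 / tau)     # Weibull values by inversion
y = 1 / x                                     # candidate inverse Weibull values

y0 = 0.75                                     # an arbitrary evaluation point
empirical = np.mean(y <= y0)
theoretical = np.exp(-((1 / theta) / y0) ** tau)
print(empirical, theoretical)                 # the two figures should agree closely
```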
Other distributions which are sometimes used in this context are the Gumbel distribution, and
the Weibull distribution itself (without inversion). Details for the Gumbel distribution are given
below.
Gumbel distribution
A random variable X is said to have a Gumbel distribution if:

$f(x) = \dfrac{1}{\theta}\, e^{-y} \exp\left(-e^{-y}\right)$

where $y = \dfrac{x - \mu}{\theta}$ and $-\infty < x < \infty$.

The distribution function is:

$F(x) = \exp\left(-e^{-y}\right)$
The details for the Weibull distribution itself are given in Chapter 2 of the textbook. The Pareto
distribution also has a thick tail, and can sometimes be used in these situations.
1.3 Large data sets – an alternative approach
The following section should be read in conjunction with Chapter 7 of the textbook.
We have seen in Chapter 7 how the Kaplan-Meier method can be adapted for use with large data
sets. Here we look at another approach for calculating mortality rates when large numbers of
lives are involved. To illustrate the main features of the method, we shall use a small sample of six lives.
The exact exposure method
A company is trying to estimate mortality rates for the holders of a certain type of policy. It has
the following information about a group of 6 lives, who all hold a policy of this type. The
investigation ran for a three-year period, from Jan 1 2010 to Dec 31 2012.
Life | Date of birth | Date of purchase | Mode of exit | Date of exit
1 | Mar 1 1965 | Jul 1 2009 | Alive | Dec 31 2012
2 | Jul 1 1965 | Nov 1 2009 | Death | Mar 1 2011
3 | Aug 1 1965 | Apr 1 2010 | Surrender | Feb 1 2012
4 | Apr 1 1965 | Jun 1 2011 | Alive | Dec 31 2012
5 | May 1 1965 | Aug 1 2010 | Surrender | Jun 1 2012
6 | Oct 1 1965 | May 1 2010 | Death | Apr 1 2012
We see that of the six lives, two remained in the population to the end of the investigation, two surrendered their policies while the investigation was in progress, and two died during the period of the investigation. We wish to use the information in the table above to
estimate mortality rates at various ages. We shall assume that each month is exactly one-twelfth
of a year, to simplify the calculations.
We start by finding the age at which each life started to be observed, and the age at which it ceased to be observed. Note that although Life 1 purchased his policy on July 1 2009, the
investigation had not started at that point. So the date on which Life 1 is first observed is
January 1 2010. Life 2 is also first observed on this date.
This gives us the following table of ages:
Life | Age at first observation | Age at last observation
1 | 44 10/12 | 47 10/12
2 | 44 6/12 | 45 8/12
3 | 44 8/12 | 46 6/12
4 | 46 2/12 | 47 9/12
5 | 45 3/12 | 47 1/12
6 | 44 7/12 | 46 6/12
In order to estimate the mortality rates, we need to find out the length of time for which each life was alive and was a member of the investigation. We need to subdivide these
periods by age last birthday. So, for example, we shall use e44 for the period of time during
which a life (or group of lives) was aged 44 last birthday, e45 for the corresponding period of
time for which lives were aged 45 last birthday, and so on.
From the table above, we can now find the contribution of each life to each of e44 , e45 , e46 and
e47 . This gives us the following table of figures (in months):
Life | Age at first observation | Age at last observation | e44 | e45 | e46 | e47
1 | 44 10/12 | 47 10/12 | 2 | 12 | 12 | 10
2 | 44 6/12 | 45 8/12 | 6 | 8 | - | -
3 | 44 8/12 | 46 6/12 | 4 | 12 | 6 | -
4 | 46 2/12 | 47 9/12 | - | - | 10 | 9
5 | 45 3/12 | 47 1/12 | - | 9 | 12 | 1
6 | 44 7/12 | 46 6/12 | 5 | 12 | 6 | -
This gives us totals in each of the ek columns of 17, 53, 46 and 20 respectively.
We can now use these to calculate estimates of the mortality rates. It can be shown that d_j / e_j
provides us with the maximum likelihood estimate of the hazard rate at each age. Noting that
Life 2 dies at age 45 last birthday, and that Life 6 dies aged 46 last birthday, we can find estimates
of the hazard rates at these two ages:
$\hat{h}_{45} = \dfrac{1}{53/12} = 0.22642$

and:

$\hat{h}_{46} = \dfrac{1}{46/12} = 0.26087$
Note that we do not have enough data to provide estimates of the hazard rates at any other age.
Alternatively we could claim, without much conviction, that our estimates of the mortality rates
at ages 44 and 47 were zero, based on this very small sample of data.
If we wish to find the values of the corresponding q-type mortality rates, we use the relationship:

$\hat{q}_x = 1 - e^{-\hat{h}_x}$

In this case we obtain corresponding q-type rates of 0.20261 and 0.22962 respectively.
The method we have used here is called the exact exposure method. We have calculated the
exact period of time for which a group of lives has been exposed to the risk of death for a
particular age.
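The exact exposure calculation is easy to automate. For illustration, the following Python sketch (the data structures and names are ours) reproduces the exposure totals of 17, 53, 46 and 20 months and the two hazard rate estimates for the six lives, working in completed months of age.

```python
# Exact exposure for the six lives, working in months of age.
# Each tuple is (age at first observation, age at last observation, died?),
# with the ages from the table above converted to months.
from collections import defaultdict

lives = [
    (44*12 + 10, 47*12 + 10, False),   # Life 1
    (44*12 + 6,  45*12 + 8,  True),    # Life 2
    (44*12 + 8,  46*12 + 6,  False),   # Life 3
    (46*12 + 2,  47*12 + 9,  False),   # Life 4
    (45*12 + 3,  47*12 + 1,  False),   # Life 5
    (44*12 + 7,  46*12 + 6,  True),    # Life 6
]

exposure = defaultdict(int)    # months of exposure by age last birthday
deaths = defaultdict(int)      # deaths by age last birthday

for entry, exit_, died in lives:
    for month in range(entry, exit_):      # each completed month of exposure
        exposure[month // 12] += 1
    if died:
        deaths[exit_ // 12] += 1           # age last birthday at death

for age in sorted(exposure):
    h_hat = deaths[age] / (exposure[age] / 12)   # hazard estimate d_j / e_j
    print(age, exposure[age], round(h_hat, 5))
# Expected: e_44..e_47 = 17, 53, 46, 20 months, h_45 = 0.22642, h_46 = 0.26087
```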
The actuarial exposure method
An alternative approach is to use what is called the actuarial exposure method. This provides us
with a direct estimate of the q-type mortality rates, but it is perhaps not so intuitively appealing.
We proceed as follows:
(1) Calculate the contribution of each life to each of the e_j figures, as above.

(2) For each of the lives that die (and only for the deaths), add in the period of time from the date of death until the end of the year of age (ie the period of time until the life would have achieved its next birthday). This increases the contribution from the deaths to one (or sometimes two) of the e_j figures.

(3) The q-type rates are now given directly by d_j / e_j.
If we apply this method to the data given above, we have the following alterations:
(a) Life 2 now contributes 12 months to e45.

(b) Life 6 now contributes 12 months to e46.

All the other figures in the table are unchanged. The column totals are now 17, 57, 52 and 20 respectively. If we recalculate our mortality estimates, we find that:

$\hat{q}_{45} = \dfrac{1}{57/12} = 0.21053$

and:

$\hat{q}_{46} = \dfrac{1}{52/12} = 0.23077$
Although we have used a sample of only six lives in these examples, you should be able to see that the method generalizes easily, and can cope with large data sets without any real increase in the level of difficulty of the calculation.
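For example, continuing the Python sketch given earlier for the exact exposure method, the actuarial adjustment only requires each death's exposure to be extended to the following birthday; the q-type rates then follow directly as d_j / e_j.

```python
# Actuarial exposure: extend each death's exposure to the end of its year
# of age, then take q_j = d_j / e_j directly (this continues the earlier
# sketch, reusing its `lives`, `exposure` and `deaths` objects).
actuarial = dict(exposure)                        # copy the exact exposures (months)
for entry, exit_, died in lives:
    if died:
        age = exit_ // 12
        actuarial[age] += (age + 1) * 12 - exit_  # months up to the next birthday

for age in (45, 46):
    q_hat = deaths[age] / (actuarial[age] / 12)
    print(age, actuarial[age], round(q_hat, 5))
# Expected: e_45 = 57 and e_46 = 52 months, q_45 = 0.21053, q_46 = 0.23077
```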
The approaches outlined above are sometimes called seriatim methods. This refers to the fact
that the data points are analyzed as a series of independent observations.
Insuring ages
A variation on this idea is to use the concept of insuring ages. In this case, an insurer will
designate each policyholder to have their birthday on the date on which the policy was first taken
out. So, for example, if a person is aged 45 last birthday when he takes out his policy, we treat
him as if he is aged exactly 45 on the issue date. This means that some of the elements of the
exposure will be assigned to younger ages than would be the case when using the policyholder’s
true birthday.
Example 1.1
Reanalyze the data given above for the six lives, using insuring ages by age last birthday.
Recalculate the estimates of the hazard rates at ages 45 and 46.
Solution
We now have the following table.

Life | Date of birth | Date of purchase | New birthday | Age at first observation | Age at last observation
1 | Mar 1 1965 | Jul 1 2009 | Jul 1 1965 | 44 6/12 | 47 6/12
2 | Jul 1 1965 | Nov 1 2009 | Nov 1 1965 | 44 2/12 | 45 4/12
3 | Aug 1 1965 | Apr 1 2010 | Apr 1 1966 | 44 | 45 10/12
4 | Apr 1 1965 | Jun 1 2011 | Jun 1 1965 | 46 | 47 7/12
5 | May 1 1965 | Aug 1 2010 | Aug 1 1965 | 45 | 46 10/12
6 | Oct 1 1965 | May 1 2010 | May 1 1966 | 44 | 45 11/12
Note that again, Lives 1 and 2 are not observed until the start of the investigation on January 1
2010.
Notice that by using insuring ages last birthday, the birthday is always moved forwards in time, so that lives become younger than they really are. Also, lives whose policy purchase occurs within the period of the investigation will now be observed for the first time at an integer age.
This now gives us the following table of exposures:
Life | Age at first observation | Age at last observation | e44 | e45 | e46 | e47
1 | 44 6/12 | 47 6/12 | 6 | 12 | 12 | 6
2 | 44 2/12 | 45 4/12 | 10 | 4 | - | -
3 | 44 | 45 10/12 | 12 | 10 | - | -
4 | 46 | 47 7/12 | - | - | 12 | 7
5 | 45 | 46 10/12 | - | 12 | 10 | -
6 | 44 | 45 11/12 | 12 | 11 | - | -
The total contribution of each life (ie the total of the exposures in each row) is the same as before,
but the distribution is different. We now have column totals of 40, 49, 34 and 13. Using the exact
exposure method, we find that:
$\hat{h}_{45} = \dfrac{1}{49/12} = 0.24490$

and:

$\hat{h}_{46} = \dfrac{1}{34/12} = 0.35294$
We can calculate q-type rates from these as before.
Anniversary-based studies
In the study outlined above, we had a three-year period of investigation, which ran from
January 1 2010 to December 31 2012.
An alternative approach (which can simplify the numbers obtained) is to use an anniversary-based study. In a study of this type, each life enters the investigation on the first policy
anniversary during the period of the investigation. Lives will also exit on the last policy
anniversary within the period of the investigation, if they are still active lives at this point. The
amount of exposure is reduced (which reduces the amount of information we are using), but the
numbers may be simplified, particularly if we use this method in conjunction with insuring ages.
Let’s see how we can apply this method to the example data given earlier.
Example 1.2
Using the data for the 6 lives given above, calculate the exposures that would be obtained in an
anniversary-based study, using insuring ages last birthday.
Solution
Although the overall period of the investigation is from January 1 2010 to December 31 2012, each
life will enter the investigation on the policy anniversary following January 1 2010, and will leave
on the policy anniversary preceding December 31 2012, if they are still active at this point. So, for
example, Life 1 enters the investigation on July 1 2010, at which point the life has insuring age 45.
We obtain the following new table of values.
Life | Date of birth | Date of purchase | Date of entry | Insuring age at entry | Date of exit | Insuring age at exit
1 | Mar 1 1965 | Jul 1 2009 | Jul 1 2010 | 45 | Jul 1 2012 | 47
2 | Jul 1 1965 | Nov 1 2009 | Nov 1 2010 | 45 | Mar 1 2011 | 45 4/12
3 | Aug 1 1965 | Apr 1 2010 | Apr 1 2010 | 44 | Feb 1 2012 | 45 10/12
4 | Apr 1 1965 | Jun 1 2011 | Jun 1 2011 | 46 | Jun 1 2012 | 47
5 | May 1 1965 | Aug 1 2010 | Aug 1 2010 | 45 | Jun 1 2012 | 46 10/12
6 | Oct 1 1965 | May 1 2010 | May 1 2010 | 44 | Apr 1 2012 | 45 11/12
So the exposures are now as follows.
Life | Age at first observation | Age at last observation | e44 | e45 | e46 | e47
1 | 45 | 47 | - | 12 | 12 | -
2 | 45 | 45 4/12 | - | 4 | - | -
3 | 44 | 45 10/12 | 12 | 10 | - | -
4 | 46 | 47 | - | - | 12 | -
5 | 45 | 46 10/12 | - | 12 | 10 | -
6 | 44 | 45 11/12 | 12 | 11 | - | -
The total exposures at each age are now 24, 49, 34 and zero (working in months as before).
Note that in an investigation of this type, all lives who are active at the end of the investigation
will contribute a whole number of years to the exposures. Only lives who die or surrender will
contribute at fractional ages. In a large investigation, it may be that most of the lives are active
lives. So the amount of calculation needed may be reduced significantly using this method.
Interval-based methods
An alternative approach is not to record the exact time or age at which an event takes place, but
just to record the number of events of each type in each year of age. If we do this we will lose
some accuracy, but the calculations will be simplified. In a large actuarial study, provided that
there are many lives contributing to each age group, the loss of accuracy is likely to be small.
We will need to record the number of lives in the investigation at the start of each year of age,
together with the numbers entering, dying and leaving during the course of the year. We can
then use a table of these values to estimate the exposure within each particular age group.
Let’s see how we might apply these ideas to the group of six lives studied earlier.
Example 1.3
Using the data for the six lives given previously, construct a table of the numbers of decrements
in each year of age, and calculate the exact exposure for each of the relevant age groups.
Solution
Using exact ages, we have previously constructed the following table of data:
Life | Age at first observation | Age at last observation | Mode of exit
1 | 44 10/12 | 47 10/12 | Withdrawal
2 | 44 6/12 | 45 8/12 | Death
3 | 44 8/12 | 46 6/12 | Withdrawal
4 | 46 2/12 | 47 9/12 | Withdrawal
5 | 45 3/12 | 47 1/12 | Withdrawal
6 | 44 7/12 | 46 6/12 | Death
Withdrawal here includes both lives who surrendered, and lives who were active at the end of
the investigation period.
We can see that:
(a) Four lives entered at age 44 last birthday, one at 45 last birthday and one at 46 last birthday.

(b) One death occurred at age 45 last birthday, and one at age 46 last birthday.

(c) Of the survivors, one exited at age 46 last birthday, and three at age 47 last birthday.
This leads to the following table of decrements:
Age | Population at start of year | Number entering during the year | Number dying | Number leaving during the year | Population at year end
44 | 0 | 4 | 0 | 0 | 4
45 | 4 | 1 | 1 | 0 | 4
46 | 4 | 1 | 1 | 1 | 3
47 | 3 | 0 | 0 | 3 | 0
We can now calculate estimates of the exact exposure, and the actuarial exposure. We are
assuming that we now do not have the exact information about entrances and exits, but only
have information in the form of the table above. We will now have to approximate the exposure.
The numbers in the population at the start of the year will contribute a full year to the exposure.
The numbers entering, dying and leaving are assumed to be distributed uniformly over the year.
This leads to the following formula for the exposure:

$e_j = P_j + (n_j - d_j - w_j)/2$

where P_j is the population at the start of the year, n_j is the number of lives entering during the year, d_j is the number of deaths during the year and w_j is the number leaving the population during the year.
If we apply this to the figures in the table above, we obtain estimates of the exact exposure at age
44 of:
e44  P44  (n44  d44  w44 ) / 2  0  (4  0  0) / 2  2
Similarly, if we apply the formula at the other ages, we obtain exposures of 4, 3.5 and 1.5
respectively. We can then calculate estimates of the hazard rate at each age:
$\hat{h}_{45} = \dfrac{1}{4} = 0.25$

and:

$\hat{h}_{46} = \dfrac{1}{3.5} = 0.2857$
Of course, given the small sample of lives, these estimates are different from the ones we
obtained earlier. However, using a large data sample, the loss of accuracy may not be great.
If we wish to use the actuarial method, the deaths count a full year in the exposure. We therefore do not need to deduct half the number of deaths in the exposure formula, which now becomes:

$e_j = P_j + (n_j - w_j)/2$
We now have figures for the exposure in each year of 2, 4.5, 4 and 1.5. So, for example, our estimate for q45 using the actuarial method now becomes:

$\hat{q}_{45} = \dfrac{1}{4.5} = 0.2222$
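Both interval-based formulas are simple to apply once the decrement table has been constructed. For illustration, the sketch below (the variable names are ours) reproduces the exposures and estimates found above for ages 44 to 47.

```python
# Interval-based exposures from the decrement table above.
# For each age: P = population at the start of the year, n = entrants,
# d = deaths, w = other leavers during the year.
table = {
    44: (0, 4, 0, 0),
    45: (4, 1, 1, 0),
    46: (4, 1, 1, 1),
    47: (3, 0, 0, 3),
}

for age, (P, n, d, w) in table.items():
    e_exact = P + (n - d - w) / 2      # exact exposure
    e_actuarial = P + (n - w) / 2      # actuarial exposure (deaths count in full)
    h_hat = d / e_exact
    q_hat = d / e_actuarial
    print(age, e_exact, e_actuarial, round(h_hat, 4), round(q_hat, 4))
# Expected exact exposures 2, 4, 3.5, 1.5 and actuarial exposures 2, 4.5, 4, 1.5,
# giving h_45 = 0.25, h_46 = 0.2857 and q_45 = 0.2222
```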
Variance of the estimators
We have seen that $\hat{h} = d/e$ can be used as an estimate of the hazard rate h, and that, using the actuarial approach to finding e, $\hat{q} = d/e$ can be used as an estimate of the mortality rate q. Note that e is calculated differently in the two cases.
It can also be shown that, under certain assumptions, these estimates are actually maximum likelihood estimates for h and q. We shall not prove this here. However, if we make the assumption that h is constant over the period during which we are observing the lives, then these estimates are maximum likelihood estimates.
Here is a formal statement of this result.
MLEs for h and q
Suppose that a group of lives is observed from age a to age b, where b > a. Assuming that the hazard rate h is constant over the interval, then:

$\hat{h} = d/e$

and:

$\hat{q} = 1 - e^{-d/e}$

are maximum likelihood estimates for h and q. Here, e is the exact exposure for the group of lives over the age interval.
We can show these results in the usual way, by constructing the likelihood function, taking logs,
differentiating it with respect to h , setting the result equal to zero and solving the resulting
equation.
In fact we can go further than this. Recall from Chapter 5 that the Cramér-Rao lower bound can
be found for the variance of an estimator. In this case we can use the CRLB to find the variance of
the estimator for the hazard rate – it turns out to be $\mathrm{var}(\hat{h}) = d/e^2$. With this result, and using the delta method from Chapter 5, we can also find the variance of the estimator for q, which turns out to be $\mathrm{var}(\hat{q}) = (1 - \hat{q})^2\, d/e^2$.
We are assuming here that q represents the probability of death in a single time period. In the
more general case, where q is the estimate of the probability of death over a longer period than
one year, we have the corresponding result that:
var  qˆ    1  qˆ 
2
 b  a 2 d / e 2
where q is now the probability that a life dies between age a and age b .
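For example, applying these formulas to the exact-exposure figures found earlier for age 45 (d = 1 death, e = 53/12 years of exposure):

```python
# Variance estimates for the age-45 figures from the exact exposure example.
import math

d, e = 1, 53 / 12
h_hat = d / e                            # 0.22642
q_hat = 1 - math.exp(-h_hat)             # 0.20261
var_h = d / e**2                         # var(h-hat) = d / e^2
var_q = (1 - q_hat)**2 * d / e**2        # var(q-hat) = (1 - q-hat)^2 d / e^2
print(round(var_h, 5), round(var_q, 5))  # about 0.05126 and 0.03259
```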
1.4 Simulation
The following section should be read in conjunction with Chapter 12 of the textbook.
Simulation methods for normal and lognormal distributions
To simulate values from a normal distribution, the inversion method can be used as usual. The
procedure would be:
1. Simulate a value u1 from a U(0,1) distribution.

2. Use tables of the standard normal distribution to find z1 where $\Phi(z_1) = u_1$.

3. z1 is now a simulated value from a N(0,1) distribution. To find a simulated value x1 from a general $N(\mu, \sigma^2)$ distribution, use the transformation $x_1 = \mu + \sigma z_1$.

4. To find a simulated value from a lognormal distribution with parameters $\mu$ and $\sigma^2$, use the transformation $x_1 = e^{\mu + \sigma z_1}$.

5. Repeat the process to obtain as many simulated values as are required.
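For illustration, here is a minimal sketch of this procedure in Python, using scipy's inverse standard normal CDF in place of the printed tables (scipy is assumed to be available; the lognormal parameters used in the last step are purely illustrative).

```python
# Inversion method for normal and lognormal simulation.
import math
from scipy.stats import norm

u1 = 0.273                  # a U(0,1) random number (the value used in Example 1.4 below)
z1 = norm.ppf(u1)           # step 2: z such that Phi(z) = u

mu, sigma = 100, 20                             # N(mu, sigma^2) parameters
x_normal = mu + sigma * z1                      # step 3: simulated normal value

mu_ln, sigma_ln = 8.0, 0.5                      # illustrative lognormal parameters
x_lognormal = math.exp(mu_ln + sigma_ln * z1)   # step 4: simulated lognormal value

print(round(z1, 3), round(x_normal, 2), round(x_lognormal, 2))
# z1 is about -0.604, so the normal value is about 87.9 (see Example 1.4 below)
```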
However, there are a number of other methods that can be used to simulate values from normal
distributions. We give two methods here.
The Box-Muller method
An alternative approach is to use the Box-Muller method. This uses pairs of independent U (0,1)
simulated values to obtain pairs of independent standard normal values. The procedure is as
follows.
1. Generate 2 independent U(0,1) random numbers, u1 and u2.

2. Then:

$z_1 = \sqrt{-2\log u_1}\; \cos(2\pi u_2)$

and:

$z_2 = \sqrt{-2\log u_1}\; \sin(2\pi u_2)$

are independent values from an N(0,1) distribution.
Note that you should set your calculator to ensure that the trigonometric functions are calculated
in radian mode.
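For illustration, the short sketch below applies the Box-Muller formulas to the uniform pair used in Example 1.4 below and reproduces the two standard normal values quoted there.

```python
# Box-Muller: one pair of uniforms gives one pair of independent N(0,1) values.
import math

u1, u2 = 0.273, 0.518
r = math.sqrt(-2 * math.log(u1))       # common radial term
z1 = r * math.cos(2 * math.pi * u2)    # first N(0,1) value
z2 = r * math.sin(2 * math.pi * u2)    # second N(0,1) value
print(round(z1, 5), round(z2, 5))      # about -1.60109 and -0.18186
```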
The polar method
The polar method also starts by calculating two independent values from U (0,1) . The method is
as follows:
1. Generate two independent U(0,1) numbers, u1 and u2.

2. Calculate $x_1 = 2u_1 - 1$ and $x_2 = 2u_2 - 1$.

3. Calculate the value of $w = x_1^2 + x_2^2$. If $w > 1$, reject these values and start again.

4. Calculate $y = \sqrt{(-2\log w)/w}$.

5. Calculate $z_1 = x_1 y$ and $z_2 = x_2 y$. $z_1$ and $z_2$ are the required independent N(0,1) variables.
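For illustration, the polar method can be written as a small routine (the function name below is ours); note the rejection step, under which pairs falling outside the unit circle are discarded and fresh uniforms are drawn.

```python
# Polar method: repeatedly draw a point in the square until it falls inside
# the unit circle, then transform it into a pair of independent N(0,1) values.
import math
import random

def polar_pair(rng=random):
    while True:
        x1 = 2 * rng.random() - 1
        x2 = 2 * rng.random() - 1
        w = x1 * x1 + x2 * x2
        if 0 < w <= 1:                        # accept only points inside the unit circle
            y = math.sqrt(-2 * math.log(w) / w)
            return x1 * y, x2 * y

z1, z2 = polar_pair()
print(z1, z2)
```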
Let’s see how to simulate values from a normal distribution using each of the methods given
above.
Example 1.4
Use each of the three methods given above (including the inversion method) and the random
numbers u1 = 0.273 and u2 = 0.518 to generate values from a normal distribution with mean 100
and standard deviation 20.
Solution
First we use the inversion method. We need to find the values $z_1$, $z_2$ from the standard normal distribution such that $\Phi(z_1) = 0.273$ and $\Phi(z_2) = 0.518$. Since the first random number is less than 0.5, we will use the equivalent result $\Phi(-z_1) = 1 - 0.273 = 0.727$. From the tables of the standard normal distribution, we find that $z_1 = -0.604$. Similarly, we find that $z_2 = 0.045$. To find values from a normal distribution with the given mean and standard deviation, we use the relationships $x_1 = 100 + 20z_1 = 87.92$ and $x_2 = 100 + 20z_2 = 100.90$. These are our two simulated values from a $N(100, 20^2)$ distribution.
Using the Box-Muller method, we obtain the values:
$z_1 = \sqrt{-2\log(0.273)}\; \cos(2\pi \times 0.518) = -1.60109$

and:

$z_2 = \sqrt{-2\log(0.273)}\; \sin(2\pi \times 0.518) = -0.18186$
Multiplying by 20 and adding 100, we obtain simulated values of 67.98 and 96.36.
We need to be careful here about the order in which we use the random numbers. If we switch around u1 and u2, we will of course end up with different simulated normal values.
Finally, using the polar method, we have $x_1 = -0.454$ and $x_2 = 0.036$. So $w = 0.207412$, and we can use these values in the process since $w < 1$. Using the formula given above for y, we find that $y = 3.89466$, and our standard normal values are $-1.76817$ and $0.14021$. Multiplying by 20 and adding 100 as before, we obtain the numbers 64.64 and 102.80.
Simulation of a discrete mixture
Consider the distribution whose distribution function is given by:

$F(x) = 0.4\left(1 - e^{-0.03x}\right) + 0.3\left(1 - e^{-0.02x}\right) + 0.3\left(1 - e^{-0.05x}\right)$
This random variable is a discrete mixture of three exponential distributions.
Inverting this distribution function as it stands will not be very easy. However, an alternative
approach to simulating values from this type of distribution is as follows:
1. Use a random number to determine which individual exponential distribution to simulate.

2. Use another random number to simulate a value from the correct exponential distribution.
Here is an example.
Example 1.5
Use the random numbers 0.28, 0.57, 0.81 and 0.73 to simulate two values from the distribution
whose CDF is given above.
Solution
We subdivide the interval (0, 1) into three sub-intervals, (0, 0.4), (0.4, 0.7) and (0.7, 1). Observing which of these sub-intervals contains our first random number will determine which exponential distribution we use in the simulation.
Here our first random number is 0.28. Since this falls into the first sub-interval, we simulate from an exponential distribution with parameter 0.03. Using the second random number in the inversion process:

$0.57 = 1 - e^{-0.03 x_1} \;\Rightarrow\; x_1 = -\dfrac{1}{0.03}\log(1 - 0.57) = 28.13$

Repeating the process, our next random number 0.81 falls into the third sub-interval, so we simulate from an exponential distribution with parameter 0.05, using the fourth random number:

$0.73 = 1 - e^{-0.05 x_2} \;\Rightarrow\; x_2 = -\dfrac{1}{0.05}\log(1 - 0.73) = 26.19$
In this way we avoid having to invert the rather complicated expression for the CDF of the
mixture distribution.
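The two-stage procedure is straightforward to code. For illustration, the sketch below (the function and variable names are ours) reproduces the two simulated values from Example 1.5.

```python
# Two-stage simulation from a discrete mixture of exponentials:
# the first uniform selects the component, the second inverts that component's CDF.
import math

weights = [0.4, 0.3, 0.3]      # mixing weights from the CDF above
rates = [0.03, 0.02, 0.05]     # exponential parameters of the components

def simulate_mixture(u_component, u_value):
    cumulative = 0.0
    for weight, lam in zip(weights, rates):
        cumulative += weight
        if u_component < cumulative:               # this component is selected
            return -math.log(1 - u_value) / lam    # invert the exponential CDF
    raise ValueError("u_component must lie in [0, 1)")

print(round(simulate_mixture(0.28, 0.57), 2))  # 28.13, as in Example 1.5
print(round(simulate_mixture(0.81, 0.73), 2))  # 26.19
```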
Simulation using a stochastic process
We have already seen methods for simulating values from an ( a , b ,0) distribution, using the
inversion method. However, this method is not always very efficient. In this section we look at
an alternative approach to simulating values from a Poisson, binomial or negative binomial
distribution.
Rather than trying to simulate directly the number of observations from the distribution, we will
consider the underlying process in time. If, for example, we want to simulate the number of
claims in one year, and we know that the claim distribution is Poisson with mean 3.4 per year, we
can simulate values from a Poisson distribution with mean 3.4. However, an alternative
approach would be to simulate the times between successive events, total these times, and see how many events occur before time one (year). This may seem like a longer process, but it can
in some situations be more efficient to program on a computer.
This method can be used for any of the three discrete distributions mentioned above. It can be
shown that the time to the next event always has an exponential distribution. However, we need
to be careful to use the correct exponential parameter, depending on the form of the distribution
we are trying to simulate. If the events we are trying to simulate occur according to a Poisson
process, then the time to the next event is exponential with the (constant) Poisson parameter. If
the events we are trying to simulate are binomial, then it can be shown that the time to the next
event is still exponential, but with a parameter that varies as the events occur. Similarly, if the
events we are trying to simulate are negative binomial, then the time to the next event has an
exponential distribution, but again the underlying parameter varies as the events occur.
Here are the key results that we will need for each of the three distributions.
Simulating a Poisson distribution
Time to the next event: The time to the next event, if events have a Poisson distribution with parameter $\lambda$, is exponential with parameter $\lambda$ (and mean $1/\lambda$).

Exponential distribution: We simulate the time to the next event as an exponential random variable using $s_k = -\log(1 - u_k)/\lambda$.

Simulated value: We can now determine the number of events happening in one time unit, by summing up the $s_k$'s. The total time is $t_k = t_{k-1} + s_k$. The number of events occurring before time 1 is our simulated value.
Simulating a binomial distribution
Time to the next event: The time between events, if events have a binomial distribution with parameters m and q, is exponential with parameter $\lambda_k$. The value of $\lambda_k$ is given by $\lambda_k = c - dk$, where $d = -\log(1 - q)$ and $c = md$.

Exponential distribution: We simulate the time between events as an exponential random variable using $s_k = -\log(1 - u_k)/\lambda_k$.

Simulated value: We can now determine the number of events happening in one time unit, by summing up the $s_k$'s using $t_k = t_{k-1} + s_k$. The number of events occurring before time 1 is our simulated value.
Simulating a negative binomial distribution
Time to the next event: The time between events, if events have a negative binomial distribution with parameters r and $\beta$, is exponential with parameter $\lambda_k$. The value of $\lambda_k$ is given by $\lambda_k = c + dk$, where $d = \log(1 + \beta)$ and $c = rd$.

Exponential distribution: We simulate the time between events as an exponential random variable using $s_k = -\log(1 - u_k)/\lambda_k$.

Simulated value: We can now determine the number of events happening in one time unit, by summing up the $s_k$'s using $t_k = t_{k-1} + s_k$. The number of events occurring before time 1 is our simulated value.
Note the convention in use here. We shall use our first random number, $u_0$, together with $\lambda_0$, the exponential parameter for the time from time zero to the first event, to simulate $s_0 = t_0$. Then $u_1$ will be used with $\lambda_1$, the exponential parameter of the distribution of the time from the first to the second event, to simulate $s_1$, and $t_1 = s_0 + s_1$ is then the total time to the second event. $t_2$ will be the total time until the third event, and so on.
Let’s see how this process works in practice using an example.
Example 1.6
Simulate values from each of the three distributions given below using as many of the following
random numbers as necessary:
u0  0.14
u1  0.28
u2  0.73
u3  0.82
u4  0.44
(a)
a Poisson distribution with mean 1.6.
(b)
a binomial distribution with parameters m  40 and q  0.04 .
(c)
a negative binomial distribution with parameters r  120 and   0.014 .
16
u5  0.61
Supplementary note
Solution
(a) Poisson distribution

We have $\lambda = 1.6$. So, using our first random number, we have $t_0 = -\log(1 - 0.14)/1.6 = 0.0943$. So the time to our first event is 0.0943 time units. We now use the same formula but with the next random number to find the time from the first to the second event:

$s_1 = -\log(1 - u_1)/1.6 = 0.2053 \;\Rightarrow\; t_1 = 0.0943 + 0.2053 = 0.2996$

The total time to the second event is 0.2996 time units.
Repeating the process again, we have:

$s_2 = -\log(1 - u_2)/1.6 = 0.8183 \;\Rightarrow\; t_2 = 0.2996 + 0.8183 = 1.1179$

So the third event occurs after the end of the time period, and two events have occurred within the time interval (0, 1). The simulated value is 2.
Note that, with the notation above, $t_k$ is actually the total time to the (k+1)th event.
(b) Binomial distribution

We now need the values of c and d:

$d = -\log(1 - q) = -\log 0.96 = 0.04082$

and:

$c = md = 1.63288$

We can now calculate the appropriate values of $\lambda_k$:

$\lambda_0 = c = 1.63288$
$\lambda_1 = c - d = 1.59206$
$\lambda_2 = c - 2d = 1.55124$

Now we can find the times of the various events:

$t_0 = s_0 = -\log(1 - u_0)/1.63288 = 0.0924$
$s_1 = -\log(1 - u_1)/1.59206 = 0.2063 \;\Rightarrow\; t_1 = 0.0924 + 0.2063 = 0.2987$
$s_2 = -\log(1 - u_2)/1.55124 = 0.8441 \;\Rightarrow\; t_2 = 0.2987 + 0.8441 = 1.1428$

So again, the third simulated event occurs after the end of the time interval, and our simulated value is 2.
(c) Negative binomial distribution

Again we need the values of c and d:

$d = \log(1 + \beta) = 0.01390$

and:

$c = rd = 1.66835$

We can now calculate the appropriate values of $\lambda_k$:

$\lambda_0 = c = 1.66835$
$\lambda_1 = c + d = 1.68225$
$\lambda_2 = c + 2d = 1.69615$

Now we can find the times of the various events:

$t_0 = s_0 = -\log(1 - u_0)/1.66835 = 0.0904$
$s_1 = -\log(1 - u_1)/1.68225 = 0.1953 \;\Rightarrow\; t_1 = 0.0904 + 0.1953 = 0.2857$
$s_2 = -\log(1 - u_2)/1.69615 = 0.7719 \;\Rightarrow\; t_2 = 0.2857 + 0.7719 = 1.0576$

So again, the third simulated event occurs after the end of the time interval, and our simulated value is again 2.
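All three parts of this example can be run through the same counting routine, since they differ only in how the exponential parameter $\lambda_k$ changes from one event to the next. For illustration, the sketch below (the function is ours) reproduces the three simulated values of 2.

```python
# Count the events occurring in (0, 1) by accumulating exponential
# inter-event times; lam(k) supplies the parameter used for the k-th time.
import math

def count_events(uniforms, lam):
    t, k = 0.0, 0
    for u in uniforms:
        t += -math.log(1 - u) / lam(k)     # s_k = -log(1 - u_k) / lambda_k
        if t > 1:
            return k                       # k events occurred before time 1
        k += 1
    raise ValueError("ran out of random numbers before reaching time 1")

u = [0.14, 0.28, 0.73, 0.82, 0.44, 0.61]

# (a) Poisson with mean 1.6: constant parameter
print(count_events(u, lambda k: 1.6))                      # 2

# (b) binomial, m = 40, q = 0.04: lambda_k = c - d*k
d_b = -math.log(1 - 0.04); c_b = 40 * d_b
print(count_events(u, lambda k: c_b - d_b * k))            # 2

# (c) negative binomial, r = 120, beta = 0.014: lambda_k = c + d*k
d_n = math.log(1 + 0.014); c_n = 120 * d_n
print(count_events(u, lambda k: c_n + d_n * k))            # 2
```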
Simulation from a decrement table
When following the progress of a group of policyholders, it may be necessary to simulate the
outcomes for the group. The group may be subject to a variety of different decrements, for
example death, retirement, withdrawal and so on.
Consider a group of 1,000 identical policyholders, all aged 60 exact. Let us assume that they are
subject to three decrements, death, age retirement and ill-health retirement. The probabilities for
each of these decrements at each age might be as follows:
Age | Probability of death | Probability of age retirement | Probability of ill-health retirement
60 | 0.04 | 0.12 | 0.09
61 | 0.05 | 0.15 | 0.10
We want to simulate the progress of this group of policyholders, identifying the numbers of lives
who will leave the group via each decrement at each age. To do this, we will need to simulate
values from various binomial distributions. We might proceed as follows.
Consider first the number of deaths at age 60. This has a binomial distribution with parameters
1,000 and 0.04. So we first simulate a value from this binomial distribution to determine the
number of deaths during the year. Suppose that our simulated value is 28.
We now have a sample of 1,000 − 28 = 972 lives remaining. To determine the simulated number of age retirements during the year, we now need a value from a binomial distribution with parameter 972. However, we need the conditional probability of age retirement, given that a life is still alive. This will be $\dfrac{0.12}{1 - 0.04} = 0.125$. We can simulate a value from the binomial distribution with these parameters using any of the methods given previously for the binomial distribution. Suppose our simulated value is 102.
We now have 972 − 102 = 870 lives remaining. To simulate the number of ill-health retirements, we need the conditional probability of taking ill-health retirement, given that a life has not died or taken age retirement. This is:

$\dfrac{0.09}{1 - 0.04 - 0.12} = 0.10714$
We now need a simulated value from a binomial distribution with parameters 870 and 0.10714. Perhaps our simulated value is 62. We now have 870 − 62 = 808 lives surviving in the population until age 61.
We can continue the process for as long as necessary, simulating the observed numbers of lives exiting by each decrement at each age. We may need to carry out the process on a computer if we want a large number of repeated simulations. But the underlying method is fairly straightforward.
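For illustration, the sketch below carries out the conditional binomial draws described above, using numpy's binomial sampler (any of the binomial simulation methods given earlier could be used instead); the simulated counts will of course depend on the random seed.

```python
# Simulating a multiple-decrement table by successive conditional binomial draws.
import numpy as np

rng = np.random.default_rng(seed=1)
decrements = {   # age: (prob of death, prob of age retirement, prob of ill-health retirement)
    60: (0.04, 0.12, 0.09),
    61: (0.05, 0.15, 0.10),
}

lives = 1000
for age, probs in decrements.items():
    remaining_prob = 1.0
    survivors = lives
    exits = []
    for p in probs:
        # conditional probability, given that the life has not already left this year
        n_exit = rng.binomial(survivors, p / remaining_prob)
        exits.append(n_exit)
        survivors -= n_exit
        remaining_prob -= p
    print(age, exits, survivors)
    lives = survivors        # carry the survivors forward to the next age
```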
Supplementary Note Practice Questions
Question 1.1
Use the random number u = 0.845 to generate a random number from a negative binomial distribution with mean 0.264 and variance 0.3.
Question 1.2
Using the inversion method, use the random number u1 = 0.42 to generate a single observation
from a lognormal distribution with mean 5,000 and standard deviation 400.
Question 1.3
Use the random numbers 0.81, 0.95, 0.09, 0.22 and the polar method to generate two random
numbers from the standard normal distribution.
Question 1.4
Use the random numbers u1 = 0.73 and u2 = 0.28 and the Box-Muller method to generate two
random numbers from a normal distribution with mean 100 and standard deviation 10.
Question 1.5
Use a stochastic process to generate a random observation from a binomial distribution with
parameters m  50 and q  0.01 . Use as many of these random numbers as are needed:
u0  0.423
u1  0.796
u2  0.522
u3  0.637
u4  0.992
Question 1.6
Use a stochastic process to generate values from a negative binomial distribution with
parameters r  100 and   0.08 . Use the same random numbers as in the previous question.
Question 1.7
Use the first two random numbers from the previous question to generate a random observation
from the mixture of Pareto distributions with distribution function:

$F(x) = 0.6\left[1 - \left(\dfrac{200}{x + 200}\right)^3\right] + 0.4\left[1 - \left(\dfrac{300}{x + 300}\right)^4\right]$
You are given the following information about a sample of lives:
Life | Date of birth | Date of purchase | Mode of exit | Date of exit
1 | Apr 15 1950 | Jan 1 2011 | Died | May 15 2011
2 | Jul 15 1950 | Apr 1 2011 | Surrendered | Mar 15 2012
3 | Oct 15 1950 | Oct 1 2011 | Alive | -
4 | Jan 15 1950 | Feb 1 2011 | Alive | -
5 | Feb 15 1951 | Mar 1 2011 | Died | Aug 15 2011
These lives are subject to a 2-year investigation, running from July 1 2010 to June 30 2012. Assume in each of the following questions that each half-month period is exactly one twenty-fourth of a year.
Question 1.8
Using the exact exposure method, estimate h61 .
Question 1.9
Using the actuarial exposure method, estimate q60 and h60 .
Question 1.10
Using the exact exposure method with insuring ages last birthday, estimate q60 .
Question 1.11
Using the actuarial exposure method with insuring ages last birthday, estimate q60 .
Question 1.12
Explain how your answer to Question 1.10 would alter if you were using an anniversary-based
study to estimate q60 and q61 .
Question 1.13
Using an interval-based method and the table of lives given above, construct a table of
decrements, and hence estimate q60 using the actuarial method.
Question 1.14
Find the estimated variance of your estimator in the previous question.
Solutions to Supplementary Note Practice Questions
Question 1.1
We first need the parameters of the negative binomial distribution. Using the formulae for the
mean and variance:
$r\beta = 0.264$

and:

$r\beta(1 + \beta) = 0.3$

Solving these simultaneous equations, we obtain r = 1.936 and $\beta$ = 0.13636.
The question does not require us to use a stochastic process, so it is probably quickest just to use the inversion method as normal. Calculating the first few negative binomial probabilities:

$p_0 = (1 + \beta)^{-r} = 0.780762$

$p_1 = \dfrac{r\beta}{(1 + \beta)^{r+1}} = 0.18139$
So the inversion method will transform random numbers in the interval (0, p0 ) to a simulated
value of zero, and random numbers in the range ( p0 , p0  p1 ) to a simulated value of 1. Our
random number lies in this second interval, so our simulated value is 1.
Question 1.2
First we need the parameters of the lognormal distribution. Using the formulae for the mean and
variance of the lognormal, we have:
$e^{\mu + \frac{1}{2}\sigma^2} = 5{,}000$

and:

$e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = 400^2$

Solving these simultaneous equations, we find that $\mu = 8.514003$ and $\sigma^2 = 0.0063796$.
We now find a simulated N(0,1) value by using the normal tables:

$\Phi(z_1) = 0.42 \;\Rightarrow\; \Phi(-z_1) = 0.58 \;\Rightarrow\; z_1 = -0.2019$

We can now find a simulated value from the lognormal distribution:

$x_1 = e^{\mu + \sigma z_1} = e^{8.514003 - 0.2019\sqrt{0.0063796}} = 4{,}904$
Question 1.3
First we find $x_1 = 2u_1 - 1 = 0.62$ and $x_2 = 2u_2 - 1 = 0.90$. Applying the check, we find that $w = x_1^2 + x_2^2 = 1.19$. Since $w > 1$, we reject these values and start the process again using the other random numbers.
Now we have $x_3 = 2u_3 - 1 = -0.82$ and $x_4 = 2u_4 - 1 = -0.56$. Since $(-0.82)^2 + (-0.56)^2 = 0.986 < 1$, we can proceed. So:

$y = \sqrt{(-2\log 0.986)/0.986} = 0.16911$

and we have:

$z_1 = x_3 y = -0.82 \times 0.16911 = -0.1387$

and:

$z_2 = x_4 y = -0.56 \times 0.16911 = -0.0947$

These are our simulated values from the standard normal distribution.
Question 1.4
Using the standard Box-Muller formula, we have:

$z_1 = \sqrt{-2\log u_1}\; \cos(2\pi u_2) = \sqrt{-2\log 0.73}\; \cos(2\pi \times 0.28) = -0.148661$

and:

$z_2 = \sqrt{-2\log u_1}\; \sin(2\pi u_2) = \sqrt{-2\log 0.73}\; \sin(2\pi \times 0.28) = 0.779308$

These are independent N(0,1) observations. The corresponding values from the normal distribution with mean 100 and standard deviation 10 are:

$x_1 = 100 + 10z_1 = 98.51$

and:

$x_2 = 100 + 10z_2 = 107.79$
Question 1.5
We first need the parameters c and d:

$d = -\log(1 - q) = -\log 0.99 = 0.010050$

and:

$c = md = 0.502517$

We now use the formula $\lambda_k = c - dk$ to generate the successive exponential parameters:

$s_0 = -\log(1 - u_0)/\lambda_0 = \dfrac{-\log 0.577}{0.502517} = 1.0943$

Since this value is greater than one, the first event occurs after time one, and so there are no observed events in a unit time period. The simulated value from the distribution is zero.
Question 1.6
First we need the values of the parameters c and d:

$d = \log(1 + \beta) = 0.076961$

and:

$c = rd = 7.696104$

We use the formula $\lambda_k = c + dk$ to generate the successive exponential parameters. So the times to the events are:

$t_0 = -\log(1 - 0.423)/7.696104 = 0.071453$
$s_1 = -\log(1 - 0.796)/7.773065 = 0.204506$
$s_2 = -\log(1 - 0.522)/7.850026 = 0.094031$
$s_3 = -\log(1 - 0.637)/7.926987 = 0.127836$
$s_4 = -\log(1 - 0.992)/8.003948 = 0.603242$

We see that $t_4 = t_0 + s_1 + s_2 + s_3 + s_4$ is the first value of t which is greater than one. So the fifth event occurs after time 1, and there are 4 events in the unit time period. The simulated value is 4.
Question 1.7
Since $u_0 = 0.423$ and $0 < 0.423 < 0.6$, we use the first of the two Pareto distributions to simulate. Using our second random number:

$0.796 = 1 - \left(\dfrac{200}{x + 200}\right)^3 \;\Rightarrow\; x = 139.75$
Question 1.8
We start by calculating the age at which each life was first observed, and last observed. Treating a half month as being equal to 1/24 of a year, we obtain the following table of ages and exposures (the exposure unit is also one twenty-fourth of a year):

Life | Age at first observation | Age at last observation | e59 | e60 | e61 | e62
1 | 60 17/24 | 61 2/24 | - | 7 | 2 | -
2 | 60 17/24 | 61 16/24 | - | 7 | 16 | -
3 | 60 23/24 | 61 17/24 | - | 1 | 17 | -
4 | 61 1/24 | 62 11/24 | - | - | 23 | 11
5 | 60 1/24 | 60 12/24 | - | 11 | - | -

The total exposure at age 61 is 58 twenty-fourths of a year. We have one death at age 61 last birthday (Life 1 dies at age 61 2/24). So our estimate is:

$\hat{h}_{61} = \dfrac{1}{58/24} = 0.41379$
Question 1.9
Using the actuarial exposure method, we need to allow for extra exposure for the deaths. We are now looking at age 60, and Life 5 dies aged 60 12/24. So there is extra exposure of 12/24ths of a year (from age 60 12/24 to age 61), and the total exposure at age 60 goes up by 12, from 26 to 38 (twenty-fourths of a year). So we now have:

$\hat{q}_{60} = \dfrac{1}{38/24} = 0.63158$

To find the estimate for the hazard rate, we note that $q = 1 - e^{-h}$, ie that $h = -\log(1 - q)$. So we have:

$\hat{h}_{60} = -\log(1 - 0.63158) = 0.99853$
Question 1.10
We now want to use insuring ages last birthday. We have the following new table of dates:

Life | Date of birth | Date of purchase | New date of birth | Insuring age at entry | Date of exit | Insuring age at exit
1 | Apr 15 1950 | Jan 1 2011 | Jan 1 1951 | 60 | May 15 2011 | 60 9/24
2 | Jul 15 1950 | Apr 1 2011 | Apr 1 1951 | 60 | Mar 15 2012 | 60 23/24
3 | Oct 15 1950 | Oct 1 2011 | Oct 1 1951 | 60 | Jun 30 2012 | 60 18/24
4 | Jan 15 1950 | Feb 1 2011 | Feb 1 1950 | 61 | Jun 30 2012 | 62 10/24
5 | Feb 15 1951 | Mar 1 2011 | Mar 1 1951 | 60 | Aug 15 2011 | 60 11/24

We now check the ages at death. Using insuring ages, we find that Life 1 now dies at age 60 9/24, and Life 5 dies at age 60 11/24. So we now have two deaths at age 60 last birthday.
The contributions to the exposures at each age are as follows (in units of one twenty-fourth of a year):

Life | Age at first observation | Age at last observation | e60 | e61 | e62
1 | 60 | 60 9/24 | 9 | - | -
2 | 60 | 60 23/24 | 23 | - | -
3 | 60 | 60 18/24 | 18 | - | -
4 | 61 | 62 10/24 | - | 24 | 10
5 | 60 | 60 11/24 | 11 | - | -
So the exposure at age 60 is now 61 twenty-fourths of a year, and there are two deaths. So using the exact exposure method, the estimate of the hazard rate at age 60 is:

$\hat{h}_{60} = \dfrac{2}{61/24} = 0.78689$

And so:

$\hat{q}_{60} = 1 - e^{-0.78689} = 0.54474$
Question 1.11
Using the actuarial exposure method, we increase the exposure for Lives 1 and 5 to a whole year. So we now have:

$e_{60} = 24 + 23 + 18 + 24 = 89$

and the estimate for $q_{60}$ is now:

$\hat{q}_{60} = \dfrac{2}{89/24} = 0.53933$
Question 1.12
The figures would be different for the two lives who remain until the end of the investigation.
Life 3 enters the investigation on 1 Oct 2011. At this point there is less than a full year until the
end of the investigation. So life 3 cannot contribute even one year to the exposure, and so would
not contribute at all. Life 4 enters on 1 Feb 2011, so can contribute for a full year from 1 Feb 2011
to 1 Feb 2012. So the contributions to the exposure of these two lives will be zero for Life 3, and
one full year of exposure in e61 only for Life 4.
Question 1.13
We obtain the following figures:

Age | P_j | n_j | d_j | w_j | P_{j+1}
60 | 0 | 4 | 1 | 0 | 3
61 | 3 | 1 | 1 | 2 | 1
62 | 1 | 0 | 0 | 1 | 0

We can now calculate the exposure at age 60 using the actuarial method:

$e_{60} = P_{60} + (n_{60} - w_{60})/2 = 0 + (4 - 0)/2 = 2$

So the exposure is 2 years, and our estimate is $\hat{q}_{60} = 0.5$.
Question 1.14
Using the formula given in the text, we have:

$\mathrm{var}(\hat{q}_{60}) = (1 - \hat{q}_{60})^2\, d/e^2 = 0.5^2 \times 1/2^2 = 0.0625$
Appendix – Syllabus changes
In 2013 the Society of Actuaries added a small number of syllabus items to the examination
syllabus for Exam C. The new syllabus items are listed here, together with details of the material
which covers them.
A8 Identify and describe two extreme value distributions.
A very brief introduction to the study of extreme value distributions is given in Section 1.2 of this
study note.
G Estimation of decrement probabilities from large samples

1. Estimate decrement probabilities using both parametric and non-parametric approaches for both individual and interval data.

2. Approximate the variance of the estimators.
Some methods for dealing with large samples are covered in Chapter 7 of the BPP textbook.
However, some additional ideas are covered in Section 1.3 of this study note, which gives an
alternative approach to these ideas.
J2 Simulate from discrete mixtures, decrement tables, the (a, b, 0) class, and the normal and lognormal distributions using methods designed for those distributions.
The basic simulation ideas are covered in Chapter 12 of the BPP textbook. A small number of
additional methods, which have been added to the syllabus, are covered in Section 1.4 of this
study note.
The SoA has now added 10 additional questions to the end of the Exam C Sample Questions
document (these are currently Questions 290-299). You can find this by searching on the web for
“Be an actuary Exam C Syllabus” and clicking on the link to the syllabus – the questions and
solutions links are at the end of the syllabus document. You should test your understanding of
the material in this note by completing these additional questions. They all relate to the material
in this study note.