Supplementary notes for Exam C

Overview

1.1 Introduction

In July and August 2013, the SoA added a number of questions to the sample exam questions document for Exam C on the Be-an-Actuary website. These were written to cover syllabus items recently added to Exam C. This note covers the additional material needed for those syllabus items.

There are three sections in this note. The first looks at the idea of an extreme value distribution. The second describes an alternative approach to dealing with large data sets. The final section introduces a number of additional simulation techniques for various situations.

As you read this material, you should have in mind as far as possible the material in Chapter 2 for the first section, that in Chapter 7 for the second section, and that in Chapter 12 for the final section. The syllabus items themselves are given in an appendix to this note.

1.2 Extreme value distributions

The following section should be read in conjunction with Chapter 2 of the textbook.

There are some areas of insurance work where it is useful to model quantities using distributions with particularly heavy tails. One example is constructing a model for the largest value in a set of independent and identically distributed random variables. If we are trying to model the maximum value from a random sample, this maximum is intuitively likely to be large in some sense, so we may want to model it with a heavy-tailed distribution. Distributions of this type are known as extreme value distributions.

In the situation outlined above, the inverse Weibull distribution is often used as a model. It is related to the Weibull distribution studied earlier as follows.

Inverse Weibull distribution (Fréchet distribution)

If a random variable X has a Weibull distribution, then the random variable Y = 1/X is said to have an inverse Weibull distribution.
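This reciprocal relationship can be checked numerically. The sketch below is a minimal illustration in Python, assuming the Weibull CDF F(x) = 1 - exp(-(x/theta)^tau) from Chapter 2 and the inverse Weibull CDF quoted in the next section; the helper names are ours, not from the Tables:

```python
import math
import random

def weibull_inverse_cdf(u, tau, theta):
    """Invert the Weibull CDF F(x) = 1 - exp(-(x/theta)^tau)."""
    return theta * (-math.log(1 - u)) ** (1 / tau)

def frechet_cdf(y, tau, theta):
    """Inverse Weibull (Frechet) CDF: F(y) = exp(-(theta/y)^tau)."""
    return math.exp(-((theta / y) ** tau))

random.seed(42)
tau, theta = 2.0, 10.0
n = 100_000

# Y = 1/X, where X is Weibull(tau, theta). The reciprocal turns the
# Weibull scale theta into a Frechet scale 1/theta.
ys = [1 / weibull_inverse_cdf(random.random(), tau, theta) for _ in range(n)]
empirical = sum(y <= 0.1 for y in ys) / n
theoretical = frechet_cdf(0.1, tau, 1 / theta)   # exp(-1) = 0.3679
print(round(empirical, 3), round(theoretical, 3))
```

The empirical distribution function of the reciprocals agrees with the Fréchet CDF, illustrating why the two parametrizations appear side by side in the Tables.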
The inverse Weibull distribution has the following attributes (with parameters \tau and \theta, as in the Tables for Exam C):

PDF: f(x) = \frac{\tau (\theta/x)^{\tau} e^{-(\theta/x)^{\tau}}}{x}

CDF: F(x) = e^{-(\theta/x)^{\tau}}

Moments: E[X^k] = \theta^k \, \Gamma(1 - k/\tau), valid for k < \tau

This distribution is sometimes known as the Fréchet distribution. More details for the inverse Weibull distribution are given in the Tables for Exam C.

The process of finding the distribution of the random variable Y = 1/X can be applied to other distributions. Examples of the inverse exponential distribution and the inverse gamma distribution are given in the Tables for Exam C.

Other distributions which are sometimes used in this context are the Gumbel distribution, and the Weibull distribution itself (without inversion). Details for the Gumbel distribution are given below.

Gumbel distribution

A random variable X is said to have a Gumbel distribution if:

f(x) = \frac{1}{\theta} e^{-y} \exp\left(-e^{-y}\right), where y = \frac{x - \mu}{\theta} and -\infty < x < \infty

The distribution function is:

F(x) = \exp\left(-e^{-y}\right)

The details for the Weibull distribution itself are given in Chapter 2 of the textbook. The Pareto distribution also has a thick tail, and can sometimes be used in these situations.

1.3 Large data sets – an alternative approach

The following section should be read in conjunction with Chapter 7 of the textbook.

We have seen in Chapter 7 how the Kaplan-Meier method can be adapted for use with large data sets. Here we look at another approach for calculating mortality rates when large numbers of lives are involved. To describe the main features of the method, we shall use a small sample of six lives.

The exact exposure method

A company is trying to estimate mortality rates for the holders of a certain type of policy. It has the following information about a group of 6 lives, all of whom hold a policy of this type. The investigation ran for a three-year period, from Jan 1 2010 to Dec 31 2012.
Life   Date of birth   Date of purchase   Mode of exit   Date of exit
1      Mar 1 1965      Jul 1 2009         Alive          Dec 31 2012
2      Jul 1 1965      Nov 1 2009         Death          Mar 1 2011
3      Aug 1 1965      Apr 1 2010         Surrender      Feb 1 2012
4      Apr 1 1965      Jun 1 2011         Alive          Dec 31 2012
5      May 1 1965      Aug 1 2010         Surrender      Jun 1 2012
6      Oct 1 1965      May 1 2010         Death          Apr 1 2012

We see that of the 6 lives, two survived within the population to the end of the investigation, two surrendered their policies while the investigation was in progress, and two died during the period of the investigation.

We wish to use the information in the table above to estimate mortality rates at various ages. We shall assume that each month is exactly one-twelfth of a year, to simplify the calculations.

We start by finding the age at which each life started to be observed, and the age at which each life ceased to be observed. Note that although Life 1 purchased his policy on July 1 2009, the investigation had not started at that point. So the date on which Life 1 is first observed is January 1 2010. Life 2 is also first observed on this date.

This gives us the following table of ages:

Life   Age at first observation   Age at last observation
1      44 10/12                   47 10/12
2      44 6/12                    45 8/12
3      44 8/12                    46 6/12
4      46 2/12                    47 9/12
5      45 3/12                    47 1/12
6      44 7/12                    46 6/12

In order to estimate the mortality rates, we need to find the length of time for which each life was alive and a member of the investigation. We need to subdivide these periods by age last birthday. So, for example, we shall use e_44 for the period of time during which a life (or group of lives) was aged 44 last birthday, e_45 for the corresponding period of time for which lives were aged 45 last birthday, and so on.

From the table above, we can now find the contribution of each life to each of e_44, e_45, e_46 and e_47.
This gives us the following table of figures (in months):

Life   First observation   Last observation   e_44   e_45   e_46   e_47
1      44 10/12            47 10/12           2      12     12     10
2      44 6/12             45 8/12            6      8      -      -
3      44 8/12             46 6/12            4      12     6      -
4      46 2/12             47 9/12            -      -      10     9
5      45 3/12             47 1/12            -      9      12     1
6      44 7/12             46 6/12            5      12     6      -

This gives us totals in each of the e columns of 17, 53, 46 and 20 months respectively.

We can now use these to calculate estimates of the mortality rates. It can be shown that d_j / e_j provides us with the maximum likelihood estimate of the hazard rate at each age, where d_j is the number of deaths at age j last birthday and e_j is the corresponding exposure in years.

Noting that Life 2 dies at age 45 last birthday, and that Life 6 dies aged 46 last birthday, we can find estimates of the hazard rates at these two ages:

\hat{h}_{45} = \frac{1}{53/12} = 0.22642

and:

\hat{h}_{46} = \frac{1}{46/12} = 0.26087

Note that we do not have enough data to provide estimates of the hazard rates at any other age. Alternatively we could claim, without much conviction, that our estimates of the mortality rates at ages 44 and 47 were zero, based on this very small sample of data.

If we wish to find the values of the corresponding q-type mortality rates, we use the relationship:

\hat{q}_x = 1 - e^{-\hat{h}_x}

In this case we obtain corresponding q-type rates of 0.20261 and 0.22962 respectively.

The method we have used here is called the exact exposure method. We have calculated the exact period of time for which a group of lives has been exposed to the risk of death at a particular age.

The actuarial exposure method

An alternative approach is to use what is called the actuarial exposure method. This provides us with a direct estimate of the q-type mortality rates, but it is perhaps not so intuitively appealing. We proceed as follows:

(1) Calculate the contribution of each life to each of the e_j figures, as above.
(2) For each of the lives that die (and only for the deaths), add in the period of time from the date of death until the end of the year of age (ie the period of time until the life would have reached its next birthday). This increases the contribution from the deaths to one (or sometimes two) of the e_j figures.

(3) The q-type rates are now given directly by d_j / e_j.

If we apply this method to the data given above, we have the following alterations:

(a) Life 2 now contributes 12 months to e_45.

(b) Life 6 now contributes 12 months to e_46.

All the other figures in the table are unchanged. The column totals are now 17, 57, 52 and 20 respectively. If we recalculate our mortality estimates, we find that:

\hat{q}_{45} = \frac{1}{57/12} = 0.21053

and:

\hat{q}_{46} = \frac{1}{52/12} = 0.23077

Although we have used a sample of only 6 lives in these examples, you should be able to see that the method generalizes easily, and can cope with large data samples without any real increase in the difficulty of the calculations.

The approaches outlined above are sometimes called seriatim methods. This refers to the fact that the data points are analyzed as a series of independent observations.

Insuring ages

A variation on this idea is to use the concept of insuring ages. In this case, an insurer will deem each policyholder to have their birthday on the date on which the policy was first taken out. So, for example, if a person is aged 45 last birthday when he takes out his policy, we treat him as if he is aged exactly 45 on the issue date. This means that some of the exposure will be assigned to younger ages than would be the case using the policyholder's true birthday.

Example 1.1

Reanalyze the data given above for the six lives, using insuring ages by age last birthday. Recalculate the estimates of the hazard rates at ages 45 and 46.

Solution

We now have the following table.
Life   Date of birth   Date of purchase   New birthday   Age at first observation   Age at last observation
1      Mar 1 1965      Jul 1 2009         Jul 1 1965     44 6/12                    47 6/12
2      Jul 1 1965      Nov 1 2009         Nov 1 1965     44 2/12                    45 4/12
3      Aug 1 1965      Apr 1 2010         Apr 1 1966     44                         45 10/12
4      Apr 1 1965      Jun 1 2011         Jun 1 1965     46                         47 7/12
5      May 1 1965      Aug 1 2010         Aug 1 1965     45                         46 10/12
6      Oct 1 1965      May 1 2010         May 1 1966     44                         45 11/12

Note that again, Lives 1 and 2 are not observed until the start of the investigation on January 1 2010.

Notice that by using insuring ages last birthday, the birthday is always moved forwards in time, so that lives become younger than they really are. Also, lives whose policy purchase occurs within the period of the investigation will now be observed for the first time at an integer age.

This now gives us the following table of exposures (in months):

Life   First observation   Last observation   e_44   e_45   e_46   e_47
1      44 6/12             47 6/12            6      12     12     6
2      44 2/12             45 4/12            10     4      -      -
3      44                  45 10/12           12     10     -      -
4      46                  47 7/12            -      -      12     7
5      45                  46 10/12           -      12     10     -
6      44                  45 11/12           12     11     -      -

The total contribution of each life (ie the total of the exposures in each row) is the same as before, but the distribution across ages is different. We now have column totals of 40, 49, 34 and 13.

Note that both deaths now occur at age 45 last birthday: Life 2 dies at insuring age 45 4/12, and Life 6 dies at insuring age 45 11/12 (so this death has moved from age 46 to age 45). Using the exact exposure method, we find that:

\hat{h}_{45} = \frac{2}{49/12} = 0.48980

There are now no deaths at age 46, so this sample gives an estimate of zero for the hazard rate at age 46. We can calculate q-type rates from these figures as before.

Anniversary-based studies

In the study outlined above, we had a three-year period of investigation, which ran from January 1 2010 to December 31 2012. An alternative approach (which can simplify the numbers obtained) is to use an anniversary-based study. In a study of this type, each life enters the investigation on the first policy anniversary during the period of the investigation. Lives also exit on the last policy anniversary within the period of the investigation, if they are still active at that point.
The amount of exposure is reduced (which reduces the amount of information we are using), but the numbers may be simplified, particularly if we use this method in conjunction with insuring ages. Let's see how we can apply this method to the example data given earlier.

Example 1.2

Using the data for the 6 lives given above, calculate the exposures that would be obtained in an anniversary-based study, using insuring ages last birthday.

Solution

Although the overall period of the investigation is from January 1 2010 to December 31 2012, each life will enter the investigation on the policy anniversary following January 1 2010, and will leave on the policy anniversary preceding December 31 2012, if still active at that point. So, for example, Life 1 enters the investigation on July 1 2010, at which point the life has insuring age 45. We obtain the following new table of values.

Life   Date of birth   Date of purchase   Date of entry   Insuring age at entry   Date of exit   Insuring age at exit
1      Mar 1 1965      Jul 1 2009         Jul 1 2010      45                      Jul 1 2012     47
2      Jul 1 1965      Nov 1 2009         Nov 1 2010      45                      Mar 1 2011     45 4/12
3      Aug 1 1965      Apr 1 2010         Apr 1 2010      44                      Feb 1 2012     45 10/12
4      Apr 1 1965      Jun 1 2011         Jun 1 2011      46                      Jun 1 2012     47
5      May 1 1965      Aug 1 2010         Aug 1 2010      45                      Jun 1 2012     46 10/12
6      Oct 1 1965      May 1 2010         May 1 2010      44                      Apr 1 2012     45 11/12

So the exposures (in months) are now as follows.

Life   First observation   Last observation   e_44   e_45   e_46   e_47
1      45                  47                 -      12     12     -
2      45                  45 4/12            -      4      -      -
3      44                  45 10/12           12     10     -      -
4      46                  47                 -      -      12     -
5      45                  46 10/12           -      12     10     -
6      44                  45 11/12           12     11     -      -

The total exposures at each age are now 24, 49, 34 and zero (working in months as before).

Note that in an investigation of this type, all lives who are active at the end of the investigation will contribute a whole number of years to the exposures. Only lives who die or surrender will contribute at fractional ages.
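The exposure splits in the examples above all follow the same mechanical rule: intersect each life's observation period with each year of age. A minimal sketch in Python (ages held in whole months to avoid rounding; the figures reproduce the first exact-age table, and the function name is ours):

```python
# Observation periods in months of age (start, end), from the exact-age table:
# e.g. Life 1 is observed from age 44y10m to age 47y10m.
periods = [
    (44 * 12 + 10, 47 * 12 + 10),   # Life 1
    (44 * 12 + 6,  45 * 12 + 8),    # Life 2
    (44 * 12 + 8,  46 * 12 + 6),    # Life 3
    (46 * 12 + 2,  47 * 12 + 9),    # Life 4
    (45 * 12 + 3,  47 * 12 + 1),    # Life 5
    (44 * 12 + 7,  46 * 12 + 6),    # Life 6
]

def exposures(periods, ages):
    """Months of exposure at each age last birthday."""
    e = {x: 0 for x in ages}
    for start, end in periods:
        for x in ages:
            lo, hi = x * 12, (x + 1) * 12           # year of age [x, x+1)
            e[x] += max(0, min(end, hi) - max(start, lo))
    return e

e = exposures(periods, range(44, 48))
print(e)   # {44: 17, 45: 53, 46: 46, 47: 20}

# Exact-exposure hazard estimates (one death at each of ages 45 and 46):
h45 = 1 / (e[45] / 12)   # 0.22642
h46 = 1 / (e[46] / 12)   # 0.26087
```

Restating the periods with the shifted birthdays reproduces the insuring-age totals (40, 49, 34, 13) and the anniversary-based totals (24, 49, 34, 0) in the same way.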
In a large investigation, it may be that most of the lives are active lives, so the amount of calculation needed may be reduced significantly using this method.

Interval-based methods

An alternative approach is not to record the exact time or age at which an event takes place, but just to record the number of events of each type in each year of age. If we do this we will lose some accuracy, but the calculations will be simplified. In a large actuarial study, provided that there are many lives contributing to each age group, the loss of accuracy is likely to be small.

We will need to record the number of lives in the investigation at the start of each year of age, together with the numbers entering, dying and leaving during the course of the year. We can then use a table of these values to estimate the exposure within each particular age group. Let's see how we might apply these ideas to the group of six lives studied earlier.

Example 1.3

Using the data for the six lives given previously, construct a table of the numbers of decrements in each year of age, and calculate the exact exposure for each of the relevant age groups.

Solution

Using exact ages, we have previously constructed the following table of data:

Life   Age at first observation   Age at last observation   Mode of exit
1      44 10/12                   47 10/12                  Withdrawal
2      44 6/12                    45 8/12                   Death
3      44 8/12                    46 6/12                   Withdrawal
4      46 2/12                    47 9/12                   Withdrawal
5      45 3/12                    47 1/12                   Withdrawal
6      44 7/12                    46 6/12                   Death

Withdrawal here includes both lives who surrendered and lives who were still active at the end of the investigation period.

We can see that:

(a) Four lives entered at age 44 last birthday, one at 45 last birthday and one at 46 last birthday.

(b) One death occurred at age 45 last birthday, and one at age 46 last birthday.

(c) Of the survivors, one exited at age 46 last birthday, and three at age 47 last birthday.
This leads to the following table of decrements:

Age   Population at start of year   Entering during year   Dying   Leaving during year   Population at year end
44    0                             4                      0       0                     4
45    4                             1                      1       0                     4
46    4                             1                      1       1                     3
47    3                             0                      0       3                     0

We can now calculate estimates of the exact exposure and the actuarial exposure. We are assuming that we no longer have the exact information about entrances and exits, but only the information in the table above, so we will have to approximate the exposure.

The lives in the population at the start of the year each contribute a full year to the exposure. The lives entering, dying and leaving during the year are assumed to be distributed uniformly over the year. This leads to the following formula for the exposure:

e_j = P_j + (n_j - d_j - w_j)/2

where P_j is the population at the start of the year, n_j is the number of lives entering during the year, d_j is the number of deaths during the year, and w_j is the number leaving the population during the year.

If we apply this to the figures in the table above, we obtain an estimate of the exact exposure at age 44 of:

e_{44} = P_{44} + (n_{44} - d_{44} - w_{44})/2 = 0 + (4 - 0 - 0)/2 = 2

Similarly, if we apply the formula at the other ages, we obtain exposures of 4, 3.5 and 1.5 respectively. We can then calculate estimates of the hazard rate at each age:

\hat{h}_{45} = \frac{1}{4} = 0.25

and:

\hat{h}_{46} = \frac{1}{3.5} = 0.2857

Of course, given the small sample of lives, these estimates differ from the ones we obtained earlier. However, with a large data sample, the loss of accuracy may not be great.

If we wish to use the actuarial method, the deaths count a full year in the exposure. We therefore do not need to deduct half the number of deaths in the exposure formula, which now becomes:

e_j = P_j + (n_j - w_j)/2

We now have figures for the exposure in each year of 2, 4.5, 4 and 1.5.
So, for example, our estimate for q_45 using the actuarial method now becomes:

\hat{q}_{45} = \frac{1}{4.5} = 0.2222

Variance of the estimators

We have seen that \hat{h} = d/e can be used as an estimate for the hazard rate h, and that, using the actuarial approach to finding e, \hat{q} = d/e can be used as an estimate for the mortality rate q. Note that e is calculated differently in the two cases.

It can also be shown that, under certain assumptions, these estimates are actually maximum likelihood estimates for h and q. We shall not prove this here. However, if we make the assumption that h is constant over the period during which we are observing the lives, then these estimates are maximum likelihood estimates. Here is a formal statement of this result.

MLEs for h and q

Suppose that a group of lives is observed from age a to age b, where b > a. Assuming that the hazard rate h is constant over the interval, then:

\hat{h} = d/e

and:

\hat{q} = 1 - e^{-d/e}

are maximum likelihood estimates for h and q. Here, e is the exact exposure for the group of lives over the age interval.

We can show these results in the usual way, by constructing the likelihood function, taking logs, differentiating with respect to h, setting the result equal to zero and solving the resulting equation.

In fact we can go further than this. Recall from Chapter 5 that the Cramér-Rao lower bound can be found for the variance of an estimator. In this case we can use the CRLB to find the variance of the estimator of the hazard rate; it turns out to be:

var(\hat{h}) = d/e^2

With this result, and using the delta method from Chapter 5, we can also find the variance of the estimator of q, which turns out to be:

var(\hat{q}) = (1 - \hat{q})^2 \, d/e^2

We are assuming here that q represents the probability of death in a single time period.
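The interval-based exposure formulas and the single-period variance results above can be checked together with a short sketch (Python; the table rows are those of Example 1.3, and the function names are ours):

```python
import math

# (population at start, entering, dying, leaving) for each year of age
table = {
    44: (0, 4, 0, 0),
    45: (4, 1, 1, 0),
    46: (4, 1, 1, 1),
    47: (3, 0, 0, 3),
}

def exact_exposure(P, n, d, w):
    # entrants, deaths and withdrawals spread uniformly over the year
    return P + (n - d - w) / 2

def actuarial_exposure(P, n, d, w):
    # deaths contribute a full year, so they are not halved
    return P + (n - w) / 2

e_exact = {x: exact_exposure(*row) for x, row in table.items()}
print(e_exact)                      # {44: 2.0, 45: 4.0, 46: 3.5, 47: 1.5}

h45 = table[45][2] / e_exact[45]                      # 1/4   = 0.25
h46 = table[46][2] / e_exact[46]                      # 1/3.5 = 0.2857
q45 = table[45][2] / actuarial_exposure(*table[45])   # 1/4.5 = 0.2222

# Variance of the estimators at age 45 (single-year interval):
var_h45 = table[45][2] / e_exact[45] ** 2             # d / e^2
q_hat = 1 - math.exp(-h45)
var_q45 = (1 - q_hat) ** 2 * table[45][2] / e_exact[45] ** 2
```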
In the more general case, where \hat{q} estimates the probability of death over a period longer than one year, we have the corresponding result:

var(\hat{q}) = (1 - \hat{q})^2 (b - a)^2 \, d/e^2

where q is now the probability that a life dies between age a and age b.

1.4 Simulation

The following section should be read in conjunction with Chapter 12 of the textbook.

Simulation methods for normal and lognormal distributions

To simulate values from a normal distribution, the inversion method can be used as usual. The procedure would be:

1. Simulate a value u_1 from a U(0,1) distribution.

2. Use tables of the standard normal distribution to find z_1 such that \Phi(z_1) = u_1.

3. z_1 is now a simulated value from a N(0,1) distribution. To find a simulated value x_1 from a general N(\mu, \sigma^2) distribution, use the transformation x_1 = \mu + \sigma z_1.

4. To find a simulated value from a lognormal distribution with parameters \mu and \sigma^2, use the transformation x_1 = e^{\mu + \sigma z_1}.

5. Repeat the process to obtain as many simulated values as are required.

However, there are a number of other methods that can be used to simulate values from normal distributions. We give two methods here.

The Box-Muller method

An alternative approach is to use the Box-Muller method. This uses pairs of independent U(0,1) simulated values to obtain pairs of independent standard normal values. The procedure is as follows.

1. Generate two independent U(0,1) random numbers, u_1 and u_2.

2. Then:

z_1 = \sqrt{-2 \log u_1} \cos(2\pi u_2)

and:

z_2 = \sqrt{-2 \log u_1} \sin(2\pi u_2)

are independent values from an N(0,1) distribution.

Note that you should set your calculator to ensure that the trigonometric functions are calculated in radian mode.

The polar method

The polar method also starts with two independent values from U(0,1). The method is as follows:

1. Generate two independent U(0,1) numbers, u_1 and u_2.

2. Calculate x_1 = 2u_1 - 1 and x_2 = 2u_2 - 1.

3. Calculate the value of w = x_1^2 + x_2^2. If w > 1, reject the pair and start again.
4. Calculate y = \sqrt{-2 \log w / w}.

5. Calculate z_1 = x_1 y and z_2 = x_2 y. Then z_1 and z_2 are the required independent N(0,1) variables.

Let's see how to simulate values from a normal distribution using each of the methods given above.

Example 1.4

Use each of the three methods given above (including the inversion method) and the random numbers u_1 = 0.273 and u_2 = 0.518 to generate values from a normal distribution with mean 100 and standard deviation 20.

Solution

First we use the inversion method. We need to find the values z_1, z_2 from the standard normal distribution such that \Phi(z_1) = 0.273 and \Phi(z_2) = 0.518. Since the first random number is less than 0.5, we use the equivalent result \Phi(-z_1) = 1 - 0.273 = 0.727. From the tables of the standard normal distribution, we find that z_1 = -0.604. Similarly, we find that z_2 = 0.045.

To find values from a normal distribution with the given mean and standard deviation, we use the relationships x_1 = 100 + 20 z_1 = 87.92 and x_2 = 100 + 20 z_2 = 100.90. These are our two simulated values from a N(100, 20^2) distribution.

Using the Box-Muller method, we obtain the values:

z_1 = \sqrt{-2 \log 0.273} \cos(2\pi \times 0.518) = -1.60109

and:

z_2 = \sqrt{-2 \log 0.273} \sin(2\pi \times 0.518) = -0.18186

Multiplying by 20 and adding 100, we obtain simulated values of 67.98 and 96.36.

We need to be careful here about the order in which we use the random numbers. If we switch around u_1 and u_2, we will of course end up with different simulated normal values.

Finally, using the polar method, we have x_1 = -0.454 and x_2 = 0.036. So w = 0.207412, and we can use these values in the process since w < 1. Using the formula given above for y, we find that y = 3.89466, and our standard normal values are -1.76817 and 0.14021. Multiplying by 20 and adding 100 as before, we obtain the numbers 64.64 and 102.80.
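Both transformation methods are easy to code directly. The sketch below (Python; function names are ours) reproduces the Box-Muller and polar figures from Example 1.4:

```python
import math

def box_muller(u1, u2):
    """Two independent N(0,1) values from two independent U(0,1) values."""
    r = math.sqrt(-2 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

def polar(u1, u2):
    """Polar method; returns None when the pair is rejected."""
    x1, x2 = 2 * u1 - 1, 2 * u2 - 1
    w = x1 * x1 + x2 * x2
    if w >= 1:
        return None                      # reject and start again with fresh numbers
    y = math.sqrt(-2 * math.log(w) / w)
    return x1 * y, x2 * y

z1, z2 = box_muller(0.273, 0.518)
print(100 + 20 * z1, 100 + 20 * z2)      # approx 67.98 and 96.36

p1, p2 = polar(0.273, 0.518)
print(100 + 20 * p1, 100 + 20 * p2)      # approx 64.64 and 102.80
```

Note that the same pair of uniforms gives different normal values under the two methods; both are valid simulations.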
Simulation of a discrete mixture

Consider the distribution whose distribution function is given by:

F(x) = 0.4\left(1 - e^{-0.03x}\right) + 0.3\left(1 - e^{-0.02x}\right) + 0.3\left(1 - e^{-0.05x}\right)

This random variable is a discrete mixture of three exponential distributions. Inverting this distribution function as it stands will not be very easy. However, an alternative approach to simulating values from this type of distribution is as follows:

1. Use a random number to determine which individual exponential distribution to simulate from.

2. Use another random number to simulate a value from that exponential distribution.

Here is an example.

Example 1.5

Use the random numbers 0.28, 0.57, 0.81 and 0.73 to simulate two values from the distribution whose CDF is given above.

Solution

We subdivide the interval (0,1) into three sub-intervals, (0, 0.4), (0.4, 0.7) and (0.7, 1). Observing which of these sub-intervals contains our first random number determines which exponential distribution we use in the simulation.

Here our first random number is 0.28. Since this falls into the first sub-interval, we simulate from an exponential distribution with parameter 0.03, using the second random number in the inversion process:

0.57 = 1 - e^{-0.03 x_1} \;\Rightarrow\; x_1 = -\frac{1}{0.03} \log(1 - 0.57) = 28.13

Repeating the process, our next random number 0.81 falls into the third sub-interval, so we simulate from an exponential distribution with parameter 0.05, using the fourth random number:

0.73 = 1 - e^{-0.05 x_2} \;\Rightarrow\; x_2 = -\frac{1}{0.05} \log(1 - 0.73) = 26.19

In this way we avoid having to invert the rather complicated expression for the CDF of the mixture distribution.

Simulation using a stochastic process

We have already seen methods for simulating values from an (a, b, 0) distribution using the inversion method. However, this method is not always very efficient. In this section we look at an alternative approach to simulating values from a Poisson, binomial or negative binomial distribution.
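Before turning to that alternative, note that the two-step recipe of Example 1.5 is straightforward to code (Python; the cumulative weights and rates are those of the mixture CDF above, and the function name is ours):

```python
import math
import bisect

weights_cum = [0.4, 0.7, 1.0]      # cumulative mixing weights
rates = [0.03, 0.02, 0.05]         # exponential parameters of the components

def simulate_mixture(u_component, u_value):
    """First uniform picks the component; second inverts that component's CDF."""
    i = bisect.bisect_right(weights_cum, u_component)
    return -math.log(1 - u_value) / rates[i]

print(round(simulate_mixture(0.28, 0.57), 2))   # 28.13
print(round(simulate_mixture(0.81, 0.73), 2))   # 26.19
```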
Rather than trying to simulate the number of observations from the distribution directly, we will consider the underlying process in time. If, for example, we want to simulate the number of claims in one year, and we know that the claim distribution is Poisson with mean 3.4 per year, we can simulate values from a Poisson distribution with mean 3.4 directly. Alternatively, however, we can simulate the times at which these Poisson events occur, accumulate these times, and count how many events occur before time one (year). This may seem like a longer process, but in some situations it can be more efficient to program on a computer.

This method can be used for any of the three discrete distributions mentioned above. It can be shown that the time to the next event always has an exponential distribution. However, we need to be careful to use the correct exponential parameter, depending on the distribution we are trying to simulate. If the events occur according to a Poisson process, then the time to the next event is exponential with the (constant) Poisson parameter. If the events we are trying to simulate are binomial, then it can be shown that the time to the next event is still exponential, but with a parameter that varies as the events occur. Similarly, if the events are negative binomial, the time to the next event has an exponential distribution, but again the underlying parameter varies as the events occur.

Here are the key results that we will need for each of the three distributions.

Simulating a Poisson distribution

Time to the next event: If events have a Poisson distribution with parameter \lambda, the time to the next event is exponential with parameter \lambda (and mean 1/\lambda).

Exponential distribution: We simulate the time to the next event as an exponential random variable using s_k = -\log(1 - u_k)/\lambda.
Simulated value: We can now determine the number of events happening in one time unit by summing up the s_k's. The total time is t_k = t_{k-1} + s_k. The number of events occurring before time 1 is our simulated value.

Simulating a binomial distribution

Time to the next event: The time between events, if events have a binomial distribution with parameters m and q, is exponential with parameter \lambda_k = c - dk, where d = -\log(1 - q) and c = md.

Exponential distribution: We simulate the time between events as an exponential random variable using s_k = -\log(1 - u_k)/\lambda_k.

Simulated value: We can now determine the number of events happening in one time unit by summing up the s_k's using t_k = t_{k-1} + s_k. The number of events occurring before time 1 is our simulated value.

Simulating a negative binomial distribution

Time to the next event: The time between events, if events have a negative binomial distribution with parameters r and \beta, is exponential with parameter \lambda_k = c + dk, where d = \log(1 + \beta) and c = rd.

Exponential distribution: We simulate the time between events as an exponential random variable using s_k = -\log(1 - u_k)/\lambda_k.

Simulated value: We can now determine the number of events happening in one time unit by summing up the s_k's using t_k = t_{k-1} + s_k. The number of events occurring before time 1 is our simulated value.

Note the convention in use here. We use our first random number, u_0, together with \lambda_0, to simulate s_0 = t_0, the time from time zero to the first event. Then u_1 is used with \lambda_1 to simulate s_1, the time from the first to the second event, and t_1 = s_0 + s_1 is the total time to the second event. t_2 will be the total time until the third event, and so on.

Let's see how this process works in practice using an example.
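The three recipes differ only in how the exponential rate changes after each event, so they can share one sketch (Python; the parameter values are those of the example that follows, and the function name is ours):

```python
import math

def count_events(us, rate_of):
    """Count events before time 1, where the time between events k and k+1
    is exponential with rate rate_of(k), simulated by inversion."""
    t, k = 0.0, 0
    for u in us:
        t += -math.log(1 - u) / rate_of(k)
        if t >= 1:
            return k          # the (k+1)th event lands after time 1
        k += 1
    raise ValueError("ran out of random numbers")

us = [0.14, 0.28, 0.73, 0.82, 0.44, 0.61]

poisson = count_events(us, lambda k: 1.6)               # constant rate

m, q = 40, 0.04
d = -math.log(1 - q)
binomial = count_events(us, lambda k: (m - k) * d)      # rate c - dk, c = md

r, beta = 120, 0.014
d2 = math.log(1 + beta)
negbin = count_events(us, lambda k: (r + k) * d2)       # rate c + dk, c = rd

print(poisson, binomial, negbin)   # 2 2 2
```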
Example 1.6

Simulate a value from each of the three distributions given below, using as many of the following random numbers as necessary:

u_0 = 0.14   u_1 = 0.28   u_2 = 0.73   u_3 = 0.82   u_4 = 0.44   u_5 = 0.61

(a) a Poisson distribution with mean 1.6
(b) a binomial distribution with parameters m = 40 and q = 0.04
(c) a negative binomial distribution with parameters r = 120 and \beta = 0.014.

Solution

(a) Poisson distribution

We have \lambda = 1.6. So, using our first random number, we have:

t_0 = s_0 = -\log(1 - 0.14)/1.6 = 0.0943

So the time to our first event is 0.0943 time units. We now use the same formula with the next random number to find the time from the first to the second event:

s_1 = -\log(1 - u_1)/1.6 = 0.2053, \quad t_1 = 0.0943 + 0.2053 = 0.2996

The total time to the second event is 0.2996 time units. Repeating the process again, we have:

s_2 = -\log(1 - u_2)/1.6 = 0.8183, \quad t_2 = 0.2996 + 0.8183 = 1.1179

So the third event occurs after the end of the time period, and two events have occurred within the time interval (0,1). The simulated value is 2.

Note that, with the notation above, t_k is actually the total time to the (k+1)th event.

(b) Binomial distribution

We now need the values of c and d:

d = -\log(1 - q) = -\log 0.96 = 0.04082

and:

c = md = 1.63288

We can now calculate the appropriate values of \lambda_k:

\lambda_0 = c = 1.63288
\lambda_1 = c - d = 1.59206
\lambda_2 = c - 2d = 1.55124

Now we can find the times of the various events:

t_0 = s_0 = -\log(1 - u_0)/1.63288 = 0.0924
s_1 = -\log(1 - u_1)/1.59206 = 0.2063, \quad t_1 = 0.0924 + 0.2063 = 0.2987
s_2 = -\log(1 - u_2)/1.55124 = 0.8441, \quad t_2 = 0.2987 + 0.8441 = 1.1428

So again, the third simulated event occurs after the end of the time interval, and our simulated value is 2.
(c) Negative binomial distribution

Again we need the values of c and d:

d = \log(1 + \beta) = \log 1.014 = 0.01390

and:

c = rd = 1.66835

We can now calculate the appropriate values of \lambda_k:

\lambda_0 = c = 1.66835
\lambda_1 = c + d = 1.68225
\lambda_2 = c + 2d = 1.69615

Now we can find the times of the various events:

t_0 = s_0 = -\log(1 - u_0)/1.66835 = 0.0904
s_1 = -\log(1 - u_1)/1.68225 = 0.1953, \quad t_1 = 0.0904 + 0.1953 = 0.2857
s_2 = -\log(1 - u_2)/1.69615 = 0.7719, \quad t_2 = 0.2857 + 0.7719 = 1.0576

So again, the third simulated event occurs after the end of the time interval, and our simulated value is again 2.

Simulation from a decrement table

When following the progress of a group of policyholders, it may be necessary to simulate the outcomes for the group. The group may be subject to a variety of different decrements, for example death, retirement, withdrawal and so on.

Consider a group of 1,000 identical policyholders, all aged 60 exact. Let us assume that they are subject to three decrements: death, age retirement and ill-health retirement. The probabilities for each of these decrements at each age might be as follows:

Age   Probability of death   Probability of age retirement   Probability of ill-health retirement
60    0.04                   0.12                            0.09
61    0.05                   0.15                            0.10

We want to simulate the progress of this group of policyholders, identifying the numbers of lives who will leave the group via each decrement at each age. To do this, we will need to simulate values from various binomial distributions. We might proceed as follows.

Consider first the number of deaths at age 60. This has a binomial distribution with parameters 1,000 and 0.04. So we first simulate a value from this binomial distribution to determine the number of deaths during the year. Suppose that our simulated value is 28. We now have a sample of 1,000 - 28 = 972 lives remaining.

To determine the simulated number of age retirements during the year, we now need a value from a binomial distribution with first parameter 972. However, we need the conditional probability of age retirement, given that a life is still alive. This is 0.12/(1 - 0.04) = 0.125. We can simulate a value from the binomial distribution with these parameters using any of the methods given previously for the binomial distribution. Suppose our simulated value is 102. We now have 972 - 102 = 870 lives remaining.

To simulate the number of ill-health retirements, we need the conditional probability of taking ill-health retirement, given that a life has not died or taken age retirement. This is:

0.09/(1 - 0.04 - 0.12) = 0.10714

We now need a simulated value from a binomial distribution with parameters 870 and 0.10714. Perhaps our simulated value is 62. We now have 870 - 62 = 808 lives surviving in the population until age 61.

We can continue the process for as long as necessary, simulating the observed numbers of lives exiting by each decrement at each age. We may need to carry out the process on a computer if we want a large number of repeated simulations, but the underlying method is fairly straightforward.

Supplementary Note Practice Questions

Question 1.1

Use the random number u = 0.845 to generate a random number from a negative binomial distribution with mean 0.264 and variance 0.3.

Question 1.2

Using the inversion method, use the random number u_1 = 0.42 to generate a single observation from a lognormal distribution with mean 5,000 and standard deviation 400.

Question 1.3

Use the random numbers 0.81, 0.95, 0.09, 0.22 and the polar method to generate two random numbers from the standard normal distribution.

Question 1.4

Use the random numbers u_1 = 0.73 and u_2 = 0.28 and the Box-Muller method to generate two random numbers from a normal distribution with mean 100 and standard deviation 10.

Question 1.5

Use a stochastic process to generate a random observation from a binomial distribution with parameters m = 50 and q = 0.01.
Use as many of these random numbers as are needed:

u0 = 0.423, u1 = 0.796, u2 = 0.522, u3 = 0.637, u4 = 0.992

Question 1.6

Use a stochastic process to generate a value from a negative binomial distribution with parameters r = 100 and β = 0.08. Use the same random numbers as in the previous question.

Question 1.7

Use the first two random numbers from the previous question to generate a random observation from the mixture of Pareto distributions with distribution function:

F(x) = 0.6[1 - (200/(x + 200))^3] + 0.4[1 - (300/(x + 300))^4]

You are given the following information about a sample of lives:

Life   Date of birth   Date of purchase   Mode of exit   Date of exit
1      Apr 15 1950     Jan 1 2011         Died           May 15 2011
2      Jul 15 1950     Apr 1 2011         Surrendered    Mar 15 2012
3      Oct 15 1950     Oct 1 2011         Alive          -
4      Jan 15 1950     Feb 1 2011         Alive          -
5      Feb 15 1951     Mar 1 2011         Died           Aug 15 2011

These lives are subject to a 2-year investigation, running from July 1 2010 to June 30 2012. Assume in each of the following questions that each half-month period is exactly one twenty-fourth of a year.

Question 1.8

Using the exact exposure method, estimate h61.

Question 1.9

Using the actuarial exposure method, estimate q60 and h60.

Question 1.10

Using the exact exposure method with insuring ages last birthday, estimate q60.

Question 1.11

Using the actuarial exposure method with insuring ages last birthday, estimate q60.

Question 1.12

Explain how your answer to Question 1.10 would alter if you were using an anniversary-based study to estimate q60 and q61.

Question 1.13

Using an interval-based method and the table of lives given above, construct a table of decrements, and hence estimate q60 using the actuarial method.

Question 1.14

Find the estimated variance of your estimator in the previous question.

Solutions to Supplementary Note Practice Questions

Question 1.1

We first need the parameters of the negative binomial distribution.
Using the formulae for the mean and variance:

rβ = 0.264 and rβ(1 + β) = 0.3

Solving these simultaneous equations, we obtain r = 1.936 and β = 0.13636.

The question does not require us to use a stochastic process, so it is probably quickest just to use the inversion method as normal. Calculating the first few negative binomial probabilities:

p0 = (1 + β)^(-r) = 0.780762
p1 = rβ(1 + β)^(-(r+1)) = 0.18139

So the inversion method will transform random numbers in the interval (0, p0) to a simulated value of zero, and random numbers in the range (p0, p0 + p1) to a simulated value of 1. Our random number lies in this second interval, so our simulated value is 1.

Question 1.2

First we need the parameters of the lognormal distribution. Using the formulae for the mean and variance of the lognormal, we have:

exp(μ + ½σ²) = 5,000

and: exp(2μ + σ²)(exp(σ²) - 1) = 400²

Solving these simultaneous equations, we find that μ = 8.514003 and σ² = 0.0063796.

We now find a simulated N(0,1) value by using the normal tables. Φ(z1) = 0.42, so Φ(-z1) = 0.58 and z1 = -0.2019.

We can now find a simulated value from the lognormal distribution:

x1 = exp(μ + σz1) = exp(8.514003 - 0.2019 × √0.0063796) = 4,904

Question 1.3

First we find x1 = 2u1 - 1 = 0.62 and x2 = 2u2 - 1 = 0.90.

Applying the check, we find that w = x1² + x2² = 1.19. Since w > 1, we reject these values and start the process again using the other random numbers.

Now we have x3 = 2u3 - 1 = -0.82 and x4 = 2u4 - 1 = -0.56. Since (-0.82)² + (-0.56)² = 0.986 < 1, we can proceed. So:

y = √(-2 log(0.986)/0.986) = 0.16911

and we have:

z1 = x3 y = -0.82 × 0.16911 = -0.1387

and: z2 = x4 y = -0.56 × 0.16911 = -0.0947

These are our simulated values from the standard normal distribution.

Question 1.4

Using the standard Box-Muller formula, we have:

z1 = √(-2 log u1) cos(2π u2) = √(-2 log 0.73) cos(2π × 0.28) = -0.148661

and:

z2 = √(-2 log u1) sin(2π u2) = √(-2 log 0.73) sin(2π × 0.28) = 0.779308

These are independent N(0,1) observations.
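The Box-Muller computation just carried out can be checked with a short script. This is a sketch using only the Python standard library; the function name is chosen here for illustration:

```python
import math

def box_muller(u1, u2):
    """Generate a pair of independent N(0,1) values from two
    independent U(0,1) values via the Box-Muller transform."""
    radius = math.sqrt(-2.0 * math.log(u1))  # common radial term
    angle = 2.0 * math.pi * u2               # common angular term
    return radius * math.cos(angle), radius * math.sin(angle)

z1, z2 = box_muller(0.73, 0.28)
print(round(z1, 6), round(z2, 6))  # -0.148661 0.779308
```

Scaling by x = μ + σz then converts each value to an observation from any other normal distribution, as in the next step of the solution.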
The corresponding values from the normal distribution with mean 100 and standard deviation 10 are:

x1 = 100 + 10 z1 = 98.51

and: x2 = 100 + 10 z2 = 107.79

Question 1.5

We first need the parameters c and d:

d = -log(1 - q) = -log 0.99 = 0.010050

and: c = md = 0.502517

We now use the formula λk = c - dk to generate the successive exponential parameters:

s0 = -log(1 - u0)/λ0 = -log 0.577/0.502517 = 1.0943

Since this value is greater than one, the first event occurs after time one, and so there are no observed events in a unit time period. The simulated value from the distribution is zero.

Question 1.6

First we need the values of the parameters c and d:

d = log(1 + β) = log 1.08 = 0.076961

and: c = rd = 7.696104

We use the formula λk = c + dk to generate the successive exponential parameters. So the times to the events are:

s0 = -log(1 - 0.423)/7.696104 = 0.071453
s1 = -log(1 - 0.796)/7.773065 = 0.204506
s2 = -log(1 - 0.522)/7.850026 = 0.094031
s3 = -log(1 - 0.637)/7.926987 = 0.127836
s4 = -log(1 - 0.992)/8.003948 = 0.603242

We see that t4 = s0 + s1 + s2 + s3 + s4 = 1.1011 is the first value of t which is greater than one. So the fifth event occurs after time 1, and there are 4 events in the unit time period. The simulated value is 4.

Question 1.7

Since u0 = 0.423 and 0 ≤ 0.423 < 0.6, we use the first of the two Pareto distributions to simulate. Using our second random number:

0.796 = 1 - (200/(x + 200))^3, which gives x = 139.75

Question 1.8

We start by calculating the age at which each life was first observed, and last observed. Treating a half month as being equal to 1/24th of a year, we obtain the following table of ages and exposures (the exposure unit is also one twenty-fourth of a year):

Life   Age at first observation   Age at last observation   e59   e60   e61   e62
1      60 17/24                   61 2/24                   -     7     2     -
2      60 17/24                   61 16/24                  -     7     16    -
3      60 23/24                   61 17/24                  -     1     17    -
4      61 1/24                    62 11/24                  -     -     23    11
5      60 1/24                    60 12/24                  -     11    -     -

The total exposure at age 61 is 58 twenty-fourths of a year. We have one death at age 61 last birthday (Life 1 dies at age 61 2/24).
So our estimate is:

ĥ61 = 1/(58/24) = 0.41379

Question 1.9

Using the actuarial exposure method, we need to allow for extra exposure for the deaths. We are now looking at age 60, and Life 5 dies aged 60 12/24. So there is extra exposure of 12/24ths of a year (from age 60 12/24 to age 61), and the total exposure at age 60 goes up by 12, from 26 to 38 twenty-fourths of a year. So we now have:

q̂60 = 1/(38/24) = 0.63158

To find the estimate for the hazard rate, we note that q = 1 - e^(-h), ie that h = -log(1 - q). So we have:

ĥ60 = -log(1 - 0.63158) = 0.99853

Question 1.10

We now want to use insuring ages last birthday. We have the following new table of dates:

Life   Date of birth   Date of purchase   New date of birth   Insuring age at entry   Date of exit   Insuring age at exit
1      Apr 15 1950     Jan 1 2011         Jan 1 1951          60                      May 15 2011    60 9/24
2      Jul 15 1950     Apr 1 2011         Apr 1 1951          60                      Mar 15 2012    60 23/24
3      Oct 15 1950     Oct 1 2011         Oct 1 1951          60                      Jun 30 2012    60 18/24
4      Jan 15 1950     Feb 1 2011         Feb 1 1950          61                      Jun 30 2012    62 10/24
5      Feb 15 1951     Mar 1 2011         Mar 1 1951          60                      Aug 15 2011    60 11/24

We now check the ages at death. Using insuring ages, we find that Life 1 now dies at age 60 9/24, and Life 5 dies at age 60 11/24. So we now have two deaths at age 60 last birthday.

The contributions to the exposures at each age are as follows (in units of one twenty-fourth of a year):

Life   Age at first observation   Age at last observation   e60   e61   e62
1      60                         60 9/24                   9     -     -
2      60                         60 23/24                  23    -     -
3      60                         60 18/24                  18    -     -
4      61                         62 10/24                  -     24    10
5      60                         60 11/24                  11    -     -

So the exposure at age 60 is now 61 twenty-fourths of a year, and there are two deaths. So using the exact exposure method, the estimate of the hazard rate at age 60 is:

ĥ60 = 2/(61/24) = 0.78689

And so:

q̂60 = 1 - e^(-0.78689) = 0.54474

Question 1.11

Using the actuarial exposure method, we increase the exposure for lives 1 and 5 to a whole year.
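The actuarial exposure arithmetic for Question 1.11 can be checked with a few lines of code. This is a sketch, not part of the study note: the exposures are hard-coded in twenty-fourths of a year, with the two deaths (Lives 1 and 5) extended to a full year of age as just described.

```python
# Contributions to exposure at insuring age 60, in twenty-fourths of a year.
# Lives 1 and 5 are deaths, so the actuarial exposure method extends their
# exposure to the end of the year of age (24/24 each); Lives 2 and 3 keep
# their observed exposures of 23/24 and 18/24. Life 4 contributes nothing
# at age 60.
exposures = {1: 24, 2: 23, 3: 18, 5: 24}
deaths = 2

e60 = sum(exposures.values())   # total exposure, in 1/24ths of a year
q60_hat = deaths / (e60 / 24)   # actuarial estimate of q60
print(e60, round(q60_hat, 5))   # 89 0.53933
```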
So we now have:

e60 = 24 + 23 + 18 + 24 = 89

and the estimate for q60 is now:

q̂60 = 2/(89/24) = 0.53933

Question 1.12

The figures would be different for the two lives who remain until the end of the investigation. Life 3 enters the investigation on Oct 1 2011. At this point there is less than a full year until the end of the investigation. So Life 3 cannot contribute even one year to the exposure, and so would not contribute at all. Life 4 enters on Feb 1 2011, so can contribute for a full year from Feb 1 2011 to Feb 1 2012. So the contributions to the exposure of these two lives will be zero for Life 3, and one full year of exposure in e61 only for Life 4.

Question 1.13

We obtain the following figures:

Age   Pj   nj   dj   wj   Pj+1
60    0    4    1    0    3
61    3    1    1    2    1
62    1    0    0    1    0

We can now calculate the exposure at age 60 using the actuarial method:

e60 = P60 + (n60 - w60)/2 = 0 + (4 - 0)/2 = 2

So the exposure is 2 years, and our estimate is q̂60 = 1/2 = 0.5.

Question 1.14

Using the formula given in the text, we have:

Var(q̂60) = (1 - q̂60)² d/e² = 0.5² × 1/2² = 0.0625

Appendix – Syllabus changes

In 2013 the Society of Actuaries added a small number of syllabus items to the examination syllabus for Exam C. The new syllabus items are listed here, together with details of the material which covers them.

A8 Identify and describe two extreme value distributions.

A very brief introduction to the study of extreme value distributions is given in Section 1.2 of this study note.

G Estimation of decrement probabilities from large samples
1 Estimate decrement probabilities using both parametric and non-parametric approaches for both individual and interval data
2 Approximate the variance of the estimators.

Some methods for dealing with large samples are covered in Chapter 7 of the BPP textbook. However, Section 1.3 of this study note gives an alternative approach to these ideas.
J2 Simulate from discrete mixtures, decrement tables, the (a, b, 0) class, and the normal and lognormal distributions using methods designed for those distributions

The basic simulation ideas are covered in Chapter 12 of the BPP textbook. A small number of additional methods, which have been added to the syllabus, are covered in Section 1.4 of this study note.

The SoA has now added 10 additional questions to the end of the Exam C Sample Questions document (these are currently Questions 290-299). You can find this by searching on the web for "Be an actuary Exam C Syllabus" and clicking on the link to the syllabus – the questions and solutions links are at the end of the syllabus document. You should test your understanding of the material in this note by completing these additional questions. They all relate to the material in this study note.