Reserving with the Extended Link-Ratio Family of Models

Glen Barnett and Ben Zehnwirth

Abstract

Brosius (1992) and Mack (1993) noted a connection between regression through the origin and link ratio methods. Mack gave standard errors for forecasts of the chain ladder model. Murphy (1994) developed standard errors and forecasts for link ratio models within that regression framework, and, arguing that the model was inadequate, extended the model to include an intercept term. In this paper a variety of tools are discussed which indicate when the assumptions inherent in link ratio techniques are satisfied. We describe and examine the Extended Link-Ratio Family of models (ELRF) in detail. We explore the circumstances under which the extended family of models gives an appropriate framework for claims reserving, and see when this leads to a consideration of other modeling components, such as payment period trends. Illustrative examples demonstrate the importance of examining whether link-ratio type models are suitable before applying them.

1 Introduction

The models underlying many of the standard actuarial techniques have been shown to be weighted regression models. Brosius noted a connection between them in 1992 and suggested an interpretation for an intercept term. Mack pointed out in 1993 that the chain ladder is a weighted regression through the origin. Murphy (1994) derived standard errors of forecasts and forecast totals directly from the regression model. He also showed that an intercept term could be important, and extended the formulas to the model including an intercept. Recent work by the authors extended the model further. In this paper we examine in detail the assumptions these models carry with them, exploring a variety of tools which indicate when the assumptions inherent in link ratio techniques and their extensions are satisfied.
We also explore diagnostics for assessing whether models are useful in terms of predictive ability, indicating the circumstances under which link ratio models and the extended link-ratio family of models (ELRF) give an appropriate framework for claims reserving, and see when this leads to a consideration of other modeling components, such as payment period trends. We examine a variety of illustrative examples as the assumptions are examined and as predictive power is considered. These examples demonstrate the importance of examining whether link-ratio type models are suitable before applying them.

2 The Chain Ladder as a regression model

Mack (1993) gives standard errors for forecasts in the chain ladder model. Murphy (1994) considers a more general framework that covers a variety of link ratio (development factor) techniques, derives the standard errors using a regression framework, and mentions the need for an intercept. Brosius (1992) gives an interpretation of the intercept term.

For a given development year (j, say), let yi be the cumulative loss up to that year for accident year i, and let xi be the previous cumulative. Further, let pi be the incremental (pi = yi - xi).

Figure 1. Cumulative triangle; the current development year is j. The problem is to predict y.

The individual link ratios are then the values yi/xi; if we graph y against x, the ratio yi/xi is simply the slope of the line through the origin in the (x, y) plane that passes through (xi, yi).

Figure 2. Graph of y against x for a real data set. The ratio yi/xi is given by the slope of the line through the origin for the marked point.

How might we predict y? Ratio techniques attempt to predict y from x, and further assume that, on average, y is directly proportional to x:

E(y|x) = b x    (1.1)

Hence the problem becomes one of how to estimate b, and there are a variety of ways to do so, all of which are combinations (weighted averages) of the individual ratios yi/xi.
The chain ladder development factor, the average development factor and the ordinary least squares line through the origin are all weighted averages of the individual ratios; indeed they are weighted regression lines with different weighting schemes. The regression model can be written as follows:

yi = b xi + εi,    (1.2)

or

pi = (b - 1) xi + εi,    (1.3)

where Var(εi) = σ² xi^δ. Note that b is the development factor. Here δ determines the weights: δ = 1 gives the chain ladder, δ = 2 gives the average development factor, and δ = 0 is the ordinary regression line through the origin. This can be seen directly from the weighted least squares estimate for b:

b̂ = Σ wi xi yi / Σ wi xi², with wi = 1/xi^δ, so that b̂ = Σ yi xi^(1-δ) / Σ xi^(2-δ).    (1.4)

Note that (as is clear from equation 1.3) the same techniques yield a regression model for the incremental losses also (Venter, 1996). In fact, it is the ability to predict the incremental that is crucial: the rest is already observed (y = p + x), and predicting x in terms of x is trivial. If we can predict the next incremental from the previous cumulative, then there may be some benefit in a ratio-based model. If the previous cumulative does not have the power to predict the next incremental, there is no point in using x to predict y.

Knowing that these techniques are really regressions brings many advantages. It enables the calculation of uncertainties of ratios and forecasts. It allows assessment of the appropriateness of the model under consideration. It provides checks on the usefulness of the model (a model may be appropriate in the sense that its assumptions are satisfied, yet still provide little ability to predict).

Murphy suggests it is often necessary to introduce an intercept term:

yi = a + b xi + εi,    (1.5)

or

pi = a + (b - 1) xi + εi,    (1.6)

with the same variances on the error term as before. It turns out for many data arrays that once the intercept term is introduced, the slope term is no longer required.
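The three weighting schemes behind equation 1.4 can be checked numerically. The sketch below uses invented cumulatives (none of the values come from the paper) to compute the weighted least squares slope through the origin for δ = 0, 1 and 2, and to confirm that δ = 1 reproduces the chain ladder factor Σyi/Σxi while δ = 2 gives the simple average of the individual ratios.

```python
import numpy as np

# Invented cumulatives for one development period (illustrative only).
x = np.array([1001.0, 1113.0, 1265.0, 1490.0, 1725.0])  # previous cumulative
y = np.array([4010.0, 4600.0, 5600.0, 6390.0, 7000.0])  # current cumulative

def b_hat(x, y, delta):
    """Weighted least squares slope through the origin with weights 1/x**delta:
    b_hat = sum(y * x**(1 - delta)) / sum(x**(2 - delta))  (equation 1.4)."""
    return np.sum(y * x**(1.0 - delta)) / np.sum(x**(2.0 - delta))

b_ols = b_hat(x, y, 0.0)   # delta = 0: ordinary regression through the origin
b_cl  = b_hat(x, y, 1.0)   # delta = 1: chain ladder factor, sum(y) / sum(x)
b_avg = b_hat(x, y, 2.0)   # delta = 2: simple average of the individual ratios
```

All three estimates are weighted averages of the individual ratios yi/xi; only the weights differ.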
The Extended Link Ratio Family

Even after adjusting for exposures, loss data often has a trend down the accident years (due to inflation, for example). This will induce an apparent relationship between yi (or pi) and xi that is simply due to them both being related to the accident year trend. Consequently, we extend the framework of Murphy still further by introducing a term to model an accident year trend. Let w be the accident year for the ith loss value. Note that w is a simple function of i, depending only on how the years are labeled.

yi = a0 + a1 w + b xi + εi,    (1.7)

or

pi = a0 + a1 w + (b - 1) xi + εi.    (1.8)

All of these models are weighted regression models for the cumulative, yi (or the incremental, pi), and with these, too, we obtain the advantages listed above. The models differ only in part of one assumption, so we discuss the assumptions for all of the models at the same time.

3 Assessing the appropriateness of the model

We may assess whether or not the model may be used as a description of the data it is being fitted to by examining the appropriateness of the assumptions of the model for the data array in question. If the assumptions are not satisfied, the model is an inappropriate description of the data, and inferences based on the model will at best be suboptimal and at worst useless. Examination of the appropriateness of assumptions is an essential step of any modeling process, so it is important to know what the assumptions are in each case.

3.1 What are the assumptions of the weighted regression models?
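A weighted regression of the form (1.8) can be sketched as follows, fitting an intercept, an accident year trend and a ratio term by ordinary least squares on weight-scaled data. All numbers here are invented for illustration; they are not estimates from any data set in the paper.

```python
import numpy as np

# Invented data for one development period (not from the paper).
x = np.array([1001.0, 1113.0, 1265.0, 1490.0, 1725.0, 1889.0])  # previous cumulative
p = np.array([3009.0, 3487.0, 4335.0, 4900.0, 5275.0, 5606.0])  # next incremental
w = np.arange(len(x), dtype=float)                              # accident year index

delta = 1.0
wt = 1.0 / x**delta                       # regression weights 1/x_i**delta

# Design matrix for p_i = a0 + a1*w_i + (b - 1)*x_i + e_i  (model 1.8).
X = np.column_stack([np.ones_like(x), w, x])
sw = np.sqrt(wt)                          # scale rows by sqrt(weight), then use OLS
coef, *_ = np.linalg.lstsq(X * sw[:, None], p * sw, rcond=None)
a0, a1, b_minus_1 = coef

# At the solution the weighted normal equations should be satisfied (gradient = 0).
grad = X.T @ (wt * (p - X @ coef))
```

In practice one would also examine the standard errors of the three coefficients and drop those that are not significant, as discussed in section 4.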
– The relationship is linear through the origin* (specification of the systematic component of the model);
– variances about the line are proportional to xi^δ (specification of the relative sizes of the random component);
– values are normally distributed about the line (specification of the form of the random component);
– values (conditionally on the predictors) are independent;
– there are no other important predictors of y, or if there are, they may be treated as occurring "randomly" with respect to the predictors in the model.

* For the models that extend the link-ratio techniques to include an intercept, we drop "through the origin" from the first assumption above.

3.2 Examining the assumptions

We may assess the suitability of the assumptions by examining a variety of diagnostic tools. Let us examine the assumptions one at a time.

3.2.1 Specification of the systematic component of the model

The relationship is linear (through the origin). It is easy to examine whether the relationship between y and the predictor(s) is linear, and also whether an intercept may be required, by looking at a plot of y against the predictors. The assessment of linearity, and of whether or not an intercept is required, may often be made more easily if the current incrementals (pi) are graphed against the previous cumulative (xi). This is one of the arrays used by Mack.

Figure 3. Graphs of cumulative(1) versus cumulative(0) and incremental(1) versus cumulative(0), i.e. against the previous cumulative, for delay 1 (for the incremental plot, Corr = -0.117, P-value = 0.764). The red line indicates the weighted (δ = 1) least squares fit through the origin (i.e. the chain ladder). The green line represents the fit with an intercept. The second plot gives a clearer indication of a nonzero intercept. There is no indication that the relationship is not linear.
An intercept may also be tested by adding it to the model and checking whether it is significantly different from zero. Linearity may be tested by specifying a form for the potential nonlinearity (such as a quadratic relationship) and testing the deviation from linearity. Examination of the plot of residuals from the linear model against the predictors can guide the selection of a nonlinear test function. Alternatively, quite general regression relationships can be estimated and compared to linear relationships.

If the relationship is not linear, or an intercept is required but not in the model, then the predictions will be biased, and the calculated uncertainties will be too small in the early and late development years. Since intercepts are positive in general, the bias is generally negative for the smaller values (underprediction) and positive for the larger values (overprediction). The consequences may be serious.

Figure 4. Weighted standardized residuals versus fitted values for a model with a missing intercept: the "fit" lies below the data for low fitted values and above the data for high fitted values.

3.2.2 Specification of the relative sizes of the random component

The techniques collectively assume that the variances about the line are proportional to xi^δ. If this is not true for some value of δ close to 0, 1 or 2, the techniques will result in suboptimal, possibly poor estimates. Additionally, the estimates of standard errors of parameters and forecasts will be incorrect.

One diagnostic for assessing whether the variance specification is correct is to plot the log of the squared residuals against the log of the x's, for a model that accounts for the mean well, fitted with δ = 0. This follows from the variance specification:

Var(ei) = E(ei²) = σ² xi^δ.

So ei² = σ² xi^δ ηi for some multiplicative error term ηi, or

ln(ei²) = ln(σ²) + δ ln(xi) + ξi, where ξi = ln(ηi).

The slope of the relationship should be a reasonable estimate of δ.
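A minimal sketch of this diagnostic on simulated data (the data-generating model and all constants below are invented for illustration): fit the mean with δ = 0, then regress the log squared residuals on log x; the slope estimates δ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data with Var(e_i) = sigma^2 * x_i**delta, here with delta = 1.
n, b, sigma2, delta_true = 200, 3.0, 4.0, 1.0
x = rng.uniform(500.0, 5000.0, n)
y = b * x + rng.normal(0.0, np.sqrt(sigma2 * x**delta_true))

# Fit the mean with delta = 0 (ordinary least squares through the origin).
b0 = np.sum(x * y) / np.sum(x * x)
e = y - b0 * x

# Regress log squared residuals on log x; the slope estimates delta.
A = np.column_stack([np.ones(n), np.log(x)])
coef, *_ = np.linalg.lstsq(A, np.log(e**2), rcond=None)
delta_hat = coef[1]
```

The slope is noisy (the error ξi = ln(ηi) has a large variance), but since δ only sets the relative variances, a rough estimate is usually adequate, as noted in the text.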
We can also use a regression to get an approximation to the slope (note that we won't have normality: the distribution is mildly left skewed if the original data is normally distributed). It is even possible to combine information across several regressions, if allowance is made for the different spreads (either by adjustment, using standardized residuals, or within the regression). The non-normality is not crucial in this case: since we are just estimating the relative variances, a second-order effect, and the skewness is only mild, the loss of efficiency in the estimation of the variance should be of minor consequence. Note that if a non-zero value of δ was used to fit the model, the slope of the plot will indicate a change in δ.

Figure 5. Plot of log(squared standardized residual) versus log(previous cumulative) for development years 1 to 6, with δ = 1. The slope indicates the required change in delta, a change of approximately -3.5, to δ = -2.5, indicating that the assumption of δ = 1 is not reasonable.

Alternatively, some measure of fit for various values of δ may be calculated and compared, such as the Akaike Information Criterion (AIC) or the Bayes Information Criterion (BIC). Since the number of parameters will be the same for the models being compared, differences in AIC, differences in BIC and differences in twice the log-likelihood will all be the same.

However, we must also see that the variance does not obviously change with anything else. This is related to the "no other predictors" assumption below (3.2.5). Examination of residual plots against other potential predictors, especially the development year, is important.

Figure 6. Weighted standardized residuals against development year, showing a change in spread.

3.2.3 Specification of the form of the random component

Values are normally distributed about the line.
Under this assumption the best estimates of the regression parameters are the least squares estimates; the normal is the distribution for which the regression estimates are optimal. Normality may be assessed by a variety of approaches. For example, it may be assessed graphically by the use of a normal scores plot of standardized residuals, or less formally by examining a histogram of standardized residuals.

The normal scores plot consists of a plot of the residuals against their normal scores. The normal scores are (approximations to) the expected values of the standardized residuals, taking into account their ordering. For example, the expected value of the median residual is zero, the expected value of the standardized residual at the upper quartile will be approximately the upper quartile of a standard normal, and so on. Actually, it is a little more complicated than that, but it is close enough for most quantiles. If the data are from a normal distribution, the plot should appear to be a straight line. If there is a substantial deviation from normality, the plot will appear curved in some fashion.

Figure 6. Plot of weighted standardized residuals versus normal scores (N = 37, P < 0.01, R² = 0.8965). There is a clear indication that the distribution is right-skewed.

Additionally, it is possible to formally test the hypothesis of normality. There is a test based on the correlation in the normal scores plot (Shapiro and Francia, 1972), which is related to the Shapiro-Wilk test, and there are any number of other goodness-of-fit tests with good power properties, such as the Anderson-Darling test (modified for the fact that the mean and variance must be estimated). Note that the ordinary chi-squared goodness-of-fit test is not recommended for this purpose, due to its poor power properties.

3.2.4 Independence

Values (conditionally on the predictors) are independent. This is the same thing as independence of the error terms.
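The correlation statistic behind the normal scores plot can be sketched as follows. This uses Blom's plotting-position approximation to the expected normal order statistics; the residuals are simulated stand-ins, not the paper's data.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
resid = np.sort(rng.normal(size=40))   # simulated standardized residuals, sorted

# Blom's approximation to the expected standard normal order statistics
# ("normal scores"): Phi^{-1}((i - 0.375) / (n + 0.25)) for i = 1..n.
n = len(resid)
pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
scores = np.array([NormalDist().inv_cdf(p) for p in pp])

# Squared correlation between the sorted residuals and their normal scores;
# values near 1 are consistent with normality (the Shapiro-Francia idea).
r2 = np.corrcoef(resid, scores)[0, 1] ** 2
```

A formal test compares r² (or the related Shapiro-Wilk statistic) against its null distribution for the given sample size; here the point is only how the statistic is constructed.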
Checking for a violation of this requires us to consider what specific manner of dependence may occur. This is made more complex by the existence of two independent 'time' directions. For example, there may be correlation across consecutive accident years within each development year, but there could also be correlation between consecutive values in the same accident year (across the regressions for each development year), or across payment years. Indeed there may be any number of other possible relationships. The three possible relationships mentioned can be examined using plots of standardized residuals against the values with which it is anticipated they may be correlated, or even tested by calculating the correlations between the values.

Figure 7. Plots of standardized residuals versus residuals from the previous accident, development or payment year (accident year standardized residuals versus previous accident year; development year 1 residuals versus development year 0; last payment year standardized residuals versus second last payment year). There is some indication of a relationship in each case.

3.2.5 Model 'completeness'

There are no other important predictors of y, or if there are, they may be treated as occurring "randomly" with respect to the predictors in the model. This does not mean every potential predictor must be part of every model. However, obvious possible predictors should be examined to see if they can account for the data, particularly if they may not be treated as random with respect to the predictors in the model. Certainly potential predictors that are related to predictors already in the model cannot be treated as random.

For example, obvious potential predictors are the development years, accident years and payment years themselves. These will likely be related to the predictors in the model, and so plots of standardized residuals against them should be examined. There are several likely relationships.
These include increasing trends down the accident years, which in turn 'induce' an apparent ratio effect (y - x is related to x). This apparent relationship disappears if the accident year trend is removed.

Figure 8. An increasing trend in the accident year direction (incremental(1) versus accident year, upper left: Corr = 0.985, P-value = 0.000) induces a correlation between the incremental and the previous cumulative (incremental(1) versus cumulative(0), upper right: Corr = 0.985, P-value = 0.000). When the accident year trend is removed, the induced correlation is also removed (trend adjusted incremental(1) versus cumulative(0): Corr = 0.110, P-value = 0.762; slope adjusted incremental(1) versus accident year: Corr = 0.117, P-value = 0.747).

Another likely relationship is changing payment year trends. We can examine these by plotting the standardized residuals against payment years. Note that changing payment year trends occur in the incrementals, since it is on the incrementals that superimposed inflation acts. It is not generally possible to see the changing payment year trends in the original cumulative data. However, since the residuals from any of the regression models mentioned are also models for the incrementals, the residuals from the regressions can be useful for detecting changes in payment year trends.

Figure 9. Changing payment year trends in the weighted standardized residuals from a chain ladder fit (residuals versus payment year, 1978-1987).
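As a sketch of how residuals can reveal payment year effects: in a triangle, the payment (calendar) year of a cell is accident year plus development year, so averaging the standardized residuals along those diagonals gives a quick summary to plot against payment year. The residual values below are invented for illustration.

```python
import numpy as np

# Invented standardized residuals; rows are accident years, columns are
# development years, NaN marks the unobserved future part of the triangle.
res = np.array([[ 0.5,    0.8,    1.1],
                [-0.2,    0.4, np.nan],
                [-1.0, np.nan, np.nan]])

n_ay, n_dy = res.shape
ay, dy = np.indices((n_ay, n_dy))
py = ay + dy                      # payment (calendar) year index of each cell

# Mean residual along each observed diagonal; a drifting sequence of these
# means suggests a changing payment year trend.
py_means = [np.nanmean(res[py == t]) for t in range(n_ay)]
```

Plotting `py_means` (or the full residuals) against payment year is the diagnostic shown in Figure 9.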
4 Selecting useful models

It is important that the data to which a model is fitted satisfy the assumptions of the model, but it is not sufficient. The model must also have predictive power to be useful. That is, the model must have power to predict the next incremental. If a ratio model cannot predict the next incremental better than the average observed incremental in the same development year, then there is little point in using the ratio model in preference to it.

4.1 When is a ratio model useful?

One way to assess predictive power is to plot the incremental against the previous cumulative (as in figure 3) or, in the presence of an accident year trend, the incremental adjusted for accident year trend against the previous cumulative (as in figure 8). If there is a relationship in the plot, then the previous cumulative has predictive power.

We can also assess the predictive power of the ratio model by fitting the regression model with an intercept, and an accident year trend if necessary. If, after eliminating the least significant parameters, the ratio parameter b is significantly different from 1 (i.e. b - 1 is significantly different from 0), then the previous cumulative has predictive power.

When the model assumptions are satisfied, there may remain a choice between several models. If a parameter has no predictive power, it should be eliminated from the model: a more parsimonious model will have shorter prediction intervals (less uncertainty). If a potential parameter does have predictive power, it should be included in the model (less bias).

4.2 Simulation

It is a simple matter to simulate data from a given regression model, including one we have just estimated. If a model is appropriate and useful, then simulated data from that model should not be readily distinguishable from real data. If there were some distinguishing feature, it would imply some deficiency in the model that was previously missed.
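A minimal sketch of such a simulation for one development period, with invented parameter values standing in for the estimates (none of the numbers below come from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented 'fitted' values for one development period (not real estimates).
x = np.array([1001.0, 1113.0, 1265.0, 1490.0, 1725.0])  # observed previous cumulatives
b_hat, sigma2_hat, delta = 4.1, 2.5, 1.0                # slope, sigma^2, variance power

# Simulate cumulatives from y_i = b*x_i + e_i with Var(e_i) = sigma^2 * x_i**delta.
# Repeating this many times yields pseudo-data to compare against the real data.
y_sim = b_hat * x + rng.normal(0.0, np.sqrt(sigma2_hat * x**delta))
```

If diagnostic plots of many such simulated arrays look systematically different from the real data, the model is missing something.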
5 Summary

It is common for ratio parameters to be eliminated when there are intercepts and accident year trends in the model. Ratio parameters do not relate different accident years: they do not 'borrow strength' across accident years when predicting the current value in a given accident year.

With the diagnostics discussed in this paper, we can see when models from the extended link ratio family (ELRF) are suitable. Often none of them are; for example, none of these models can deal with changing payment year trends. If the diagnostics indicate changing payment year trends, non-normal residuals, dependence between development years, etc., then a different kind of model must be considered.

References

Brosius, E. (1992). Loss Development using Credibility. Casualty Actuarial Society Part 7 Exam Study Kit.

Mack, T. (1993). Distribution-Free Calculation of the Standard Error of Chain Ladder Reserve Estimates. ASTIN Bulletin, Vol. 23, No. 2, 213-225.

Mack, T. (1994). Which stochastic model is underlying the chain ladder method? Insurance: Mathematics and Economics, Vol. 15, No. 2/3, 133-138.

Murphy, D. M. (1994). Unbiased Loss Development Factors. Proceedings of the Casualty Actuarial Society, Vol. LXXXI, 154-222.

Shapiro, S. S. and Francia, R. S. (1972). An approximate analysis of variance test for normality. Journal of the American Statistical Association, 67, 215-216.

Venter, G. (1996). Testing assumptions of age-to-age factors. Draft Report, Instrat/Sedgwick Re, New York.