Geographical Dependence of Insurance Risk LÁSZLÓ MÁRKUS1 AND MIKLÓS ARATÓ1 1 Dept. Probability Theory and Statistics, Eötvös Loránd University Pázmány P. st. 1/C., H-1117, Budapest, Hungary E-mail: markus@cs.elte.hu ABSTRACT: The spatial dependence structure for the number of claims occurring in motor third party liability insurance in Hungary is analysed. A Bayesian hierarchical model is built up for the non-homogeneous spatial Poisson process of claim numbers. The estimation of the model is carried out by MCMC technique, applying block updates to obtain better mixing in the high dimensions. KEY WORDS: Bayesian hierarchical models; Claim frequency, Disease-mapping; Geopricing; Motor insurance; Markov Chain Monte Carlo (MCMC); Non-homogeneous Poisson process; Postcodes; Premium rating; Risk premium; Spatial statistics 1. INTRODUCTION It is a quite common practice in various insurances (household insurance, automobile insurance) to let the risk premium per unit exposure vary with geographic area when all other risk factors are held constant. In the motor insurances sector, for instance, most companies have adopted a risk classification according to the geographical zone where the policyholder lives. However, companies apply very different principles leading to spatial dependence in their premium rating, but more often than not these principles reflect common sense approaches (e.g. categorising by population) or for a better word expertise, rather than exact risk estimations. In view of customer sensitivity to “unjustly” increased premiums it will be desirable to estimate the spatial variation in risk and to price accordingly. Nevertheless, judging by the few available publications and references, the problem attained relatively little attention in the past. One of the early works is due to C. G. Taylor, [Taylor(1989)] who used two-dimensional splines on a plane linked to the map of the region by a transformation, in order to assess the spatial variability in Australian Household Contents Insurance. In the paper of Boskov and Verrall (1994) a method for premium rating by postcode area is elaborated. A Bayesian spatial model is suggested there for the claims frequency, closely related to the ideas of Besag, York and Molliè (1991) who allow for both spatially structured and unstructured heterogeneity in one model. One set of error components are independent and identically distributed reflecting an unstructured pattern of heterogeneity, while the other set of error components exhibit spatial correlation among neighboring areas but uncorrelation among the other areas. This construction is usually referred to as “structured heterogeneity”. The model is substantially refined in Brouhns, Denuit, Masuy & Verrall (2002), where an unknown part of the variation of the spatial parameters may be caused by geographically varying unobserved factors. The so-called disease mapping methods have been invoked to give more reliable estimates of the geographical variation of claim rates. The general goal is to identify extra sample variation due to unobserved heterogeneity by filtering the Poisson sample variation. The model identification mixes a frequentist approach to estimate the effect of all risk factors other than location with a Bayesian approach to evaluate the risk of each district. In order to avoid the preprocessing of the data to remove the effect of all non-spatial risk factors Dimakos and Frigessi di Rattalma (2002) propose a fully Bayesian approach to non-life premium rating, based on hierarchical models with latent variables for both claim frequency and claim size. Inference is based on the joint posterior distribution and is performed by Markov Chain Monte Carlo. Rather than plug in point estimates of all unknown parameters, they take into account all sources of uncertainty simultaneously when the model is used to predict claims and estimate risk premiums. Several models are fitted to both a simulated dataset and a small portfolio regarding theft from cars. They show that interaction among latent variables can improve predictions significantly. Denuit and Lang (2004) incorporate spatial effects into a semiparametric additive model as a component of the nonlinear additive terms with an appropriate prior reflecting neighborhood relationships. The functional parameters are approximated by penalised-splines. This leads to a unified treatment of continuous covariates and spatially correlated random effects. The spatial estimation of risk is very much analogous to the problem of disease mapping in spatial epidemiology. The risk-categorisation of the localities the insurance companies practice is equivalent to the colouring of a map, and statistical techniques for that are well elaborated, see e.g. Green and Richardson (2002) and references therein. 2. DATA AND MODEL DESCRIPTION We have the data of a certain insurer in Hungary at our disposal, of which we consider the number of claims in motor third party liability insurances for the period from 01-01-2000 to 31-12-2003. The data are slightly biased and rescaled according to the request of the company but this does not affect the methods used and the character of the results. The number of claims yi together with exposure times ti are given in 3111 postcode areas of the country hereinafter referred to as localities. Localities are uniquely identified by their “centre”, the (x,y) coordinates of which are also given. It should be noted that other informations are also available, many of which are further risk factors such as car type, age of the policyholder, population of the locality, etc., but in the present paper we do not intend to investigate these jointly. Our goal is to estimate the number of claims per unit exposure for all the 3111 localities. Direct estimation is obviously not viable, except perhaps for the capital Budapest. It is especially so in localities with no policy, or in localities with just a few contracts, where the occurrence of one claim increases dramatically the direct risk estimation. So, the estimation must rely heavily on the spatial structure of the data meaning the information available in the neighbouring locations. We suppose, that the counts of claims follow a conditionally independent Poisson model at the lowest level of the hierarchy. It assumes a non-homogeneous spatial Poisson process with intensities dependent on exposure times ti and locations. It is convenient to single out a joint scale parameter , and write the spatial parameters (the relative risks) in exponential form as e i by keeping the mean of the exponents at zero. Later on this scale parameter can incorporate the effect of the additional risk factors, but here we do not elaborate further on this. So, on the first hierarchy level the number of claims at the i-th location represent conditionally independent Poisson variables Yi with parameter e i ti , given the scale, exposure time, and the spatial parameter. 47.5 47.0 46.0 46.5 prmaps[, 2] 48.0 48.5 The probability map as described in Cressie (1993), (6.2.1.) may serve as the tool to illustrate the spatial inhomogeneity of this Poisson process. The essence of this method is that under the assumption of all-equal (to 0) spatial parameters and having ti known, can be estimated, the expected number of claims computed and compared to the empirical value. When expected<observed, compute the probability of sample exceedence P(Yiyi), whereas when expected>observed compute the probability of sample domination P(Yi yi), and draw a map of these probabilities. We give the probability map on Fig. 1. The size of dots representing localities indicate probabilities on the 10-m scale m=1,2,3,4,5, but the two largest dot sizes correspond to probabilities between 10-5 and 10-9, and less than 10-9. 16 17 18 19 20 21 22 23 prmaps[, 1] Figure 1. The probability map of expected claims counts, when intensities depend on exposure only At the next level of model hierarchy we introduce a Gaussian random field model for the log relative risks, that is the exponents i of the location parameters of the Poisson rates. That means the vector Θ i having a zero mean multidimensional normal distribution with spatially structured covariance matrix. The dimension equals the number of localities that is 3111. The covariance between claims counts at various locations depends on the neighbouring relationship. Being aware of the multitude of possibilities and the various shortcomings of each, we choose localities to be neighbouring when the distance of their centers lies within 30 km distance. Heuristically speaking it is assumed that the covariance of claim occurrences can be decomposed into exponentially decreasing amounts as the degree of neighbouring relationship decreases. That means only a part of the correlation is due to the immediate neighbourship relation a smaller part originates from neighbours of the neighbour, and then an even smaller part from their neighbours, and so on. These parts decrease exponentially, and summed up give the actual correlation between the ith and jth location. The base of this exponential decrease in correlation is a crucial parameter of the third level of hierarchy to be estimated. In terms of the neighbourship matrix A, the i-th and j-th elements of which are ones or zeros according to the ith and j-th localities being neighbours or not, the higher degree neighbourship is represented by the appropriate power of that matrix. Letting the degree of neighbourship tend to infinity in expressing the correlation one arrives at the power series of the neighbourship matrix times the exponent . When added up, this results in the correlation structure for the log relative risks, the spatial parameter vector Θ as (I-A)-1, with I denoting the unit matrix as usual. In order to keep the correlation matrix positive definite, we assume that < max , max being reciprocal to the maximal eigenvalue of A. So, we have Θ ~N (0,(I-A)-1). In line with the Bayesian setup we put a prior on and in order to keep it below max suppose /max distributed according to a beta(p,q) law. 3. THE LIKELIHOOD AND THE ESTIMATION We are now in the position to write down the conditional likelihood f Θ, y as N N 1/ 2 T f , y exp y exp ei t i yi I A i 1 i 1 1 T T exp A p 1 1 2 max q 1 From here we have the form for the log-posterior as N 1 N N T log f , y y ei t i yi log log 1 i ( A) 2 i 1 i 1 i 1 , 1 T T A ( p 1) log (q 1) log 1 2 max where i(A) denotes the ith eigenvalue of A. For a given and it is straightforward to N yi ˆ maximise the log-posterior in . The maximum is attained at Ni 1 providing for the ei ti i 1 conditional maximum likelihood estimator of , given the rest of the parameters. For the estimation of and Markov Chain Monte Carlo simulation is used, with obvious need for Metropolis-Hastings steps. The crude estimations of , pushed a little away from 0 when necessary, and max/2 are used as initial values. After a new proposal is obtained for the conditional max-likelihood estimator ̂ of is plugged in to compute the log-posterior update. 1 N The term log 1 i ( A) can be computed quickly, once the eigenvalues of the A matrix 2 i 1 are stored. It is of vital importance here that we do not have to update eigenvalues, because it is computationally very demanding and it would make the algorithm hopeless to reach convergence. The challenge in the implementation of the MCMC algorithm is the very high dimension. We have to reach convergence and mixing in a 3112 dimensional state space. All the papers mentioned in the introduction addressed a significantly lower dimensional problems and their publicly available programs did not seem to work or did not produce result within acceptable computer runtime (up to several hours or a night on a 2.6 GHz PC). Statistical inference for Gaussian random fields routinely done by MCMC simulation methods, but mostly by single-site updating, that is updating each parameter one by one in turn. However, it is well known that such a single-site updating can have very poor convergence and mixing properties. Several authors have therefore suggested to block update the parameters (for details on block updates see e.g. Knorr-Held and Rue (2002 and references therein). Though block updating has relatively rarely been considered so far in spatial models, we apply it here for . In the acceptance-rejection step a computationally very demanding operation is to calculate the quadratic form in the logposterior, as A is a 31113111 matrix. As it consists of zeros and ones only, its role is nothing but to say for which pair of indices should i j appear in the sum expressing the quadratic form as A T i , j: Ai , j 1 i j . Once the pairs of indices are stored the computation can be substantially fastened – we experienced 4 times higher speed in computing. A sparse matrix decomposition may provide an alternative way for fast computation. We simulated 200000 times from the chains, and obtained acceptance rates 27% for and 34.5% for the vectors. The trace plots of and 1 together with the ACF for the conditional max-likelihood for and the log-posterior are given for every 20th iteration (except for the ACF, that is computed for all iterations) in Fig.2. ACF of rho 0.0 5 e-04 0.4 0.8 8 e-04 Simulated chain for rho 2000 4000 6000 8000 10000 0 10 20 30 40 all values after burn in Simulated chain for theta[1] Estimated values of lambda 50 30000 every 20-th value 0 2000 4000 6000 every 20-th value 8000 10000 28000 26000 -3 2.5 3.5 -2 -1 0 4.5 1 5.5 0 0 2000 4000 6000 every 20-th value 8000 10000 0 2000 4000 6000 8000 10000 Log-posterior Figure 2. Trace plots of parameters, ACF of and the log-posterior. The relatively poor mixing can be noted for 1 and the situation is not better for the other coordinates of , but that is an effect of the high dimension. The 20th simulated values are practically uncorrelated. Although it seems as if 50000 iterations would be enough for burn in we regaded the first 100000 as such, and computed mean estimates for the parameters from the second half of the simulated data. This gives the estimation of the intensities in the Poisson model, by the help of which the expected number of claims can be computed. As Fig. 3. shows, the difference between the actually observed and the expected from the model is surprisingly low, it does not exceed 3 in absolute values, and it is a bit positively skewed. This difference, the model residuals, give the way to produce again a probability map, displayed in Fig.3, in order to check for spatial irregularities in the residuals. 48.5 3000 0 500 1000 1500 2000 2500 3000 -2 -2 -1 -1 0 46.0 0 1 1 2 46.5 2 3 Observed claims at localities 47.5 2500 3000 47.0 1000 1500 2000 Expected claims at localities prmaps[, 2] 500 48.0 0 1000 3000 0 1000 0 0 500 1000 1500 2000 2500 3000 The spatial parameters theta 0 500 1000 1500 2000 2500 3000 Residuals=observed-expected 16 17 18 19 20 21 22 23 prmaps[, 1] Figure 3. Observed and expected (by the means of simulated parameters) claims numbers, the mean log relative risk, the residuals and the resulted probability map The obtained map is almost completely homogeneous, at least it is lacking any spatial structure, ensuring that the spatial parameters we found describe the spatial dependence of the risk. This relative risk is therefore a suitable indicator for the basis of geopricing in insurance. ACKNOWLEDGEMENT: The authors are indebted to Charles C. Taylor from Leeds, U.K. for his suggestions concerning the model and the valuable discussions on the topic. This research was partially supported by the Hungarian National Research Fund OTKA, grant No.: T047086. REFERENCES Besag, J., York, J., Molliè, A., 1991: Bayesian image restoration with two applications in spatial statistics (with discussion). Ann. Inst. Stat. Math. 43, 1–59. Boskov, M., Verrall, R.J., 1994: Premium rating by geographic area using spatial models. ASTIN Bulletin. 24, 131–143. Brouhns, N., Denuit, M., Masuy, B., Verrall, R., 2002: Ratemaking by geographical area: A case study using the Boskov and Verrall model, Discussion paper 0202, Publications of the Institut de statistique, Louvain-la-Neuve, 1-26. Denuit, M,, Stefan Lang S. 2004: Non-life rate-making with Bayesian GAMs, Insurance: Mathematics and Economics Vol. 35, 627–647. Dimakos, X.K., Frigessi di Rattalma, A., 2002: Bayesian premium rating with latent structure. Scandinavian Actuarial Journal 162–184. Green, P. J., Richardson, S., 2002: Hidden Markov Models and Disease Mapping, Journal of the American Statistical Association Vol. 97, No. 460, 1055-1070. Knorr-Held, L., Rue, H., 2002: On block updating in Markov random field models for disease mapping, Scandinavian Journal of Statistics, Vol. 29. 597-614.