It is common in insurance of domestic property lines, such as

advertisement
Geographical Dependence of Insurance Risk
LÁSZLÓ MÁRKUS1 AND MIKLÓS ARATÓ1
1
Dept. Probability Theory and Statistics, Eötvös Loránd University
Pázmány P. st. 1/C., H-1117, Budapest, Hungary
E-mail: markus@cs.elte.hu
ABSTRACT: The spatial dependence structure for the number of claims occurring in motor
third party liability insurance in Hungary is analysed. A Bayesian hierarchical model is built
up for the non-homogeneous spatial Poisson process of claim numbers. The estimation of the
model is carried out by MCMC technique, applying block updates to obtain better mixing in
the high dimensions.
KEY WORDS: Bayesian hierarchical models; Claim frequency, Disease-mapping; Geopricing;
Motor insurance; Markov Chain Monte Carlo (MCMC); Non-homogeneous Poisson process;
Postcodes; Premium rating; Risk premium; Spatial statistics
1. INTRODUCTION
It is a quite common practice in various insurances (household insurance, automobile
insurance) to let the risk premium per unit exposure vary with geographic area when all other
risk factors are held constant. In the motor insurances sector, for instance, most companies have
adopted a risk classification according to the geographical zone where the policyholder lives.
However, companies apply very different principles leading to spatial dependence in their
premium rating, but more often than not these principles reflect common sense approaches (e.g.
categorising by population) or for a better word expertise, rather than exact risk estimations. In
view of customer sensitivity to “unjustly” increased premiums it will be desirable to estimate the
spatial variation in risk and to price accordingly. Nevertheless, judging by the few available
publications and references, the problem attained relatively little attention in the past.
One of the early works is due to C. G. Taylor, [Taylor(1989)] who used two-dimensional
splines on a plane linked to the map of the region by a transformation, in order to assess the
spatial variability in Australian Household Contents Insurance. In the paper of Boskov and
Verrall (1994) a method for premium rating by postcode area is elaborated. A Bayesian spatial
model is suggested there for the claims frequency, closely related to the ideas of Besag, York and
Molliè (1991) who allow for both spatially structured and unstructured heterogeneity in one
model. One set of error components are independent and identically distributed reflecting an
unstructured pattern of heterogeneity, while the other set of error components exhibit spatial
correlation among neighboring areas but uncorrelation among the other areas. This construction
is usually referred to as “structured heterogeneity”. The model is substantially refined in Brouhns,
Denuit, Masuy & Verrall (2002), where an unknown part of the variation of the spatial
parameters may be caused by geographically varying unobserved factors. The so-called disease
mapping methods have been invoked to give more reliable estimates of the geographical
variation of claim rates. The general goal is to identify extra sample variation due to unobserved
heterogeneity by filtering the Poisson sample variation. The model identification mixes a
frequentist approach to estimate the effect of all risk factors other than location with a Bayesian
approach to evaluate the risk of each district.
In order to avoid the preprocessing of the data to remove the effect of all non-spatial risk
factors Dimakos and Frigessi di Rattalma (2002) propose a fully Bayesian approach to non-life
premium rating, based on hierarchical models with latent variables for both claim frequency and
claim size. Inference is based on the joint posterior distribution and is performed by Markov
Chain Monte Carlo. Rather than plug in point estimates of all unknown parameters, they take
into account all sources of uncertainty simultaneously when the model is used to predict claims
and estimate risk premiums. Several models are fitted to both a simulated dataset and a small
portfolio regarding theft from cars. They show that interaction among latent variables can
improve predictions significantly.
Denuit and Lang (2004) incorporate spatial effects into a semiparametric additive model
as a component of the nonlinear additive terms with an appropriate prior reflecting neighborhood
relationships. The functional parameters are approximated by penalised-splines. This leads to a
unified treatment of continuous covariates and spatially correlated random effects.
The spatial estimation of risk is very much analogous to the problem of disease mapping
in spatial epidemiology. The risk-categorisation of the localities the insurance companies
practice is equivalent to the colouring of a map, and statistical techniques for that are well
elaborated, see e.g. Green and Richardson (2002) and references therein.
2. DATA AND MODEL DESCRIPTION
We have the data of a certain insurer in Hungary at our disposal, of which we consider
the number of claims in motor third party liability insurances for the period from 01-01-2000 to
31-12-2003. The data are slightly biased and rescaled according to the request of the company
but this does not affect the methods used and the character of the results. The number of claims yi
together with exposure times ti are given in 3111 postcode areas of the country hereinafter
referred to as localities. Localities are uniquely identified by their “centre”, the (x,y) coordinates
of which are also given. It should be noted that other informations are also available, many of
which are further risk factors such as car type, age of the policyholder, population of the locality,
etc., but in the present paper we do not intend to investigate these jointly.
Our goal is to estimate the number of claims per unit exposure for all the 3111 localities.
Direct estimation is obviously not viable, except perhaps for the capital Budapest. It is especially
so in localities with no policy, or in localities with just a few contracts, where the occurrence of
one claim increases dramatically the direct risk estimation. So, the estimation must rely heavily
on the spatial structure of the data meaning the information available in the neighbouring
locations.
We suppose, that the counts of claims follow a conditionally independent Poisson model
at the lowest level of the hierarchy. It assumes a non-homogeneous spatial Poisson process with
intensities dependent on exposure times ti and locations. It is convenient to single out a joint

scale parameter , and write the spatial parameters (the relative risks) in exponential form as e i
by keeping the mean of the exponents at zero. Later on this scale parameter can incorporate the
effect of the additional risk factors, but here we do not elaborate further on this. So, on the first
hierarchy level the number of claims at the i-th location represent conditionally independent

Poisson variables Yi with parameter   e i  ti , given the scale, exposure time, and the spatial
parameter.
47.5
47.0
46.0
46.5
prmaps[, 2]
48.0
48.5
The probability map as described in Cressie (1993), (6.2.1.) may serve as the tool to
illustrate the spatial inhomogeneity of this Poisson process. The essence of this method is that
under the assumption of all-equal (to 0) spatial parameters and having ti known,  can be
estimated, the expected number of claims computed and compared to the empirical value. When
expected<observed, compute the probability of sample exceedence P(Yiyi), whereas when
expected>observed compute the probability of sample domination P(Yi yi), and draw a map of
these probabilities. We give the probability map on Fig. 1. The size of dots representing localities
indicate probabilities on the 10-m scale m=1,2,3,4,5, but the two largest dot sizes correspond to
probabilities between 10-5 and 10-9, and less than 10-9.
16
17
18
19
20
21
22
23
prmaps[, 1]
Figure 1. The probability map of expected claims counts, when intensities depend on exposure only
At the next level of model hierarchy we introduce a Gaussian random field model for the
log relative risks, that is the exponents i of the location parameters of the Poisson rates. That
means the vector Θ  i  having a zero mean multidimensional normal distribution with
spatially structured covariance matrix. The dimension equals the number of localities that is 3111.
The covariance between claims counts at various locations depends on the neighbouring
relationship. Being aware of the multitude of possibilities and the various shortcomings of each,
we choose localities to be neighbouring when the distance of their centers lies within 30 km
distance. Heuristically speaking it is assumed that the covariance of claim occurrences can be
decomposed into exponentially decreasing amounts as the degree of neighbouring relationship
decreases. That means only a part of the correlation is due to the immediate neighbourship
relation a smaller part originates from neighbours of the neighbour, and then an even smaller part
from their neighbours, and so on. These parts decrease exponentially, and summed up give the
actual correlation between the ith and jth location. The base  of this exponential decrease in
correlation is a crucial parameter of the third level of hierarchy to be estimated. In terms of the
neighbourship matrix A, the i-th and j-th elements of which are ones or zeros according to the ith and j-th localities being neighbours or not, the higher degree neighbourship is represented by
the appropriate power of that matrix. Letting the degree of neighbourship tend to infinity in
expressing the correlation one arrives at the power series of the neighbourship matrix times the
exponent . When added up, this results in the correlation structure for the log relative risks, the
spatial parameter vector Θ as (I-A)-1, with I denoting the unit matrix as usual. In order to keep
the correlation matrix positive definite, we assume that  < max , max being reciprocal to the
maximal eigenvalue of A. So, we have Θ ~N (0,(I-A)-1).
In line with the Bayesian setup we put a prior on  and in order to keep it below max
suppose /max distributed according to a beta(p,q) law.
3. THE LIKELIHOOD AND THE ESTIMATION
We are now in the position to write down the conditional likelihood f Θ,  y as






 N
 N
1/ 2
T
f ,  y  exp y   exp    ei t i      yi  I  A 
 i 1
 i 1




 1 T

T
 exp        A    p 1 1 
 2

  max



q 1
From here we have the form for the log-posterior as


N
1 N
 N 
T
log f ,  y  y    ei t i     yi  log    log 1  i ( A)  
2 i 1
i 1
 i 1 
,

1 T
 
T

      A  ( p  1) log   (q  1) log 1 
2
  max 
where i(A) denotes the ith eigenvalue of A. For a given  and  it is straightforward to
 N 
  yi 
ˆ
maximise the log-posterior in . The maximum is attained at    Ni 1  providing for the
 ei ti


i 1
conditional maximum likelihood estimator of , given the rest of the parameters. For the
estimation of  and  Markov Chain Monte Carlo simulation is used, with obvious need for
Metropolis-Hastings steps. The crude estimations of , pushed a little away from 0 when
necessary, and max/2 are used as initial values. After a new proposal is obtained for  the
conditional max-likelihood estimator ̂ of  is plugged in to compute the log-posterior update.
1 N
The term  log 1  i ( A)  can be computed quickly, once the eigenvalues of the A matrix
2 i 1
are stored. It is of vital importance here that we do not have to update eigenvalues, because it is
computationally very demanding and it would make the algorithm hopeless to reach convergence.
The challenge in the implementation of the MCMC algorithm is the very high dimension.
We have to reach convergence and mixing in a 3112 dimensional state space. All the papers
mentioned in the introduction addressed a significantly lower dimensional problems and their
publicly available programs did not seem to work or did not produce result within acceptable
computer runtime (up to several hours or a night on a 2.6 GHz PC). Statistical inference for
Gaussian random fields routinely done by MCMC simulation methods, but mostly by single-site
updating, that is updating each parameter one by one in turn. However, it is well known that such
a single-site updating can have very poor convergence and mixing properties. Several authors
have therefore suggested to block update the parameters (for details on block updates see e.g.
Knorr-Held and Rue (2002 and references therein). Though block updating has relatively rarely
been considered so far in spatial models, we apply it here for . In the acceptance-rejection step
a computationally very demanding operation is to calculate the quadratic form in the logposterior, as A is a 31113111 matrix. As it consists of zeros and ones only, its role is nothing
but to say for which pair of indices should i  j appear in the sum expressing the quadratic form
as  A 
T
  
i , j: Ai , j 1
i
j
. Once the pairs of indices are stored the computation can be substantially
fastened – we experienced 4 times higher speed in computing. A sparse matrix decomposition
may provide an alternative way for fast computation.
We simulated 200000 times from the chains, and obtained acceptance rates 27% for 
and 34.5% for the  vectors. The trace plots of  and 1 together with the ACF for  the
conditional max-likelihood for  and the log-posterior are given for every 20th iteration (except
for the ACF, that is computed for all iterations) in Fig.2.
ACF of rho
0.0
5 e-04
0.4
0.8
8 e-04
Simulated chain for rho
2000
4000
6000
8000
10000
0
10
20
30
40
all values after burn in
Simulated chain for theta[1]
Estimated values of lambda
50
30000
every 20-th value
0
2000
4000
6000
every 20-th value
8000
10000
28000
26000
-3
2.5
3.5
-2 -1
0
4.5
1
5.5
0
0
2000
4000
6000
every 20-th value
8000
10000
0
2000
4000
6000
8000
10000
Log-posterior
Figure 2. Trace plots of parameters, ACF of  and the log-posterior.
The relatively poor mixing can be noted for 1 and the situation is not better for the other
coordinates of , but that is an effect of the high dimension. The 20th simulated  values are
practically uncorrelated. Although it seems as if 50000 iterations would be enough for burn in we
regaded the first 100000 as such, and computed mean estimates for the parameters from the
second half of the simulated data. This gives the estimation of the intensities in the Poisson
model, by the help of which the expected number of claims can be computed. As Fig. 3. shows,
the difference between the actually observed and the expected from the model is surprisingly low,
it does not exceed 3 in absolute values, and it is a bit positively skewed. This difference, the
model residuals, give the way to produce again a probability map, displayed in Fig.3, in order to
check for spatial irregularities in the residuals.
48.5
3000
0
500
1000 1500 2000
2500 3000
-2
-2
-1
-1
0
46.0
0
1
1
2
46.5
2
3
Observed claims at localities
47.5
2500 3000
47.0
1000 1500 2000
Expected claims at localities
prmaps[, 2]
500
48.0
0 1000
3000
0 1000
0
0
500
1000 1500 2000
2500 3000
The spatial parameters theta
0
500
1000 1500 2000
2500 3000
Residuals=observed-expected
16
17
18
19
20
21
22
23
prmaps[, 1]
Figure 3. Observed and expected (by the means of simulated parameters) claims numbers, the mean log
relative risk, the residuals and the resulted probability map
The obtained map is almost completely homogeneous, at least it is lacking any spatial
structure, ensuring that the spatial parameters we found describe the spatial dependence of the
risk. This relative risk is therefore a suitable indicator for the basis of geopricing in insurance.
ACKNOWLEDGEMENT:
The authors are indebted to Charles C. Taylor from Leeds, U.K. for his suggestions
concerning the model and the valuable discussions on the topic. This research was partially
supported by the Hungarian National Research Fund OTKA, grant No.: T047086.
REFERENCES
Besag, J., York, J., Molliè, A., 1991: Bayesian image restoration with two applications in spatial
statistics (with discussion). Ann. Inst. Stat. Math. 43, 1–59.
Boskov, M., Verrall, R.J., 1994: Premium rating by geographic area using spatial models.
ASTIN Bulletin. 24, 131–143.
Brouhns, N., Denuit, M., Masuy, B., Verrall, R., 2002: Ratemaking by geographical area: A case
study using the Boskov and Verrall model, Discussion paper 0202, Publications of the Institut de
statistique, Louvain-la-Neuve, 1-26.
Denuit, M,, Stefan Lang S. 2004: Non-life rate-making with Bayesian GAMs, Insurance:
Mathematics and Economics Vol. 35, 627–647.
Dimakos, X.K., Frigessi di Rattalma, A., 2002: Bayesian premium rating with latent structure.
Scandinavian Actuarial Journal 162–184.
Green, P. J., Richardson, S., 2002: Hidden Markov Models and Disease Mapping, Journal of the
American Statistical Association Vol. 97, No. 460, 1055-1070.
Knorr-Held, L., Rue, H., 2002: On block updating in Markov random field models for disease
mapping, Scandinavian Journal of Statistics, Vol. 29. 597-614.
Download