PL-2 An Introduction to GLM Theory
2006 CAS Seminar on Ratemaking
Claudine Modlin, FCAS
Watson Wyatt Worldwide
WWW.WATSONWYATT.COM

Generalized linear model benefits
Consider all factors simultaneously
Allow for nature of random process
Provide diagnostics
Robust and transparent

Example of GLM output
[Chart: log of multiplier by factor level (1 to 7), showing oneway relativities, the GLM estimate, approximately 2 SE either side of the estimate, and exposure (policy years) on the secondary axis.]

Agenda
Formularization of GLMs
– linear predictor, link function, offset
– error term, scale parameter, prior weights
– typical model forms
Model testing
– use only variables which are predictive
– make sure model is reasonable
Aliasing

Linear models
Linear model: Y_i = \mu_i + error_i, where \mu_i is based on a linear combination of measured factors
Which factors, and how they are best combined, is to be derived, eg
\mu_i = \alpha + \beta.age_i + \gamma.age_i^2 + \delta.height_i.age_i
\mu_i = \alpha + \beta.age_i + \gamma.(sex_i = female)
\mu_i = (\alpha + \beta.age_i) * exp(\delta.height_i.age_i)

Linear models - formularization
E[Y_i] = \mu_i = \sum_j X_{ij} \beta_j
Var[Y_i] = \sigma^2
Y_i ~ N(\mu_i, \sigma^2)

What is \sum_j X_{ij} \beta_j ?
X defines the explanatory variables to be included in the model
– could be continuous variables - "variates"
– could be categorical variables - "factors"
\beta contains the parameter estimates which relate to the factors / variates defined by the structure of X - "the answer"

What is X.\beta ?
Write \sum_j X_{ij} \beta_j as X.\beta
Consider 3 rating factors
- age of driver ("age")
- sex of driver ("sex")
- age of vehicle ("car")
Represent \beta by \alpha, \beta, \gamma, \delta, ...

What is X.\beta ?
Suppose we wanted a model built from variates, of the form:
\mu = \alpha + \beta.age + \gamma.age^2 + \delta.car^{27}.age^{52.5}
X would then need one row per observation, each row multiplying [\alpha, \beta, \gamma, \delta]':
row i:  [ 1   age_i   age_i^2   car_i^{27}.age_i^{52.5} ]

What is X.\beta ?
Suppose instead we wanted a model built from categorical factors, of the form:
\mu = \alpha + \beta_1 if age < 30
          + \beta_2 if age 30 - 40
          + \beta_3 if age > 40
          + \gamma_1 if sex male
          + \gamma_2 if sex female

What is X.\beta ?
X then contains an intercept column and a 0/1 indicator column for each level of each factor, eg

        Age      Sex       \alpha  \beta_1  \beta_2  \beta_3  \gamma_1  \gamma_2
Obs 1   30-40    Male      1       0        1        0        1         0
Obs 2   <30      Male      1       1        0        0        1         0
Obs 3   <30      Female    1       1        0        0        0         1
Obs 4   >40      Male      1       0        0        1        1         0
Obs 5   30-40    Female    1       0        1        0        0         1
...

"Base levels"
The intercept together with a full set of indicators for every level is over-parameterized: one level of each factor must be chosen as a base level.

X.\beta having adjusted for base levels
The columns corresponding to the chosen base levels (one per factor) are removed; \alpha then represents the expected value at the base levels, and the remaining parameters measure differences from those base levels.
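The design matrix for categorical factors is just 0/1 indicator ("dummy") coding with the base-level columns dropped. As a minimal sketch (not from the slides), the same matrix can be built in Python with pandas; the five observations and the particular base levels chosen by pandas here are illustrative assumptions.

```python
import pandas as pd

# Five illustrative observations matching the pattern in the table above (assumed data)
df = pd.DataFrame({
    "age": ["30-40", "<30", "<30", ">40", "30-40"],
    "sex": ["M", "M", "F", "M", "F"],
})

# Dummy-code each factor; drop_first removes one level per factor as the base level,
# whose effect is absorbed into the intercept (alpha)
X = pd.get_dummies(df, columns=["age", "sex"], drop_first=True, dtype=int)
X.insert(0, "intercept", 1)

print(X)
# The remaining columns carry the beta/gamma parameters, measured relative to the base levels.
```

Choosing different base levels changes which parameters are set to zero but not the fitted values.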
Linear models - formularization
E[Y_i] = \mu_i = \sum_j X_{ij} \beta_j
Var[Y_i] = \sigma^2
Y_i ~ N(\mu_i, \sigma^2)

Generalized linear models
\mu_i = f( \alpha + \beta.age_i + \gamma.age_i^2 + \delta.height_i.age_i )
\mu_i = f( \alpha + \beta.age_i + \gamma.(sex_i = female) )

Generalized linear models
\mu_i = g^{-1}( \alpha + \beta.age_i + \gamma.age_i^2 + \delta.height_i.age_i )
\mu_i = g^{-1}( \alpha + \beta.age_i + \gamma.(sex_i = female) )

Generalized linear models
Linear models:                                   Generalized linear models:
E[Y_i] = \mu_i = \sum_j X_{ij} \beta_j           E[Y_i] = \mu_i = g^{-1}( \sum_j X_{ij} \beta_j + \xi_i )
Var[Y_i] = \sigma^2                              Var[Y_i] = \phi V(\mu_i) / \omega_i
Y from the Normal distribution                   Y from a distribution from the exponential family

Generalized linear models
Each observation i is from a distribution with mean \mu_i
[Chart: two illustrative densities (eg women and men) with different means.]

Generalized linear models
E[Y_i] = \mu_i = g^{-1}( \sum_j X_{ij} \beta_j + \xi_i )
Var[Y_i] = \phi.V(\mu_i) / \omega_i

Generalized linear models (vector form)
E[Y] = \mu = g^{-1}( X.\beta + \xi )
Var[Y] = \phi V(\mu) / \omega

Generalized linear models
E[Y] = \mu = g^{-1}( X.\beta )
– Y: the observed thing (data)
– g^{-1}: some function (user defined)
– X: some matrix based on data (user defined), as per linear models
– \beta: parameters to be estimated (the answer!)

What is g^{-1}(X.\beta) ?
Y = g^{-1}(X.\beta) + error
Assuming a model with three categorical factors, each observation can be expressed as:
Y_ijk = g^{-1}( \alpha + \beta_i + \gamma_j + \delta_k ) + error
where age is in group i, sex is in group j, car is in group k

What is g^{-1}(X.\beta) ?
g(x) = x:      Y_ijk = \alpha + \beta_i + \gamma_j + \delta_k + error
g(x) = ln(x):  Y_ijk = e^{( \alpha + \beta_i + \gamma_j + \delta_k )} + error
                     = A.B_i.C_j.D_k + error,  where B_i = e^{\beta_i} etc
Multiplicative form common for frequency and amounts

Multiplicative model
Base: $207.10, multiplied by:

Age      Factor    Group   Factor    Sex      Factor    Area   Factor
17       2.52      1       0.54      Male     1.00      A      0.95
18       2.05      2       0.65      Female   1.25      B      1.00
19       1.97      3       0.73                         C      1.09
20       1.85      4       0.85                         D      1.15
21-23    1.75      5       0.92                         E      1.18
24-26    1.54      6       0.96                         F      1.27
27-30    1.42      7       1.00                         G      1.36
31-35    1.20      8       1.08                         H      1.44
36-40    1.00      9       1.19
41-45    0.93      10      1.26
46-50    0.84      11      1.36
50-60    0.76      12      1.43
60+      0.78      13      1.56

Eg for age 27-30, vehicle group 5, male, area D:
E(losses) = $207.10 x 1.42 x 0.92 x 1.00 x 1.15 = $311.14
(reproduced in the code sketch below)

Generalized linear models
E[Y] = \mu = g^{-1}( X.\beta + \xi )
\xi is the "offset"
Eg Y = claim numbers
Smith: Male, 30, Ford, 1 year, 2 claims
Jones: Female, 40, VW, ½ year, 1 claim

What is \xi ?
g(x) = ln(x),  \xi_ijk = ln(exposure_ijk)
E[Y_ijk] = e^{( \alpha + \beta_i + \gamma_j + \delta_k + \xi_ijk )}
         = A.B_i.C_j.D_k . e^{ln(exposure_ijk)}
         = A.B_i.C_j.D_k . exposure_ijk
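To make the log-link mechanics concrete, here is a minimal Python sketch (not part of the original slides) that reproduces the $311.14 example above as the exponential of a sum: adding fitted parameters on the linear-predictor scale and then applying g^{-1}(x) = e^x is the same as multiplying the relativities.

```python
import math

# Relativities from the multiplicative model table (log-link GLM output)
base = 207.10
factors = {
    "age 27-30": 1.42,
    "group 5": 0.92,
    "sex male": 1.00,
    "area D": 1.15,
}

# On the linear-predictor scale the model is alpha + beta_i + gamma_j + ...,
# so the fitted value is exp of the sum of the logged relativities.
linear_predictor = math.log(base) + sum(math.log(f) for f in factors.values())
expected_losses = math.exp(linear_predictor)

print(f"E(losses) = {expected_losses:.2f}")  # matches 207.10 x 1.42 x 0.92 x 1.00 x 1.15
```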
Restricted models
E[Y] = \mu = g^{-1}( X.\beta + \xi )
The offset can also be used to constrain parts of the model (eg increased limits, territory, amount of insurance, discounts)
Other factors adjusted to compensate

Generalized linear models
E[Y] = \mu = g^{-1}( X.\beta + \xi )
Var[Y] = \phi V(\mu) / \omega

Generalized linear models
Var[Y] = \phi V(\mu) / \omega
Normal:   \phi = \sigma^2,  V(x) = 1      Var[Y] = \sigma^2
Poisson:  \phi = 1,         V(x) = x      Var[Y] = \mu
Gamma:    \phi = k,         V(x) = x^2    Var[Y] = k\mu^2

The scale parameter
\phi is the scale parameter in Var[Y] = \phi V(\mu) / \omega: it equals \sigma^2 for the Normal, is fixed at 1 for the Poisson, and is estimated (as k) for the Gamma.

Example of effect of changing assumed error - 1
[Charts: a simple three-point data set fitted in turn with Normal, Poisson and Gamma error assumptions; the fitted values shift as the assumed error changes.]

Example of effect of changing assumed error - 2
Example portfolio with five rating factors, each with five levels A, B, C, D, E
Typical correlations between those rating factors
Assumed true effect of factors
Claims randomly generated (with Gamma)
Random experience analyzed by three models
[Charts: log of multiplier for levels A to E, comparing the assumed true effect with the one-way estimates, a GLM with Normal error, and a GLM with Gamma error (shown with approximately 2 SE either side). The Gamma GLM tracks the true effect most closely.]

Prior weights
Var[Y] = \phi V(\mu) / \omega
\omega allows for the volume of exposure and other measures of credibility behind each observation
Eg Y = claim frequency
Smith: Male, 30, Ford, 1 year, 2 claims
Jones: Female, 40, VW, ½ year, 1 claim
Smith's observed frequency is based on twice the exposure, so it carries twice the prior weight.

Typical model forms

Y                         g(x)           Error      \phi       V(x)      \omega       \xi
Claim frequency           ln(x)          Poisson    1          x         exposure     0
Claim number              ln(x)          Poisson    1          x         1            ln(exposure)
Average claim amount      ln(x)          Gamma      estimate   x^2       # claims     0
Probability (eg lapses)   ln(x/(1-x))    Binomial   1          x(1-x)    1            0

Tweedie distributions
Incurred losses have a point mass at zero and then a continuous distribution
Poisson and gamma not suited to this
Tweedie distribution has a point mass at zero and parameters which can alter the shape to be like Poisson and gamma above zero
For 1 < p < 2 the density can be written (up to a normalizing factor in y which is an infinite series with no closed form):
f_Y(y) \propto \exp\{ (\omega/\phi) [ y \mu^{1-p}/(1-p) - \mu^{2-p}/(2-p) ] \}   for y > 0
P(Y = 0) = \exp\{ -(\omega/\phi) \, \mu^{2-p}/(2-p) \}

Generalized linear models
Var[Y] = \phi V(\mu) / \omega
Normal:   \phi = \sigma^2,  V(x) = 1      Var[Y] = \sigma^2
Poisson:  \phi = 1,         V(x) = x      Var[Y] = \mu
Gamma:    \phi = k,         V(x) = x^2    Var[Y] = k\mu^2
Tweedie:  \phi = k,         V(x) = x^p    Var[Y] = k\mu^p

Tweedie distributions
Tweedie:  \phi = k,  V(x) = x^p,  Var[Y] = k\mu^p
Defines a valid distribution for p < 0, 1 < p < 2, p > 2
Can be considered as a Poisson/gamma process for 1 < p < 2
Typical values of p for insurance incurred claims are around, or just under, 1.5

Tweedie distributions
Helpful when it is important to fit to pure premium
Often similar results to the traditional frequency/severity approach, but differences may occur if the numbers and amounts models have effects which are both large and insignificant
No information about whether frequencies or amounts are driving the result
(The typical model forms and the Tweedie form are illustrated in the code sketch below.)
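The "Typical model forms" table and the Tweedie form translate directly into GLM software settings. Below is a minimal sketch, not from the slides, using Python and statsmodels (an assumed dependency); the rating factors, exposures and claim amounts are simulated purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical portfolio: two rating factors, simulated exposure and claims
n = 5000
df = pd.DataFrame({
    "age": rng.choice(["<30", "30-40", ">40"], size=n),
    "sex": rng.choice(["M", "F"], size=n),
    "exposure": rng.uniform(0.25, 1.0, size=n),
})
X = sm.add_constant(pd.get_dummies(df[["age", "sex"]], drop_first=True, dtype=float))

true_freq = 0.1 * np.exp(0.3 * (df["age"] == "<30") - 0.1 * (df["sex"] == "F"))
df["claim_count"] = rng.poisson(true_freq * df["exposure"])

# Claim numbers: Poisson error, log link, offset = ln(exposure)
# (claim frequency could instead be modelled directly with var_weights = exposure)
numbers = sm.GLM(df["claim_count"], X,
                 family=sm.families.Poisson(),
                 offset=np.log(df["exposure"])).fit()

# Average claim amount: Gamma error, log link, prior weight = number of claims
has_claims = df["claim_count"] > 0
df.loc[has_claims, "avg_amount"] = rng.gamma(2.0, 500.0, size=has_claims.sum())
amounts = sm.GLM(df.loc[has_claims, "avg_amount"], X.loc[has_claims],
                 family=sm.families.Gamma(link=sm.families.links.Log()),
                 var_weights=df.loc[has_claims, "claim_count"]).fit()

# Pure premium: Tweedie error with variance power p around 1.5, weight = exposure
df["pure_premium"] = rng.gamma(2.0, 500.0, size=n) * df["claim_count"] / df["exposure"]
tweedie = sm.GLM(df["pure_premium"], X,
                 family=sm.families.Tweedie(var_power=1.5,
                                            link=sm.families.links.Log()),
                 var_weights=df["exposure"]).fit()

# Multiplicative relativities are exp of the fitted parameters
print(np.exp(numbers.params))
```

The fitted relativities from the numbers and amounts models can then be combined, or the Tweedie model used on its own when a single pure premium model is preferred.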
Generalized linear models
The user-defined components - offset \xi, link function g(x), error structure V(\mu), scale parameter \phi and linear predictor form X.\beta (eg \alpha + \beta_i + \gamma_j + \delta_k + ...) - are combined with the data Y and prior weights \omega in a numerical maximum likelihood estimation, which returns the parameter estimates \alpha, \beta, \gamma, \delta, ... together with diagnostics.

Maximum likelihood estimation
Seek parameters which give the highest likelihood function given the data
[Chart: likelihood surface over two parameters, maximized numerically.]

Newton-Raphson
In one dimension:  x_{n+1} = x_n - f'(x_n) / f''(x_n)
In n dimensions:   \beta_{n+1} = \beta_n - H^{-1}.s
where \beta is the vector of the parameter estimates (with p elements), s is the vector of the first derivatives of the log-likelihood and H is the (p*p) matrix containing the second derivatives of the log-likelihood

Agenda
Formularization of GLMs
– linear predictor, link function, offset
– error term, scale parameter, prior weights
– typical model forms
Model testing
– use only variables which are predictive
– make sure model is reasonable
Aliasing

Model testing
Use only those variables which are predictive
– standard errors of parameter estimates
– F tests / chi-squared tests on deviances
– consistency with time
Make sure the model is reasonable
– histogram of deviance residuals
– residual vs fitted value

Standard errors
Roughly speaking, for a parameter p, the standard error is driven by the curvature of the log-likelihood \ell at its maximum:
Var(p) \approx -1 / (\partial^2 \ell / \partial p^2),   SE(p) = Var(p)^{1/2}
The flatter the likelihood is in the direction of p, the larger the standard error.
[Chart: likelihood surface over two parameters; the curvature at the maximum determines the standard errors.]

GLM output (significant factor)
[Chart: log of multiplier by vehicle symbol (1 to 13), showing oneway relativities, parameter estimates and approximate 95% confidence intervals, with exposure (years) on the secondary axis. The estimates rise steadily from 0% to about 154% and the confidence intervals are narrow. P value = 0.0%.]

GLM output (insignificant factor)
[Chart: the same layout for a factor whose estimates lie within a few percent of zero, well inside wide confidence intervals. P value = 52.5%.]

Awkward cases
[Chart: a 15-level factor whose estimates range from about -22% to +200%, with some levels clearly significant and others not.]

Deviances
Single figure measure of goodness of fit
Try model with & without a factor
Statistical tests show the theoretical significance given the extra parameters

Deviances
Model A: Age, Sex, Vehicle, Zone, Mileage fitted to claims.   Deviance = 9585, df = 109954
Model B: Age, Sex, Vehicle, Mileage fitted to claims.         Deviance = 9604, df = 109965
Is the extra factor worth its 11 extra parameters for a reduction in deviance of 19?

Deviances
If \phi is known, use the scaled deviance
S = \sum_u \frac{2 \omega_u}{\phi} \int_{\mu_u}^{Y_u} \frac{Y_u - \zeta}{V(\zeta)} \, d\zeta
S_1 - S_2 ~ \chi^2_{d_1 - d_2}
If \phi is unknown, use the unscaled deviance D = \phi.S
\frac{(D_1 - D_2) / (d_1 - d_2)}{D_3 / d_3} ~ F_{d_1 - d_2, \, d_3}
(see the code sketch below)
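As a minimal sketch (not from the slides) of these deviance tests applied to the Model A / Model B comparison above: with \phi known the change in scaled deviance is referred to a chi-squared distribution, and the F form is shown for the case where \phi must be estimated. scipy is an assumed dependency, and the larger model is used here to estimate \phi (the D_3/d_3 term).

```python
from scipy import stats

# Deviances and degrees of freedom from the two nested models on the slide
dev_A, df_A = 9585.0, 109954   # larger model (includes the extra factor)
dev_B, df_B = 9604.0, 109965   # smaller model (factor removed)

# Chi-squared test on the change in scaled deviance (phi known, eg Poisson phi = 1)
delta_dev = dev_B - dev_A            # 19
delta_df = df_B - df_A               # 11 extra parameters
p_chi2 = stats.chi2.sf(delta_dev, delta_df)

# F test when phi is unknown: compare the deviance change per extra parameter
# with the deviance per degree of freedom of the larger model
f_stat = (delta_dev / delta_df) / (dev_A / df_A)
p_f = stats.f.sf(f_stat, delta_df, df_A)

print(f"chi2: {delta_dev:.0f} on {delta_df} df, p = {p_chi2:.3f}")
print(f"F:    {f_stat:.2f} on ({delta_df}, {df_A}) df, p = {p_f:.3f}")
```

The printed p-values indicate whether the extra parameters are justified at the chosen significance level.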
Consistency over time
[Chart: log of multiplier by vehicle symbol, with parameter estimates and approximate 95% confidence intervals shown separately for exposure years 2000, 2001 and 2002; the pattern is broadly consistent from year to year.]
[Chart: smoothed estimates by territory for exposure years 2000, 2001 and 2002, again showing a broadly consistent pattern over time.]

Model testing
Use only those variables which are predictive
– standard errors of parameter estimates
– F tests / chi-squared tests on deviances
– interaction with time
Make sure the model is reasonable
– histogram of deviance residuals
– residual vs fitted value

Residuals
[Chart: fitted values and data plotted together for a small example.]

Residuals
Several forms, eg
– standardized deviance:
  sign(Y_u - \mu_u) \left[ \frac{2 \omega_u}{\phi (1 - h_u)} \int_{\mu_u}^{Y_u} \frac{Y_u - \zeta}{V(\zeta)} \, d\zeta \right]^{1/2}
– standardized Pearson:
  \frac{Y_u - \mu_u}{[ \phi V(\mu_u)(1 - h_u) / \omega_u ]^{1/2}}
Standardized deviance residuals should be approximately Normal(0,1)
Numbers/frequency residuals are problematical
(see the code sketch below)

Residuals
[Chart: histogram of deviance residuals for an accidental damage amounts model - roughly symmetric and bell-shaped.]

Residuals
[Chart: residuals plotted against log of fitted value - no obvious pattern.]

Residuals
[Chart: gamma data fitted with a gamma error (own damage amounts model) - deviance residuals against fitted value show a fairly constant spread.]
[Chart: the same gamma data fitted with a Normal error - the residuals fan out as the fitted value increases, indicating a poor error assumption.]
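As a minimal sketch (not from the slides) of these two diagnostics in Python: statsmodels and matplotlib are assumed dependencies, the gamma severity data are simulated for illustration, and the residuals shown are the unstandardized deviance residuals exposed by the fitted model object.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Hypothetical severity data: gamma response with a log-linear mean in one covariate
n = 2000
x = rng.uniform(0, 1, size=n)
X = sm.add_constant(x)
mu = np.exp(6.0 + 0.8 * x)
y = rng.gamma(shape=2.0, scale=mu / 2.0)   # gamma with mean mu

model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# Histogram of deviance residuals - should look roughly symmetric and bell-shaped
# (the slide's standardized form also divides by sqrt(phi * (1 - h_u)))
plt.figure()
plt.hist(model.resid_deviance, bins=40)
plt.title("Histogram of deviance residuals")

# Deviance residuals against log of fitted value - should show no pattern or fanning
plt.figure()
plt.scatter(np.log(model.fittedvalues), model.resid_deviance, s=4)
plt.xlabel("Log of fitted value")
plt.ylabel("Deviance residual")
plt.show()
```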
Agenda
Formularization of GLMs
– linear predictor, link function, offset
– error term, scale parameter, prior weights
– typical model forms
Model testing
– use only variables which are predictive
– make sure model is reasonable
Aliasing

Aliasing and "near aliasing"
Intrinsic aliasing
– the removal of unwanted redundant parameters occurs by the design of the model
Extrinsic aliasing
– occurs "accidentally" as a result of the data

Intrinsic aliasing
X.\beta = \alpha + \beta_1 if age 20 - 29
              + \beta_2 if age 30 - 39
              + \beta_3 if age 40 +
              + \gamma_1 if sex male
              + \gamma_2 if sex female
One level of each factor must be set as a base level; the corresponding parameters are intrinsically aliased with the intercept \alpha and are removed by design.
[Chart: Example job, Run 16, Model 3 - Small interaction - Third party material damage, Numbers: log of multiplier by age of driver (17-21 through 70+), with oneway relativities, approximate 95% confidence intervals, unsmoothed and smoothed estimates.]

Extrinsic aliasing
If a perfect correlation exists, one factor can alias levels of another
Eg if doors is declared first:

Exposure              # Doors
Color                 2        3        4        5        Unknown
Red (selected base)   13,234   12,343   13,432   13,432   0
Green                 4,543    4,543    13,243   2,345    0
Blue                  6,544    5,443    15,654   4,565    0
Black                 4,643    1,235    14,565   4,545    0
Unknown               0        0        0        0        3,242

One doors level is similarly a selected base. Unknown color is only ever observed with unknown doors, so one of the two "unknown" levels is aliased out of the model (further aliasing beyond the selected base levels).
This is the only reason the order of declaration can matter (fitted values are unaffected)
[Chart: log of multiplier by no claim discount level (50 through 100, plus Malus), with oneway relativities, approximate 95% confidence intervals, unsmoothed and smoothed estimates.]

"Near aliasing"
If two factors are almost perfectly, but not quite, aliased, convergence problems can result because of low exposures (even though one-way tables look fine), and/or results can become hard to interpret

Exposure              # Doors
Color                 2        3        4        5        Unknown
Red (selected base)   13,234   12,343   13,432   13,432   0
Green                 4,543    4,543    13,243   2,345    0
Blue                  6,544    5,443    15,654   4,565    0
Black                 4,643    1,235    14,565   4,545    2
Unknown               0        0        0        0        3,242

Eg if the 2 black, unknown doors policies had no claims, the GLM would try to estimate a very large negative number for unknown doors, and a very large positive number for unknown color
(see the code sketch at the end of this document)

"A Practitioner's Guide to Generalized Linear Models"
CAS 2004 Discussion Paper Program
CAS Exam 9 syllabus as of 2006
Copies available at www.watsonwyatt.com/glm
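Returning to the near-aliasing example above: a minimal Python sketch (not from the slides) of how such problems can be spotted before fitting, by cross-tabulating exposure across pairs of factors and flagging nearly empty cells. The numbers reproduce the doors/color table; pandas is an assumed dependency and the threshold is an arbitrary illustrative choice.

```python
import pandas as pd

# Exposure cross-tabulation of color by number of doors, as in the table above
exposure = pd.DataFrame(
    [[13234, 12343, 13432, 13432, 0],
     [4543, 4543, 13243, 2345, 0],
     [6544, 5443, 15654, 4565, 0],
     [4643, 1235, 14565, 4545, 2],
     [0, 0, 0, 0, 3242]],
    index=["Red", "Green", "Blue", "Black", "Unknown"],
    columns=["2", "3", "4", "5", "Unknown"],
)
exposure.index.name = "Color"
exposure.columns.name = "# Doors"

# Cells with tiny but non-zero exposure are the near-aliasing danger zone:
# the two factors are almost, but not quite, perfectly correlated there
threshold = 10
suspect = (exposure > 0) & (exposure <= threshold)
print("Near-empty cells (candidates for near aliasing):")
print(exposure.where(suspect).stack())

# Here the 2 policies in (Black, Unknown doors) leave 'unknown doors' and
# 'unknown color' nearly aliased, so the GLM may chase huge offsetting parameters.
```

In practice such near-empty cells are usually dealt with by reclassifying the handful of rogue records before fitting.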