UEA Insurance Stats Overview

Steve Cant
Senior Statistics Manager, Aviva
steve.cant@aviva.co.uk
01603 686857

The Elusive Advert - extracts

Modelling Opportunities at Aviva
• Does modelling risk have any appeal?
• Are you interested in an actuarial career but don’t fancy years more study in your free time?
• Do you have graduate level maths skills?
• Do you have any idea what financial statisticians do?
• Have you ever wondered how insurance premiums are calculated?

Key Aspects of Role
• Building risk cost models – to predict who will claim on their motor or household insurance
• Behavioural modelling – to predict how customers will react to pricing decisions
• Spatial analysis of postcode area in order to produce world leading maps of insurance risk
• Extraction of deeper knowledge from large, already well understood data sets
• R & D into new modelling and analytical techniques
• Educated guesswork
• Working with colleagues across the business including those in pricing, marketing, finance, actuarial, claims and underwriting

You’ll apply your analytical enthusiasm to a range of business problems and produce mathematical and statistical models that drive real results.

Products (Personal)
Motor, Bike, Van, Household, Pet, Travel, Breakdown, Creditor

PRICING PROCESS
[Flow diagram. Stages: DATA, RISK MODEL, BURNING COST (+ expenses + commission + profit), MASS CUSTOMISED PREMIUM MODEL, CORE RATES, LIVE PREMIUMS, STREET RATES.]
• Activities: Cleanse Data, Model Data, Recalibration, Behavioural Models, Competitive positioning, Profitability Reviews, Price Optimisation, Maintenance
• Owners labelled on the diagram: Data, Stats, Actuarial, Channels / Finance / Underwriting, Pricing Teams, EDD

Death: AM80 and AF80, 2 years select
[Chart: mortality rate q[x-t]+t (0 to 0.0045) against Age (0 to 50).]

Large Bodily Injury Claims – Major crashes
THIRD PARTY ONLY
[Chart: Relative Frequency (0 to 8) against Main Driver Age (17 to 89), with FE and MA series. Annotations: "About 1 in 5000 vehicle years" and "About 1 in 40000 vehicle years".]
• 10 years of data, > £250K frequency
• Still overwhelmingly random and rare – but can produce an index
• Anything that influences risk is a rating factor

Motor Insurance Rating Factors
• Postcode
• Vehicle

MOTOR RISK COST MODEL – Ranked by Information Gain
 1  Bodily Injury Freq District      14  Property Damage Freq District
 2  Young Additional Driver Age      15  Own Damage District
 3  NCD                              16  Vehicle Age
 4  Main Driver Age                  17  Transmission
 5  Young Additional Driver Sex      18  Theft Freq District
 6  Car Group                        19  Fuel Type
 7  Ritz                             20  Duration
 8  Driving Restriction              21  Convictions
 9  Payment Frequency                22  Licence Length
10  Make Model                       23  YAD Owns Car
11  Ownership Length                 24  PNCD
12  Mileage                          25  Voluntary Excess
13  At Fault Claims                  etc 30 other factors

Information gain is a weighted combination of factor range and exposure. E.g. age has high loadings for low exposure, payment method has lower loadings on high exposure.

Insurance Premiums
Start with a base (average) premium, e.g.
£400 (40 year old, 3 year old Ford Focus in Norwich, with full No Claims)

Then add various loadings and discounts:
  18 year old driver            200% loading     £400 x 3 = £1200
  Lives in Liverpool            100% loading     £1200 x 2 = £2400
  Drives a small car            40% discount     £2400 x 0.6 = £1440
  Drives an old car             30% discount     £1440 x 0.7 = £1008
  No Claims Discount is zero    233% loading!    £1008 x 3.33 = £3360!
  (5 years No Claims is a 70% discount)

Harsh? How do we calculate these loadings?

The Claims Universe
[Diagram labels: CHAOS, Undiscovered Order, ORDER; New factors, Improved modelling.]

Risk Modelling

Modelling Process (Motor)
• 5 Perils: Accidental Damage, Bodily Injury, Theft, Glass, Property Damage
• 2 Models per peril:
    Frequency = Σ No. of Claims / Σ Exposure
    Severity = Σ Cost of Claims / Σ No. of Claims
• Exposure is the time on risk
• E.g. for 1000 cars, one year each, this is 1000 ‘vehicle years’
• 120 claims from these 1000 vehicle years => 120/1000 = 12% frequency

But why bother risk modelling at all?

Multivariate Modelling: Why bother?
• Attempt to remove random effects (noise)
• Avoid the illusions of variable association (Simpson’s paradox)
• Consider all rating factors ‘together’ in order to discover ‘true effect’
• Examine consistency over time
• Ensure best possible prediction of future risk

Simpson’s Paradox
Berkeley sex bias case (Source: Wikipedia)
[Table: 1973 admission figures. Bias against women?]
[Table: Breakdown by department.]
• Tables are OK for two factors, no use for 50

Linear Modelling: Simple Linear Modelling
• LM expresses the relationship between an observed response (Y) and a set of predictors (X)
[Chart: Cost (0 to 80) against District (0 to 12).]
• In its simplest form (first order) it can be conceptualised as
    E(Y) = β0 + β1X
    Y = β0 + β1X + ε
  where ε is an error term with expected value 0

Linear Modelling: Method of Least Squares
• In order to calculate estimates of the parameters β0 and β1 we use the method of least squares.
• This can be thought of as minimising the (squared) distance between each observed response yi and its predicted value ŷi.
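
As a rough illustration of the least squares idea, the sketch below fits a straight line to a handful of points in plain Python. The district and cost values are made up for illustration (they are not Aviva data), and the function names are hypothetical; it uses the standard closed-form estimates for a single predictor.

```python
# Simple linear regression by the method of least squares (pure Python).
# All data below are invented purely for illustration.

def fit_least_squares(xs, ys):
    """Return (b0, b1) minimising sum((y - b0 - b1 * x) ** 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1


def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared residuals (y_i - y_hat_i) for the line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: average claim cost by district number.
districts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
costs = [12, 18, 22, 31, 35, 41, 50, 52, 63, 68]

b0, b1 = fit_least_squares(districts, costs)
sse = sum_squared_errors(districts, costs, b0, b1)
print(f"intercept={b0:.2f}, slope={b1:.2f}, SSE={sse:.2f}")
```

Moving either estimate away from the fitted value can only increase the sum of squared errors, which is the property the method is named for.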
[Chart: the residual yi – ŷi marked at a point x.]
• Remove outliers
• We then extend this idea to n dimensions using matrices and Emblem software

Linear Modelling: Method of Least Squares
• Minimise the Sum of Squared Errors:
    SSE = Σ (yi – ŷi)² = Σ (yi – β0 – β1xi)²
• By differentiating it can be shown that to minimise the SSE we must solve the following:
    ∂SSE/∂β0 = –2 Σ (yi – β0 – β1xi) = 0
    ∂SSE/∂β1 = –2 Σ xi (yi – β0 – β1xi) = 0

Linear Modelling: Multiple Linear Modelling
• What happens when we believe a number of variables affect the distribution of our random variable Y?
• We still have the response variable Y, but now instead of a single predictor we have k predictors, which we denote X1, X2, ..., Xk
• Now we want to fit the model
    Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
• So the same basic idea (least squares), but now we’re using matrix notation rather than simple algebra
• Matrix notation:
    Y = Xβ + ε, with least squares estimate β̂ = (XᵀX)⁻¹XᵀY

Generalized Linear Model (GLM)
Basically, an extension of linear modelling that allows:
• Multiplicative models (using a ‘link function’) - more appropriate for insurance
• A wider selection of errors (‘loss distributions’) from the exponential family
  • Normal Distribution: assumes each observation has the same fixed variance (no tail)
  • Poisson Distribution: assumes the variance increases with the expected value of each observation (longer tail)
  • Gamma Distribution: assumes variance increases with the square of the expected value (even longer tail!)

Emblem Software
• Raw data alone can lead to the wrong conclusion

Data Mining
Decision Trees – Visual carve up of account
Ritz 7-10 Segmentation, NUD (NB & Renewals)
[Tree diagram; each node shows Proportion Ritz 1-6 against Proportion Ritz 7-10.]
  Node                      In Ritz 7-10    Exposure
  Base                      42%
  Age 50+ Annual payers     67%             27%
  Better Wealth Postcode    73%             17%
  Policy Duration 3+        76%             9%

District – a quantum change in quality
[Maps: 2005 and 2010; x 10 perils.]

THE END
Any questions?
© Aviva plc
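
The premium build-up on the Insurance Premiums slide can be sketched in a few lines of Python. This is a minimal sketch, assuming loadings and discounts combine multiplicatively as the slide's arithmetic suggests; the function and variable names are hypothetical. It also shows that a multiplicative structure is additive on the log scale, which is the property a GLM with a log link relies on.

```python
# Multiplicative premium build-up using the illustrative slide figures.
# Names (loading, discount, build_premium) are hypothetical helpers.
import math

BASE_PREMIUM = 400.0  # 40 year old, 3 year old Ford Focus, Norwich, full No Claims


def loading(pct):
    """A loading of pct% multiplies the premium by (1 + pct/100)."""
    return 1 + pct / 100


def discount(pct):
    """A discount of pct% multiplies the premium by (1 - pct/100)."""
    return 1 - pct / 100


factors = {
    "18 year old driver": loading(200),   # x 3
    "Lives in Liverpool": loading(100),   # x 2
    "Drives a small car": discount(40),   # x 0.6
    "Drives an old car": discount(30),    # x 0.7
    # The base already includes the 70% No Claims discount; removing it
    # divides by 0.3, i.e. roughly a 233% loading (x 3.33).
    "No Claims Discount is zero": 1 / discount(70),
}


def build_premium(base, relativities):
    """Apply each relativity in turn to the base premium."""
    premium = base
    for factor in relativities.values():
        premium *= factor
    return premium


quote = build_premium(BASE_PREMIUM, factors)
print(f"£{quote:.0f}")  # about £3360, matching the slide

# On the log scale the same model is a sum of terms.
log_quote = math.log(BASE_PREMIUM) + sum(math.log(f) for f in factors.values())
print(abs(math.exp(log_quote) - quote) < 1e-6)
```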