2007 CAS Annual Meeting
Estimating Loss Cost at the Address Level
Glenn Meyers
ISO Innovative Analytics

Territorial Ratemaking
Territories should be big
  – Have a sufficient volume of business to make credible estimates of the losses.
Territories should be small
  – “You live near that bad corner!”
  – Driving conditions vary within territory.

Some Environmental Features Related to Auto Accidents
Proximity to Business Districts
  – Workplaces: busy at beginning and end of work day
  – Shopping centers: always busy (especially on weekends)
  – Restaurants: busy at mealtimes
  – Schools: busy at beginning and end of school day

Some Environmental Features Related to Auto Accidents (continued)
Weather
  – Rainfall
  – Temperature
  – Snowfall (especially in hilly areas)
Traffic Density
  – More traffic sharing the same space increases odds of collision
Others

Combining Environmental Variables at a Particular Garage Address
Individually, the geographic variables have a predictable effect on accident rate and severity.
Variables for a particular location could have a combination of positive and negative effects.
ISO is building a model to calculate the combined effect of all variables.
  – Based on countrywide data
  – Actuarially credible

View as Case Study in Model Development
Reduction in number of variables
  – Necessary for small insurers
Special circumstances in fitting models to individual auto data
Diagnostics
  – Graphics and maps
Economic value of lift

Data Used in Building Model
Obtained loss, exposure, classification and address for individual policies from cooperating insurers
ISO Statistical Plan data
Third-Party Data
  – Traffic
  – Business location
  – Demographic
  – Weather
  – etc.
Approximately 1,000 indicators

Environmental Module Examples
Comprises over 1,000 indicators
Weather
  – Measures of snowfall, rainfall, temperature, wind and elevation
Traffic Density and Driving Patterns
  – Commute patterns
  – Public transportation usage
  – Population density
  – Types of housing
Traffic Composition
  – Demographic groups
  – Household size
  – Homeownership
Traffic Generators
  – Transportation hubs
  – Shopping centers
  – Hospitals/medical centers
  – Entertainment districts
Experience and Trend
  – ISO loss cost
  – State frequency and severity trends from ISO loss cost analysis

Techniques Employed in Variable Reduction
Variable selection
  – Univariate analysis, transformations, known relationship to loss
Sampling
Sub models/data reduction
  – Neural nets, splines, principal component analysis, variable clustering
Spatial smoothing
  – With parameters related to auto insurance loss patterns

In Depth for Weather Component
[Diagram: model hierarchy, levels from top to bottom]
  – Environmental Model: Loss Cost by Coverage
  – Coverage: Frequency × Severity
  – Causes of Loss (shown for Frequency): Traffic Generators, Traffic Composition, Weather, Traffic Density, Experience and Trend
  – Sub Model: Neural Net Weather Model 1, Weather Severity Scale 1, Temperature Model, Weather Severity Scale 2, Neural Net Weather Model 2
  – Data Summary Variable: Weather Summary Variables
  – Raw Data: 35 Years of Weather Data

Environmental Model
Loss Cost = Pure Premium = Frequency × Severity
\[ \text{Frequency} = \frac{e^{\eta}}{1 + e^{\eta}} \]
\[ \text{Severity} = e^{\eta} \]
In each case, \( \eta \) = Intercept + Weather + Traffic Density + Traffic Generators + Traffic Composition + Experience and Trend
Separate Models by Coverage
  – Bodily Injury Liability
  – No-Fault
  – Property Damage Liability
  – Collision
  – Comprehensive

Constructing the Components: Frequency Model as Example
\[
\begin{aligned}
\eta ={} & \text{Intercept} \\
& + \beta_1 x_1 + \dots + \beta_{n_1} x_{n_1} && (\text{Weather}) \\
& + \beta_{n_1+1} x_{n_1+1} + \dots + \beta_{n_2} x_{n_2} && (\text{Traffic Density}) \\
& + \beta_{n_2+1} x_{n_2+1} + \dots + \beta_{n_3} x_{n_3} && (\text{Traffic Generators}) \\
& + \beta_{n_3+1} x_{n_3+1} + \dots + \beta_{n_4} x_{n_4} && (\text{Traffic Composition}) \\
& + \beta_{n_4+1} x_{n_4+1} + \dots + \beta_{n_5} x_{n_5} && (\text{Experience and Trend}) \\
& + \text{Other Classifiers}
\end{aligned}
\]
“Other Classifiers” reflect driver, vehicle, limits and deductibles.
Model output is deployed to a base class, standard limits and deductibles.

Problems in Fitting Models
Sample records with no losses
  – Most records have no losses
  – Attach sample rate, \( s_i \), to retained records
  – Lore is to have an equal number of loss records and no-loss records in the sample.
Policy exposure, \( t_i \), varies
  – Most are 6-month or 12-month policies
Need to account for sampling and exposure in building the model

Sampling and Exposure in Logistic Regression
\( p_i \) = annual claim probability
\( n_i \) = 1 if claim, 0 if not
\( t_i \) = policy term
\( s_i \) = sample rate
Likelihood:
\[ \prod_i \left( 1 - (1 - p_i)^{t_i} \right)^{s_i n_i} (1 - p_i)^{t_i s_i (1 - n_i)} \]
For \( p_i \ll 1 \):
\[ 1 - (1 - p_i)^{t_i} = 1 - \left( 1 - t_i p_i + O(p_i^2) \right) \approx t_i p_i \]
Loglikelihood:
\[ \sum_i s_i n_i \ln(t_i) + \sum_i \left[ s_i n_i \ln(p_i) + t_i s_i (1 - n_i) \ln(1 - p_i) \right] \]
The first sum does not depend on the model parameters, so it can be dropped when maximizing.
In logistic regression, \( p_i = \dfrac{e^{\eta_i}}{1 + e^{\eta_i}} \) and
\[ \text{Loglikelihood} = \sum_i \left[ w_i n_i \ln(p_i) + w_i (1 - n_i) \ln(1 - p_i) \right] \]
Set \( w_i = s_i \) if \( n_i = 1 \)
Set \( w_i = t_i s_i \) if \( n_i = 0 \)

Overall Model Diagnostics
Results are preliminary
Sort in order of increasing prediction
  – Frequency and severity
Group observations in buckets
  – 1/100th of the record count for frequency
  – 1/50th of the record count for severity
Calculate bucket averages
Apply the GLM link function to bucket averages and predicted values
  – Logit for frequency
  – Log for severity
Plot predicted vs. empirical
  – With confidence bands

Overall Diagnostics - Frequency
[Figure: Empirical vs. Predicted Probabilities: BI (on logistic scales). x-axis: predicted.logit; y-axis: empirical.logit; both axes run from -8 to -3.]
\[ \operatorname{logit}(p) = \ln \frac{p}{1 - p} \]

Overall Diagnostics - Severity
[Figure: Empirical vs. Predicted Log (Base 10) Severities: BI. x-axis: predicted.logsev; y-axis: empirical.logsev.]

Component Diagnostics: Frequency Example
Sort observations in order of component \( C_i \)
Bucket as above and calculate
  – \( C_{ib} \) = average \( C_i \) in bucket \( b \)
  – \( p_{ib} \) = average \( p_i \) in bucket \( b \)
  – Partial residuals: \( R_{ib} = \ln \dfrac{p_{ib}}{1 - p_{ib}} - \sum_{k \ne i} C_{kb} \)
Plot \( C_{ib} \) vs. \( R_{ib} \)
  – Expect a linear relationship

Component Diagnostics by Component
[Figures: Logit Partial Residuals vs. Components: Comprehensive. y-axis in each panel: logit.partial.residual.]
  – Experience and Trend (x-axis: Exp)
  – Traffic Composition (x-axis: TrafComp)
  – Traffic Density (x-axis: TrafDen)
  – Traffic Generators (x-axis: TrafGen)
  – Weather (x-axis: Weather)

Comparing Model Output to Current Loss Costs
Model output is deployed to a base class, standard limits and deductibles.
  – Similar to current loss cost, but at garaging address rather than territory.
Define:
\[ \text{Relativity} = \frac{\text{Model Output}}{\text{Current Loss Cost}} \]
Relativity is proportional to the premium that could be charged with “refined loss costs” using the model output.
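
Returning to the sampling and exposure weights defined on the Sampling and Exposure in Logistic Regression slide, the rule "w_i = s_i if n_i = 1, w_i = t_i s_i if n_i = 0" can be checked numerically. The following is a minimal Python sketch, not the model's actual fitting code. It assumes the policy term t_i is measured in years and that the linear predictor for each record is already known; the function names and the four records are hypothetical. It compares the weighted log-likelihood, plus the parameter-free s_i n_i ln(t_i) term, against the exact log-likelihood.

```python
import math


def claim_probability(eta):
    """Annual claim probability p = e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))


def record_weight(n, t, s):
    """w = s for a claim record (n == 1), w = t * s for a no-claim record."""
    return s if n == 1 else t * s


def weighted_loglik(etas, ns, ts, ss):
    """Weighted logistic log-likelihood: sum of w*n*ln(p) + w*(1-n)*ln(1-p)."""
    total = 0.0
    for eta, n, t, s in zip(etas, ns, ts, ss):
        p = claim_probability(eta)
        w = record_weight(n, t, s)
        total += w * (n * math.log(p) + (1 - n) * math.log(1.0 - p))
    return total


def exact_loglik(etas, ns, ts, ss):
    """Log of the likelihood before the small-p approximation is applied."""
    total = 0.0
    for eta, n, t, s in zip(etas, ns, ts, ss):
        p = claim_probability(eta)
        p_term = 1.0 - (1.0 - p) ** t  # probability of a claim during the term
        total += s * n * math.log(p_term) + t * s * (1 - n) * math.log(1.0 - p)
    return total


def offset_term(ns, ts, ss):
    """Parameter-free sum of s*n*ln(t) that the weighted form leaves out."""
    return sum(s * n * math.log(t) for n, t, s in zip(ns, ts, ss))


# Hypothetical records: linear predictor, claim indicator,
# policy term in years (assumption), and sample rate used as in the likelihood.
etas = [-4.0, -4.0, -3.5, -5.0]
ns = [1, 0, 0, 1]
ts = [0.5, 0.5, 1.0, 1.0]
ss = [1.0, 10.0, 10.0, 1.0]

approx = weighted_loglik(etas, ns, ts, ss) + offset_term(ns, ts, ss)
exact = exact_loglik(etas, ns, ts, ss)
print(f"approximate: {approx:.4f}  exact: {exact:.4f}")
```

For no-claim records and for 12-month claim records the two forms agree exactly; only claim records with a shorter term rely on the small-p approximation, so the gap shrinks as claim probabilities get smaller.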
Relativities to Current Loss Costs
[Figure: four histograms of % Premium by Relativity, with relativity from 0.7 to 1.3. Panels: BI Relativity, PD Relativity, Comp Relativity, Collision Relativity.]

Newark NJ Area Combined Relativity
[Map: combined relativity by location in the Newark, NJ area, with municipality labels.]

Evaluating the Lift of the Environmental Model
Demonstrate the ability to select the more profitable risks
Demonstrate the adverse effect of competitors “skimming the cream”
Calculate the “Value of Lift” statistic
Once insurers see the value of lift other actions are possible
  – Change prices (etc)

Effect of Selecting Lower Relativities
[Figure: four panels, Selective Underwriting for BI, PD, Comp and Coll. x-axis: % Premium Selected (75 to 95); y-axis: % Decrease in Loss Ratio (0 to 6).]

Effect of Competitors Selecting Lower Relativities
[Figure: four panels, Antiselection for BI, PD, Comprehensive and Collision. x-axis: % Premium Lost to Competition (10 to 50); y-axis: % Increase in Loss Ratio (0 to 10).]

Assumptions of the Formula: Value of Lift (VoL)
Assume a competitor comes in and takes away the business that is less than your class average.
Because of adverse selection, the new loss ratio will be higher than the current loss ratio.
What is the value of avoiding this fate?
VoL is proportional to the difference between the new and the current loss ratio.
Express the VoL as a $ per car year.

The VoL Formula
\( L_C \) = Current losses
\( P_C \) = Current loss cost
\( L_N \) = New losses of business remaining after adverse selection
\( P_N \) = New loss cost after adverse selection
\( E_C \) = Current exposure in car years
\[ \text{VoL} = \frac{P_N \left( \dfrac{L_N}{P_N} - \dfrac{L_C}{P_C} \right)}{E_C} \]
The numerator represents the $ value of the potential cost of competitors skimming the cream.
Dividing by \( E_C \) expresses this value as a $ value per car year.

Value of Lift Results
Coverage         VoL ($)   VoL % of Loss Cost
BI                 5.32         3.23%
PD                 2.84         2.39%
Comprehensive      2.23         5.26%
Collision          2.10         1.84%
Total             12.49

Customized Model
Loss Cost = Pure Premium = Frequency × Severity
\[ \text{Frequency} = \frac{e^{\eta}}{1 + e^{\eta}} \]
\[ \eta = \alpha_0 + \alpha_1\,\text{Weather} + \alpha_2\,\text{Traffic Density} + \alpha_3\,\text{Traffic Generators} + \alpha_4\,\text{Traffic Composition} + \alpha_5\,\text{Experience and Trend} + \text{Other Classifiers} \]
\( \alpha_1, \dots, \alpha_5 \equiv 1 \) in industry model
Severity model customized similarly

Summary
Model estimates loss cost as a function of business, demographic and weather conditions.
Demonstrated model diagnostics
Demonstrated lift
Indicated how to customize the model
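
The VoL formula from the slides above can be worked through on a toy book of business. This is a minimal Python sketch under stated assumptions, not the presenter's implementation: the Policy record, its field names, the two example policies, and the rule that the competitor takes every policy with relativity below 1.0 (standing in for business that is "less than your class average") are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    loss: float        # incurred losses
    loss_cost: float   # current loss cost for the policy
    exposure: float    # car years
    relativity: float  # model output / current loss cost


def value_of_lift(policies, threshold=1.0):
    """VoL = P_N * (L_N / P_N - L_C / P_C) / E_C, in $ per car year.

    Assumption: the competitor takes policies with relativity < threshold,
    so the remaining book is the policies with relativity >= threshold.
    The remaining book must be non-empty, otherwise P_N is zero.
    """
    l_c = sum(p.loss for p in policies)
    p_c = sum(p.loss_cost for p in policies)
    e_c = sum(p.exposure for p in policies)

    remaining = [p for p in policies if p.relativity >= threshold]
    l_n = sum(p.loss for p in remaining)
    p_n = sum(p.loss_cost for p in remaining)

    current_loss_ratio = l_c / p_c
    new_loss_ratio = l_n / p_n
    return p_n * (new_loss_ratio - current_loss_ratio) / e_c


# Hypothetical two-policy book: one better-than-average and one worse-than-average risk.
book = [
    Policy(loss=100.0, loss_cost=500.0, exposure=1.0, relativity=0.8),
    Policy(loss=500.0, loss_cost=500.0, exposure=1.0, relativity=1.2),
]
vol = value_of_lift(book)
print(f"VoL per car year: {vol:.2f}")
```

In this toy book the current loss ratio is 0.6 and the loss ratio after the competitor takes the low-relativity policy is 1.0, so the numerator is 500 × 0.4 = 200 and spreading it over 2 car years gives 100 per car year.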