2007 CAS Annual Meeting Estimating Loss Cost at the Address Level

Glenn Meyers
ISO Innovative Analytics
Territorial Ratemaking
Territories should be big
– Have a sufficient volume of business to make
credible estimates of the losses.
Territories should be small
– “You live near that bad corner!”
– Driving conditions vary within territory.
Some Environmental Features
Related to Auto Accidents
Proximity to Business Districts
– Workplaces
Busy at beginning and end of work day
– Shopping Centers
Always busy (especially on weekends)
– Restaurants
Busy at mealtimes
– Schools
Busy at beginning and end of school day
Some Environmental Features
Related to Auto Accidents
Weather
– Rainfall
– Temperature
– Snowfall (especially in hilly areas)
Traffic Density
– More traffic sharing the same space increases
odds of collision
Others
Combining Environmental Variables
at a Particular Garage Address
Individually, the geographic variables have a
predictable effect on accident rate and
severity.
Variables for a particular location could have
a combination of positive and negative
effects.
ISO is building a model to calculate the
combined effect of all variables.
– Based on countrywide data
– Actuarially credible
View as Case Study in
Model Development
Reduction in number of variables
– Necessary for small insurers
Special circumstances in fitting models to
individual auto data.
Diagnostics
– Graphics and Maps
Economic value of lift
Data Used in Building Model
Obtained loss, exposure, classification and address
for individual policies from cooperating insurers
ISO Statistical Plan data
Third-Party Data
– Traffic
– Business Location
– Demographic
– Weather
– etc
Approximately 1,000 indicators
Environmental Module
Examples
Comprised of over 1000 indicators
Weather:
– Measures of snowfall, rainfall,
temperature, wind and elevation
Traffic Density and Driving
Patterns:
– Commute patterns
– Public transportation usage
– Population density
– Types of housing
Traffic Composition
– Demographic groups
– Household size
– Homeownership
Traffic Generators
– Transportation hubs
– Shopping centers
– Hospitals/medical centers
– Entertainment districts
Experience and trend:
– ISO loss cost
– State frequency and severity
trends from ISO loss cost analysis
Techniques Employed in
Variable Reduction
Variable Selection – univariate analysis,
transformations, known relationship to
loss
Sampling
Sub models/data reduction – neural nets,
splines, principal component analysis,
variable clustering
Spatial Smoothing – with parameters
related to auto insurance loss patterns
In Depth for Weather
Component
[Diagram: Environmental Model Loss Cost by Coverage, shown as Coverage Frequency × Severity. Under Frequency, the causes of loss are Traffic Generators, Traffic Composition, Weather, Traffic Density, and Experience and Trend. The Weather component is expanded into boxes labeled Raw Data (35 Years of Weather Data), Data Summary Variable (Weather Summary Variables), Sub Model (Neural Net Weather Model 1, Temperature Model, Neural Net Weather Model 2), Weather Severity Scale 1, and Weather Severity Scale 2.]
Environmental Model
Loss Cost = Pure Premium
= Frequency x Severity
\[ \text{Frequency} = \frac{e^{\eta}}{1 + e^{\eta}} \]
\[ \eta = \text{Intercept} + \text{Weather} + \text{Traffic Density} + \text{Traffic Generators} + \text{Traffic Composition} + \text{Experience and Trend} \]
Environmental Model
Loss Cost = Pure Premium
= Frequency x Severity
\[ \text{Severity} = e^{\eta} \]
\[ \eta = \text{Intercept} + \text{Weather} + \text{Traffic Density} + \text{Traffic Generators} + \text{Traffic Composition} + \text{Experience and Trend} \]
Environmental Model
Loss Cost = Pure Premium
= Frequency x Severity
Separate Models by Coverage
– Bodily Injury Liability
– No-Fault
– Property Damage Liability
– Collision
– Comprehensive
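The structure above can be sketched in a few lines of Python. This is only an illustration: the intercepts and component scores are hypothetical numbers, and `eta` stands for the sum of the intercept and the five components. Frequency uses the logistic form and severity the exponential form, and one such pair would be fitted per coverage.

```python
import math

def frequency(eta):
    """Logistic form: claim probability implied by the linear predictor."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def severity(eta):
    """Exponential form: expected claim size implied by the linear predictor."""
    return math.exp(eta)

def linear_predictor(intercept, components):
    """Intercept plus the sum of the component scores."""
    return intercept + sum(components.values())

# Hypothetical component scores for one garage address and one coverage.
freq_components = {
    "weather": 0.10,
    "traffic_density": -0.20,
    "traffic_generators": 0.05,
    "traffic_composition": 0.00,
    "experience_and_trend": 0.15,
}
sev_components = {
    "weather": 0.02,
    "traffic_density": 0.05,
    "traffic_generators": -0.01,
    "traffic_composition": 0.00,
    "experience_and_trend": 0.04,
}

freq = frequency(linear_predictor(-4.0, freq_components))  # hypothetical intercept
sev = severity(linear_predictor(8.5, sev_components))      # hypothetical intercept
loss_cost = freq * sev                                      # pure premium
print(f"frequency={freq:.4f} severity={sev:.0f} loss cost={loss_cost:.2f}")
```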
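Constructing the Components
Frequency Model as Example
\[
\begin{aligned}
\eta = {} & \beta_0 && \text{(Intercept)} \\
& + \beta_1 x_1 + \dots + \beta_{n_1} x_{n_1} && \text{(Weather)} \\
& + \beta_{n_1+1} x_{n_1+1} + \dots + \beta_{n_2} x_{n_2} && \text{(Traffic Density)} \\
& + \beta_{n_2+1} x_{n_2+1} + \dots + \beta_{n_3} x_{n_3} && \text{(Traffic Generators)} \\
& + \beta_{n_3+1} x_{n_3+1} + \dots + \beta_{n_4} x_{n_4} && \text{(Traffic Composition)} \\
& + \beta_{n_4+1} x_{n_4+1} + \dots + \beta_{n_5} x_{n_5} && \text{(Experience \& Trend)} \\
& + \text{Other Classifiers}
\end{aligned}
\]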
Constructing the Components
Frequency Model as Example
“Other Classifiers” reflect driver, vehicle,
limits and deductibles.
Model output is deployed to a base class,
standard limits and deductibles.
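A minimal sketch of how the component scores might be assembled, assuming each component is the sum of coefficient times indicator over its own block of variables. The coefficients, indicator values and group boundaries below are hypothetical, and the "Other Classifiers" term is left out, as it would be for a base class.

```python
def component_scores(coefs, x, groups):
    """Return {component name: sum of coefs[j] * x[j] over that component's indices}."""
    return {
        name: sum(coefs[j] * x[j] for j in indices)
        for name, indices in groups.items()
    }

# Hypothetical fitted coefficients and indicator values (six variables only).
coefs = [0.20, -0.10, 0.05, 0.30, -0.40, 0.10]
x = [1.0, 2.0, 0.5, 1.5, 0.25, 3.0]

# Hypothetical assignment of consecutive variable indices to components.
groups = {
    "weather": range(0, 2),
    "traffic_density": range(2, 3),
    "traffic_generators": range(3, 4),
    "traffic_composition": range(4, 5),
    "experience_and_trend": range(5, 6),
}

scores = component_scores(coefs, x, groups)
intercept = -4.0  # hypothetical
eta = intercept + sum(scores.values())  # other classifiers omitted (base class)
print(scores, eta)
```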
Problems in Fitting Models
Sample records with no losses
– Most records have no losses
– Attach sample rate, si, to retained records
– Lore is to have equal numbers of loss records
and no-loss records in the sample.
Policy exposure, ti, varies
– Most are 6 month or 12 month policies
Need to account for sampling and exposure in
building model
Sampling and Exposure
in Logistic Regression
$p_i$ = annual probability of a claim
$n_i$ = 1 if claim, 0 if not
$t_i$ = policy term
$s_i$ = sample rate
\[ \text{Likelihood} = \prod_i \left( 1 - (1 - p_i)^{t_i} \right)^{s_i n_i} (1 - p_i)^{t_i s_i (1 - n_i)} \]
For $p_i \ll 1$
\[ 1 - (1 - p_i)^{t_i} = 1 - \left( 1 - t_i p_i + O(p_i^2) \right) \approx t_i p_i \]
\[ \text{Loglikelihood} \approx \sum_i s_i n_i \ln(t_i) + \sum_i \left[ s_i n_i \ln(p_i) + t_i s_i (1 - n_i) \ln(1 - p_i) \right] \]
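The small-probability approximation can be checked numerically. The sketch below compares the exact term probability with the first-order value for a few hypothetical annual claim probabilities; the function names are illustrative only.

```python
def term_claim_probability(p, t):
    """Exact probability of a claim during a policy term of t years."""
    return 1.0 - (1.0 - p) ** t

def approx_term_claim_probability(p, t):
    """First-order approximation t * p, reasonable when p is small."""
    return t * p

for p in (0.01, 0.05, 0.20):      # hypothetical annual claim probabilities
    for t in (0.5, 1.0):          # 6-month and 12-month policies
        exact = term_claim_probability(p, t)
        approx = approx_term_claim_probability(p, t)
        print(f"p={p:.2f} t={t:.1f} exact={exact:.5f} approx={approx:.5f}")
```

For a 12-month policy the two agree exactly; for shorter terms the gap grows with p, which is why the approximation is tied to small claim probabilities.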
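Sampling and Exposure
in Logistic Regression
\[ \text{Loglikelihood} \approx \sum_i s_i n_i \ln(t_i) + \sum_i \left[ s_i n_i \ln(p_i) + t_i s_i (1 - n_i) \ln(1 - p_i) \right] \]
In Logistic Regression
\[ p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}} \]
The first sum does not depend on the model parameters, so fitting reduces to a weighted logistic regression:
\[ \text{Loglikelihood} = \sum_i \left[ w_i n_i \ln(p_i) + w_i (1 - n_i) \ln(1 - p_i) \right] \]
Set $w_i = s_i$ if $n_i = 1$
Set $w_i = t_i s_i$ if $n_i = 0$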
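A minimal sketch of the weighting rule, assuming each retained record carries its claim indicator `n`, policy term `t` and sample rate `s` (the field names and the four records are hypothetical). For an intercept-only model the weighted loglikelihood has a closed-form maximizer, which gives a quick sanity check on the weights.

```python
import math

def record_weight(n, t, s):
    """Weight s for claim records, t * s for no-claim records."""
    return s if n == 1 else t * s

def weighted_loglik(records, p_of):
    """Weighted logistic loglikelihood; p_of maps a record to its annual probability."""
    total = 0.0
    for rec in records:
        w = record_weight(rec["n"], rec["t"], rec["s"])
        p = p_of(rec)
        total += w * (rec["n"] * math.log(p) + (1 - rec["n"]) * math.log(1.0 - p))
    return total

def intercept_only_mle(records):
    """Maximizer of the weighted loglikelihood when every record shares one p."""
    claim_w = sum(record_weight(r["n"], r["t"], r["s"]) for r in records if r["n"] == 1)
    all_w = sum(record_weight(r["n"], r["t"], r["s"]) for r in records)
    return claim_w / all_w

# Hypothetical sample: two claim records, two no-claim records.
records = [
    {"n": 1, "t": 1.0, "s": 1.0},
    {"n": 1, "t": 0.5, "s": 1.0},
    {"n": 0, "t": 0.5, "s": 10.0},
    {"n": 0, "t": 1.0, "s": 10.0},
]

p_hat = intercept_only_mle(records)
print(p_hat, weighted_loglik(records, lambda r: p_hat))
```

In practice the same weights could be passed to any logistic regression routine that accepts per-record weights.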
Overall Model Diagnostics
Results are preliminary
Sort in order of increasing prediction
– Frequency & Severity
Group observations in buckets
– 1/100th of record count for frequency
– 1/50th of the record count for severity
Calculate bucket averages
Apply the GLM link function for bucket averages and
predicted value
– logit for frequency
– log for severity
Plot predicted vs empirical
– With confidence bands
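The bucketing steps above can be sketched as follows. This is an illustration under stated assumptions: the data are hypothetical and built to be perfectly calibrated, confidence bands are omitted, and a logit link requires every bucket to contain both claims and non-claims.

```python
import math

def logit(p):
    """Logit link used for frequency."""
    return math.log(p / (1.0 - p))

def bucket_diagnostics(predicted, observed, n_buckets, link):
    """Sort by prediction, split into equal-count buckets, and return
    (link(mean predicted), link(mean observed)) for each bucket."""
    pairs = sorted(zip(predicted, observed))
    points = []
    for b in range(n_buckets):
        lo = b * len(pairs) // n_buckets
        hi = (b + 1) * len(pairs) // n_buckets
        chunk = pairs[lo:hi]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_obs = sum(o for _, o in chunk) / len(chunk)
        points.append((link(mean_pred), link(mean_obs)))
    return points

# Hypothetical data: 100 records at each predicted probability, with the
# number of claims set so the observed rate matches the prediction.
predicted, observed = [], []
for p in (0.02, 0.05, 0.10, 0.20):
    claims = round(100 * p)
    predicted += [p] * 100
    observed += [1] * claims + [0] * (100 - claims)

points = bucket_diagnostics(predicted, observed, 4, logit)
for pred_link, emp_link in points:
    print(f"predicted.logit={pred_link:.3f} empirical.logit={emp_link:.3f}")
```

For severity the same function could be called with `math.log` as the link and claim sizes in place of claim indicators.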
Overall Diagnostics - Frequency
Empirical vs. Predicted Probabilities: BI
(On logistic scales)
\[ \text{logit} = \ln\left( \frac{p}{1 - p} \right) \]
[Plot: empirical.logit vs. predicted.logit]
Overall Diagnostics - Severity
Empirical vs. Predicted Log (Base 10) Severities: BI
[Plot: empirical.logsev vs. predicted.logsev]
Component Diagnostics
Frequency Example
Sort observations in order of Ci
Bucket as above and calculate
– Cib = Average Ci in bucket b
– pib = Average pi in bucket b
– Partial Residuals
\[ R_{ib} = \ln\left( \frac{p_{ib}}{1 - p_{ib}} \right) - \sum_{k \ne i} C_{kb} \]
Plot Cib vs Rib – Expect linear relationship
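A minimal sketch of the partial residual calculation, assuming bucket averages are already available. The component values and intercept are hypothetical, and the probabilities are generated from the model itself, so the residuals track the chosen component exactly, offset by the intercept.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def partial_residuals(component_avgs, p_avgs, target):
    """Logit of the bucket probability minus every component except `target`.

    component_avgs: {component name: list of bucket averages}
    p_avgs: list of bucket average probabilities
    """
    residuals = []
    for b, p in enumerate(p_avgs):
        others = sum(vals[b] for name, vals in component_avgs.items() if name != target)
        residuals.append(logit(p) - others)
    return residuals

# Hypothetical bucket averages for two components over three buckets.
component_avgs = {
    "weather": [-0.3, 0.0, 0.2],
    "traffic_density": [0.1, -0.1, 0.05],
}
intercept = -4.0  # hypothetical
p_avgs = []
for b in range(3):
    eta = intercept + sum(vals[b] for vals in component_avgs.values())
    p_avgs.append(math.exp(eta) / (1.0 + math.exp(eta)))

residuals = partial_residuals(component_avgs, p_avgs, "weather")
for c, r in zip(component_avgs["weather"], residuals):
    print(f"C={c:+.2f} R={r:+.4f}")
```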
Component Diagnostics
Experience and Trend
Logit Partial Residuals vs. Components: Comprehensive
[Plot: logit.partial.residual vs. Exp]
Component Diagnostics
Traffic Composition
Logit Partial Residuals vs. Components: Comprehensive
[Plot: logit.partial.residual vs. TrafComp]
Component Diagnostics
Traffic Density
Logit Partial Residuals vs. Components: Comprehensive
[Plot: logit.partial.residual vs. TrafDen]
Component Diagnostics
Traffic Generators
Logit Partial Residuals vs. Components: Comprehensive
[Plot: logit.partial.residual vs. TrafGen]
Component Diagnostics
Weather
Logit Partial Residuals vs. Components: Comprehensive
[Plot: logit.partial.residual vs. Weather]
Comparing Model Output to
Current Loss Costs
Model output is deployed to a base class,
standard limits and deductibles.
– Similar to current loss cost, but at garaging
address rather than territory.
Define:
\[ \text{Relativity} = \frac{\text{Model Output}}{\text{Current Loss Cost}} \]
Relativity is proportional to premium that
could be charged with “refined loss costs”
using the model output.
Relativities to Current Loss Costs
[Four histograms: BI Relativity, PD Relativity, Comp Relativity and Collision Relativity; each plots % Premium against Relativity]
Newark NJ Area
Combined Relativity
[Map: combined relativity across the Newark, NJ area, labeled by municipality]
Evaluating the Lift of
the Environmental Model
Demonstrate the ability to select the more
profitable risks
Demonstrate the adverse effect of
competitors “skimming the cream”
Calculate the “Value of Lift” statistic
Once insurers see the value of lift, other
actions are possible
– Change prices (etc)
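The two demonstrations can be sketched with one helper that orders risks by relativity and splits them at a share of premium. Everything here is hypothetical: four risks with equal current loss cost (used as a stand-in for premium) and actual losses set equal to the model output, which is the best case for the model.

```python
def relativity(risk):
    """Model output divided by current loss cost."""
    return risk["model_output"] / risk["current_loss_cost"]

def loss_ratio(risks):
    """Actual losses divided by current loss cost (premium stand-in)."""
    return sum(r["loss"] for r in risks) / sum(r["current_loss_cost"] for r in risks)

def split_by_relativity(risks, premium_share):
    """Return (low, high): the lowest-relativity risks holding about
    premium_share of the premium, and the remaining risks."""
    ordered = sorted(risks, key=relativity)
    target = premium_share * sum(r["current_loss_cost"] for r in ordered)
    running, cut = 0.0, 0
    while cut < len(ordered) and running < target:
        running += ordered[cut]["current_loss_cost"]
        cut += 1
    return ordered[:cut], ordered[cut:]

# Hypothetical book: losses equal the model output.
risks = [
    {"current_loss_cost": 100.0, "model_output": m, "loss": m}
    for m in (120.0, 80.0, 110.0, 90.0)
]
base = loss_ratio(risks)

# Selective underwriting: keep the 75% of premium with the lowest relativities.
selected, _ = split_by_relativity(risks, 0.75)
pct_decrease = 100.0 * (1.0 - loss_ratio(selected) / base)

# Antiselection: a competitor takes the 50% of premium with the lowest relativities.
_, remaining = split_by_relativity(risks, 0.50)
pct_increase = 100.0 * (loss_ratio(remaining) / base - 1.0)

print(f"decrease={pct_decrease:.2f}% increase={pct_increase:.2f}%")
```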
Effect of Selecting
Lower Relativities
[Four panels: Selective Underwriting for BI, PD, Comp and Coll; each plots % Decrease in Loss Ratio against % Premium Selected]
Effect of Competitors
Selecting Lower Relativities
[Four panels: Antiselection for BI, PD, Comprehensive and Collision; each plots % Increase in Loss Ratio against % Premium Lost to Competition]
Assumptions of The Formula
Value of Lift (VoL)
Assume a competitor comes in and takes away
the business whose expected loss is below your
class average.
Because of adverse selection, the new loss ratio
will be higher than the current loss ratio.
What is the value of avoiding this fate?
VoL is proportional to the difference between the
new and the current loss ratio.
Express the VoL in $ per car year.
The VoL Formula
$L_C$ = Current losses
$P_C$ = Current Loss Cost
$L_N$ = New losses of business remaining after adverse selection
$P_N$ = New Loss Cost after adverse selection
$E_C$ = Current exposure in car years
The VoL Formula
\[ \text{VoL} = \frac{\left( \dfrac{L_N}{P_N} - \dfrac{L_C}{P_C} \right) \times P_N}{E_C} \]
The numerator represents $ value of the
potential cost of competitors skimming the
cream.
Dividing by EC expresses this value as a $
value per car year.
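A worked example of the formula, using hypothetical figures: a book of 4 car years with current losses and loss cost of 400 each, from which a competitor removes the better half, leaving losses of 230 against a loss cost of 200.

```python
def value_of_lift(l_n, p_n, l_c, p_c, e_c):
    """(New loss ratio minus current loss ratio) times new loss cost,
    divided by current exposure in car years."""
    return (l_n / p_n - l_c / p_c) * p_n / e_c

# Hypothetical inputs, not taken from the presentation.
vol = value_of_lift(l_n=230.0, p_n=200.0, l_c=400.0, p_c=400.0, e_c=4.0)
print(f"VoL = ${vol:.2f} per car year")
```

Here the loss ratio rises from 1.00 to 1.15, so the numerator is 0.15 × 200 = 30 and the VoL is about $7.50 per car year.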
Value of Lift Results
Coverage         VoL $     VoL % of Loss Cost
BI               5.32      3.23%
PD               2.84      2.39%
Comprehensive    2.23      5.26%
Collision        2.10      1.84%
Total            $12.49
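Customized Model
Loss Cost = Pure Premium
= Frequency x Severity
\[ \text{Frequency} = \frac{e^{\eta}}{1 + e^{\eta}} \]
\[ \eta = \alpha_0 + \alpha_1 \cdot \text{Weather} + \alpha_2 \cdot \text{Traffic Density} + \alpha_3 \cdot \text{Traffic Generators} + \alpha_4 \cdot \text{Traffic Composition} + \alpha_5 \cdot \text{Experience and Trend} + \text{Other Classifiers} \]
$\alpha_1, \dots, \alpha_5 \equiv 1$ in industry model
Severity model customized similarly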
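A minimal sketch of the customization, assuming an insurer rescales each industry component with its own multiplier. The component scores, intercept and multipliers are hypothetical; setting every multiplier to 1 reproduces the industry model.

```python
import math

def customized_frequency(alpha0, alphas, components, other_classifiers=0.0):
    """Logistic frequency with a per-component multiplier on each industry score."""
    eta = alpha0 + other_classifiers
    for name, score in components.items():
        eta += alphas[name] * score
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical industry component scores for one address.
components = {
    "weather": 0.10,
    "traffic_density": -0.20,
    "traffic_generators": 0.05,
    "traffic_composition": 0.00,
    "experience_and_trend": 0.15,
}

industry_alphas = {name: 1.0 for name in components}
# Hypothetical insurer-specific multipliers.
custom_alphas = dict(industry_alphas, weather=1.3, traffic_density=0.8)

industry = customized_frequency(-4.0, industry_alphas, components)
custom = customized_frequency(-4.0, custom_alphas, components)
print(f"industry={industry:.5f} customized={custom:.5f}")
```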
Summary
Model estimates loss cost as a function of
business, demographic and weather
conditions.
Demonstrated model diagnostics
Demonstrated lift
Indicated how to customize the model