Modeling the Loss Process for Medical Malpractice
Bill Faltas
GE Insurance Solutions
CAS Special Interest Seminar … Predictive Modeling
"GLM and the Medical Malpractice Crisis" Session
October 4, 2004
Chicago, IL
© Employers Reinsurance Corporation - 2004 Patent Pending

"The work of science is to substitute facts for appearances and demonstrations for impressions."
John Ruskin

Regression Modeling
Simply: A functional relationship between one unknown (Y) and one or more knowns (X's)
$Y = f(X_1, X_2, \ldots, X_n) + \text{error}$
Statistically: A distribution for Y with parameters that vary with X
Example: Ordinary Least Squares (OLS) ("linear regression")
• $Y \sim \mathrm{Normal}(\mu_i, \sigma^2)$
• Estimate both $\mu_i$ and $\sigma^2$ → have a measure of variability ($\sigma^2$)
• E(Y) is a linear combination of X's: $\mu_i = a + b_1 X_{1i} + b_2 X_{2i} + \ldots + b_n X_{ni}$
• Estimate parameters $a, b_1, b_2, \ldots, b_n$

Terminology
X's (explanatory / covariate / predictor / independent variables) could be:
(1) Numerical:
(a) Continuous [e.g., years of practice, square feet]
(b) Discrete [e.g., # past claims]
(2) Categorical:
(a) Ordinal [e.g., income or state group (H / M / L)]
(b) Nominal [e.g., gender (M/F), state]
Y (response / dependent variable) could be:
(1) Continuous [e.g., total $ losses from an insurance policy]
(2) Discrete [e.g., # of insurance claims]
(3) Binary [e.g., whether an insurance policy is likely to have a claim (Y/N)]

Popular Regression Modeling Choices
• Y Continuous: Ordinary Least Squares Model (OLS)
• Y Binary (0,1): Logistic Model
• Y Positive (Y>0): Exponential Model
• Y Discrete {0,1,2,3, …}: Poisson Model

GLM, OLS, and Logistic Model
Item | GLM | OLS | Logistic
Form of Y | Any | Continuous | Binary (0,1)
Distribution of Y | Y ~ Exponential Family | $Y \sim \mathrm{Normal}(\mu, \sigma^2)$ (in exponential family) | Y (=1/0) ~ Bernoulli(P) (in exponential family)
Model [E(Y)] | $\mathrm{Mean}(Y) = h(X \cdot \beta)$; $\mathrm{Mean}(Y_i) = f(a + b_1 X_{1i} + \ldots + b_n X_{ni})$, f(linear combination of X's) | $\mu = X \cdot \beta$; $\mu_i = a + b_1 X_{1i} + \ldots + b_n X_{ni}$ (linear combination of X's) | $P = e^{X \cdot \beta} / (1 + e^{X \cdot \beta})$; $P_i = P(Y_i = 1) = e^{L_i} / (1 + e^{L_i})$ where $L_i = a + b_1 X_{1i} + \ldots + b_n X_{ni}$
Method of Estimating $a, b_1, \ldots, b_n$ | M.L.E. | Method of Least Squares (same as M.L.E. for Normal) | M.L.E.

Loss Process Model for Medical Malpractice
Line Characteristic: low frequency / high severity
Objective: Build models to forecast emergence and ultimate values for (Y's)
• # notices (a.k.a. incidents)
• # notices that turn into claims with indemnity payment
• $ losses
Based on Four Types of X's
• Policyholder attributes … state, specialty, years of practice, etc.
• Policy attributes … form type, limit, etc.
• Environmental attributes … lawyers per 1000, births per 1000, etc.
• Time … e.g., policy age measures time since effective date

Likelihood of Notice
[Figure, left panel: "Dependence of Likelihood on X1". Vertical axis: pdf; horizontal axis: X1 (values 1 to 8). Annotations: "Likelihood at policy age 2.5 years (mode) increases with X1"; "Not significantly different".]
[Figure, right panel: "Dependence of Likelihood on X2", midpoint of time interval = 3. Vertical axis: pdf; horizontal axis: X2 (bands 20-30 through 70-80). Annotations: "Likelihood at policy age 2.5 years (mode) rises and falls with X2"; "Not significantly different".]
• Likelihood is a function of many (X) variables, including policy age
• Likelihood changes with X1 and X2 → include both in model
• Y is binary (1/0), "whether there is a notice or not"

Likelihood of Notice
A Logistic Model (a GLM application)
To model: P = Likelihood of Notice = Pr(Y=1)
$P = P(Y = 1) = e^{L} / (1 + e^{L})$ where $L = a + b_1 X_1 + \ldots + b_n X_n$
• Transform some of the X variables, including policy age
• Develop model based on 70% of data
• Validate model on remaining 30% of data
• Compare actual vs. modeled triangles of '# policies with notices'
• Finalize parameters on 100% of data

Model Validation Approaches
Sampling
• Set aside sample
• Develop parameters using remaining data
• Verify model works against sample
• Finalize model using all data
Resampling
• Set aside 1st sample
• Develop parameters using remaining data
• Verify model works against 1st sample
• Resample and redo … n times
• Finalize model using all data
Partitioning
• Divide data into n partitions (often 4-6)
• Set aside 1st partition
• Develop parameters using other partitions
• Verify model works against 1st partition
• Repeat process for all other partitions
• Finalize model using all data
Uses all data

Notice to Claim … Waiting Time Approach
[Figure: "Empirical PDF of 'Waiting Time' for 6 categories of X2". Horizontal axis: Waiting Time (0.0 to 20.0); vertical axis: density (0.000 to 0.100). Annotation: "Area represents probability of turning into a claim 1.0 - 2.5 years after receiving notice (no actual data prior to 1.0 year)."]
• Waiting time defined as time from notice to claim
• "Waiting time" varies by different values of attribute X2 → include X2 in notice-to-claim model
• Waiting time approach enables lack of claim data to be used as information
• # Claims = (# notices) x (prob. of notice turning into a claim)

Estimate Claim Closing Values (Claim Sizes)
• Model trended claim sizes using standard actuarial approaches
– Closed claims, without regard to closing lag
– Closed claims by closing lag
– Closed claims by policyholder attributes
• Compare company data and models with external benchmarks
• Select model(s)
• Test modeled severities against actual severities
– Actual severities in development triangles
– Modeled severities: f(policyholder, policy, closing year)

Claim Size Distribution
[Figure: "P.D.F. of Log of claim sizes by 8 groups of X1". Horizontal axis: LN(Claim Size); vertical axis: Density; the modes of the curves are marked.]
• Claim size distribution varies by different values of attribute X1 → include X1 in claim size modeling
• Claim size model parameters are a function of significant attributes
• Model location and shape vary with attributes
• A way to introduce distributional variation

Modeling Summary
Inputs: Policyholder Attributes, Policy Attributes, Environmental Attributes
• # Notices (Logistic Model)
• Notices Becoming Claims (Waiting Time)
• CLAIM COUNTS: # Claims = # Notices x Prob of Notice to Claim
• CLAIM SIZES: Claim Size Distribution
• $ LOSSES: $ Losses = # Claims x Claim Size

GLM Application Advantages
• Useful for all lines, including low freq / high sev
• Identifies and uses significant variables simultaneously
• Effective in dealing with interacting variables
• Can use time element to model emergence and ultimates
• Variability of modeled estimates can be a byproduct and useful for measurements of risk/uncertainty
• Multiple applications: Underwriting, Pricing, Reserving, Risk
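
Illustration (not part of the original slides)
The chain in the Modeling Summary (# notices from a logistic model, # claims = # notices x probability of a notice turning into a claim, $ losses = # claims x claim size) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the presenter's implementation: the function names, the coefficient values, the single notice-to-claim probability, and the single mean claim size are all hypothetical simplifications. The deck itself lets the waiting-time and claim-size distributions vary with attributes.

```python
import math


def notice_probability(intercept, coefs, xs):
    """Logistic model: P(Y=1) = e^L / (1 + e^L), L = a + b1*X1 + ... + bn*Xn."""
    linear = intercept + sum(b * x for b, x in zip(coefs, xs))
    # Evaluate in a numerically stable way for large |L|.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


def expected_losses(policies, intercept, coefs, prob_notice_to_claim, mean_claim_size):
    """Chain the three stages of the loss process for a list of policies.

    policies: list of X vectors, one per policy (hypothetical, already transformed).
    prob_notice_to_claim: assumed single probability that a notice becomes a claim.
    mean_claim_size: assumed single average claim size in dollars.
    """
    # Expected # notices = sum of per-policy notice likelihoods.
    notices = sum(notice_probability(intercept, coefs, xs) for xs in policies)
    # # Claims = # Notices x Prob of Notice to Claim.
    claims = notices * prob_notice_to_claim
    # $ Losses = # Claims x Claim Size.
    losses = claims * mean_claim_size
    return notices, claims, losses


if __name__ == "__main__":
    # Hypothetical portfolio: each policy is described by [X1, policy age in years].
    portfolio = [[1.0, 2.5], [3.0, 2.5], [5.0, 1.0]]
    n, c, loss = expected_losses(
        portfolio,
        intercept=-4.0,          # hypothetical a
        coefs=[0.2, 0.3],        # hypothetical b1, b2
        prob_notice_to_claim=0.25,
        mean_claim_size=250_000.0,
    )
    print(f"expected notices={n:.4f}, claims={c:.4f}, losses=${loss:,.0f}")
```

In practice the parameters a, b1, …, bn would be estimated by maximum likelihood on the development data (70% of the data in the deck) and checked against the held-out 30% before being finalized on all data.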