Loss Reserving: Performance Testing and the Control Cycle
Casualty Actuarial Society
Pierre Laurin
June 17, 2008
© 2007 Towers Perrin

Today's agenda
- Defining the problem
- Performance testing, in general and in the context of reserves
- The reserving actuarial control cycle
- Case study: real-world results

This presentation is based on the paper "Loss Reserving: Performance Testing and the Control Cycle", authored by Yi Jing, Joseph Lebens, and Stephen Lowe, which has recently been submitted for publication in Variance.

THE PROBLEM
Questions for the reserving actuary

How do you know that the methods you are currently using are the "best"?
- What evidence supports your selection of methods?
- What are the right weights for combining the results of the methods?
- How do you decide when to change methods?
- What is the confidence range around estimates?
- How do you evaluate the cost/benefit of developing new data sources or implementing more complex methods?
- How do you instill appropriate humility in junior actuaries?

THE PROBLEM
The results of our research illustrate the prevalence of actuarial overconfidence

Tillinghast Confidence Quiz
Objective: to test respondents' understanding of the limits of their knowledge.
- Respondents were asked to answer ten questions related to their general knowledge of the global property/casualty industry.
- For each answer, respondents were asked to provide a range that offered a 90% confidence interval that they would answer correctly.
- Ideally (i.e., if "well calibrated"), respondents should have gotten nine out of ten questions correct.

Raw scores of respondents (number correct : number of respondents):
  0 : 29    1 : 67    2 : 56    3 : 69    4 : 44    5 : 44
  6 : 28    7 : 18    8 : 12    9 : 7    10 : 0

Note: based on 374 respondents as of 4/5/04. Profile of respondents: 86% work in the P/C industry; 73% are actuaries.

THE PROBLEM
Reserves are forecasts!
An actuarial method is used to develop a forecast of future claim payments. An actuarial method m consists of:
- an algorithm a
- a data set d
- a set of intervention points p

The method produces an estimate of the true future claim payments L(t):

    L̂(m)(t) = m(a, d, p)

The actuary must:
1. Choose a finite set of methods from the universe M = {m1, m2, ..., mn}
2. Choose a set of weights {w1, w2, ..., wn} to combine the results of each method together

Performance testing, via a formal control cycle, can help the actuary make these choices.

PERFORMANCE TESTING
Performance testing is a formal analysis of prediction errors

[Figure: Actual versus projected unpaid claims, accident year @ 42 months; estimated unpaid claims from the PCLD method versus actual unpaid claims, June 30th valuations 1979-1998]

Test a particular method by running the method on historical data, comparing estimates from the method with actual run-off. Performance testing is a formalized process, not just a numerical exercise.

PERFORMANCE TESTING
The general framework for performance testing is cross-validation

Cross-validation in the context of a regression model: given observations (X1, Y1), ..., (Xn, Yn), the prediction Y*k of Yk is made first by developing a regression equation fk using all data except (Xk, Yk), then by applying that equation to Xk. This process is repeated iteratively to obtain the complete set of Y*. Comparison of Y to Y* is a fair measure of the skill of the (X, Y) regression model as a means to predict some unobserved future value, Yn+1, from a new observation set Xn+1.
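The leave-one-out scheme described above can be sketched in a few lines of Python. This is an illustrative example on made-up linear data, not the paper's reserving data; the model and parameters are assumptions chosen only to show the mechanics.

```python
import numpy as np

# Illustrative data: a simple linear relationship with noise (not actuarial data)
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])  # design matrix with intercept
Y = X @ np.array([2.0, 1.5]) + rng.normal(0.0, 0.5, n)

# Leave-one-out cross-validation: fit f_k on all data except (X_k, Y_k),
# then apply f_k to X_k to obtain the out-of-sample prediction Y*_k
Y_star = np.empty(n)
for k in range(n):
    mask = np.arange(n) != k
    beta_k, *_ = np.linalg.lstsq(X[mask], Y[mask], rcond=None)
    Y_star[k] = X[k] @ beta_k

# The out-of-sample errors Y - Y* are a fair measure of predictive skill
rmse = float(np.sqrt(np.mean((Y - Y_star) ** 2)))
print(f"leave-one-out RMSE: {rmse:.3f}")
```

Because each prediction is made without the held-out point, the resulting errors approximate how the model would perform on a genuinely new observation.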
PERFORMANCE TESTING
Criteria for assessing the performance of a reserving method: BLURS-ICE

- Least-Square Error: The method should produce estimates that have minimum expected error, in the least-squares sense.
- Unbiased: Ideally, the method should produce estimates that are unbiased.
- Responsive: The method should respond quickly to changes in underlying trends or operational changes. Given several methods with similar expected errors, we would favor the method that responds to changes in trends or operations most quickly.
- Stable: Methods that work well in some circumstances, but "blow up" in other circumstances, are less desirable.
- Independent: The method should not generate prediction errors that are highly correlated with those of other selected methods. (This criterion is discussed more fully in subsequent sections.)
- Comprehensive: The method should use as much data as possible, in an integrated way, so that the maximum available information value is extracted.

THE CONTROL CYCLE
Performance testing of reserving methods can be part of an institutionalized control cycle

The Actuarial Control Cycle for the Reserving Process:
1. Define/Refine Process
2. Implement Process
3. Measure Performance

Reserving process elements:
- Data used
- Actuarial methods employed
- Operational input
- Judgments and intervention points
- Process flow and timeline
- Quality assurance process

Formal performance testing asks:
- Are the current methods appropriate? Would changes to methods improve estimation skill?
- Are the data and other inputs accurate and sufficient? Would improvements or expansion of data improve estimation skill?
- Are there opportunities to improve process flow?
- Are emerging estimation errors within tolerances?

CASE STUDY
Case Study: U.S.
Insurer Commercial Auto BI data
- 1972 to 1998 accident years, June 30th valuations
- Paid and incurred counts and amounts
- Estimates of claim liabilities from 1979 to 1998 (twenty years)
- December 31st valuation used as "actual ultimate"

Environmental influences during the period add difficulty to estimation:
- Economic and social inflation
- Operational changes in the claim department
- Changes in underwriting posture

CASE STUDY
Skill can be measured by comparing actual to predicted unpaid loss ratios

[Figure: Actual versus projected unpaid claim ratio, accident year @ 42 months; estimated unpaid claim ratio from the PCLD method versus actual unpaid claim ratio, June 30th valuations 1979-1998]

We could just use a paid B-F with a constant ELR and payment pattern every year; this would give us the dotted line as our estimate. We are looking for methods that do better than a crude, constant assumption.

CASE STUDY
Formal measurement of skill

The skill of a method m is measured by:

    Skill_m = 1 - mse_m / msa

where mse = mean squared error and msa = mean squared anomaly. Skill is the proportion of variance "explained" by the method.

[Figure: Actual versus predicted unpaid claim ratio anomaly, accident year @ 42 months, PCLD method, June 30th valuations 1979-1998]

CASE STUDY
Actuarial methods subjected to performance testing

Skill for accident year @ 42 months:

  Actuarial Projection Method                     Overall Skill   Latest Ten Accident Years
  Paid Chain-Ladder                               23%             13%
  Incurred Chain-Ladder                           52%             32%
  Case Reserve Development                        60%             22%
  Reported Count Chain-Ladder                     99%             99%
  Case Adequacy Adjusted Incurred Chain-Ladder    52%             52%

CASE STUDY
Skill can be
measured by maturity

[Figure: Commercial Auto BI liability, measured skill in estimating liabilities by maturity (0 to 66 months) for five methods: Reported Count CL Dev., Paid CL Dev., Incurred CL Dev., Case Reserve Dev., and Inc. CL with Case Adequacy Adjustment]

Note that skill can be negative (e.g., the paid loss projection method at 6 months), implying that the method induces volatility rather than explaining it.

CASE STUDY
The minimum-variance weighting of methods depends on their variances and their correlation

[Figure: Combined variance as a function of the weight placed on one method, for error correlations rho = 1, 0.5, 0, -0.5, and -1]

For a given correlation, the optimal weights are those with the minimum combined variance:
- The minimum starts at the far right when the correlation is 100%.
- The minimum gradually shifts leftward as the correlation decreases.

CASE STUDY
Key conclusions from case study
- Accuracy, or skill, can be formally measured using a cross-validation approach.
- Performance testing can guide the actuary in choosing methods and selecting weights between methods; correlation is a relevant criterion in selecting methods and weights.
- Chain-ladder methods are seriously degraded by changing claim policies and procedures; using claim counts to adjust improves skill.
- Experimentation with non-traditional methods is worthwhile; case reserve development has higher skill at later maturities.

CONCLUSION
Good reasons to do performance testing
1. Opportunity to improve accuracy of estimates
2. Formal rationale for selected actuarial methods
3. Empirical validation of stochastic reserve risk models
4. Input to development of reserve ranges
5. Manage actuarial overconfidence
6. Cost/benefit of enhancements to data and systems
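The skill statistic Skill_m = 1 - mse_m / msa from the case study is easy to compute directly. The sketch below is a minimal illustration with made-up unpaid claim ratios, not the case-study values; it also assumes the anomaly baseline is a constant (mean) estimate, in the spirit of the "crude, constant assumption" benchmark, which is an interpretation rather than the paper's exact definition.

```python
import numpy as np

def skill(actual, predicted):
    """Skill = 1 - mse/msa: the proportion of variance 'explained' by a method.

    mse is the mean squared prediction error of the method; msa (mean squared
    anomaly) is taken here as the squared deviation of the actuals from their
    own mean, i.e. the error of a crude, constant-assumption benchmark.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((actual - predicted) ** 2)
    msa = np.mean((actual - actual.mean()) ** 2)
    return float(1.0 - mse / msa)

# Hypothetical unpaid claim ratios by valuation year (illustrative only)
actual = np.array([0.25, 0.22, 0.18, 0.15, 0.12, 0.10])
good   = np.array([0.24, 0.21, 0.19, 0.14, 0.13, 0.10])  # tracks the actuals closely
poor   = np.array([0.10, 0.25, 0.12, 0.22, 0.05, 0.20])  # erratic estimates

print(f"good method skill: {skill(actual, good):.2f}")
print(f"poor method skill: {skill(actual, poor):.2f}")
```

A method that merely matches the constant benchmark scores near zero, and an erratic method scores negative, consistent with the observation above that negative skill means the method induces volatility rather than explaining it.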
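The minimum-variance weighting curves can be reproduced numerically from the standard two-estimator variance formula. The standard deviations below are illustrative assumptions (method 1 is taken to be the more accurate one), not values from the case study.

```python
import numpy as np

def combined_variance(w, s1, s2, rho):
    """Variance of w*m1 + (1-w)*m2, where the methods' estimation errors
    have standard deviations s1, s2 and correlation rho."""
    return w**2 * s1**2 + (1 - w)**2 * s2**2 + 2 * w * (1 - w) * rho * s1 * s2

def optimal_weight(s1, s2, rho):
    """Closed-form minimum-variance weight on method 1
    (the unconstrained optimum may fall outside [0, 1])."""
    return (s2**2 - rho * s1 * s2) / (s1**2 + s2**2 - 2 * rho * s1 * s2)

# Illustrative standard deviations: method 1 is the lower-variance method
s1, s2 = 0.5, 0.8
grid = np.linspace(0.0, 1.0, 101)
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    w_min = grid[np.argmin(combined_variance(grid, s1, s2, rho))]
    print(f"rho = {rho:+.1f}: combined variance minimized at w = {w_min:.2f}")
```

Running the sketch shows the behavior described on the slide: at rho = 1 the minimum sits at the far right (all weight on the lower-variance method), and it shifts leftward toward a more even blend as the correlation between the methods' errors decreases.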