Dirty Data on Both Sides of the Pond: Results of the GIRO Working Party on Data Quality
2008 CAS Ratemaking Seminar, Boston, MA

Data Quality Working Party Members
• Robert Campbell
• Louise Francis (chair)
• Virginia R. Prevosto
• Mark Rothwell
• Simon Sheaf

Agenda
• Literature Review
• Horror Stories
• Survey
• Experiment
• Actions
• Conclusions

Literature Review
Data quality is maintained and improved by good data management practices. While the vast majority of the literature is directed toward the IT industry, the paper highlights the following more actuary- or insurance-specific sources:
• Actuarial Standard of Practice No. 23: Data Quality
• Casualty Actuarial Society White Paper on Data Quality
• Insurance Data Management Association (IDMA)
• CAS Data Management Educational Materials Working Party

Actuarial Standard of Practice No. 23
Provides standards for:
• selecting data,
• relying on data supplied by others,
• reviewing and using data, and
• making disclosures about data quality.
http://www.actuarialstandardsboard.org/pdf/asops/asop023_097.pdf

Insurance Data Management Association
The IDMA is an American organization that promotes professionalism in the data management discipline through education, certification, and discussion forums. The IDMA web site:
• suggests publications on data quality,
• describes a data certification model, and
• contains Data Management Value Propositions, which document the value to various insurance industry stakeholders of investing in data quality.
http://www.idma.org

Cost of Poor Data
• Olson: 15% to 20% of operating profits.
• IDMA: poor data costs the U.S. economy $600 billion a year.
• The IDMA believes that the true cost is higher than these figures reflect, as they do not capture the "opportunity costs of wasteful use of corporate assets" (IDMA Value Proposition, General Information).

CAS Data Management Educational Materials Working Party
• Reviewed a shortlist of texts recommended by the IDMA for actuaries (9 in total).
• Is publishing a review of each text in the CAS Actuarial Review, starting with the current (August) issue.
• A paper published in the Winter 2007 CAS Forum combines and compares the reviews.
• "Actuarial IQ (Information Quality)" was published in the Winter 2008 CAS Forum.

Horror Stories – Non-Insurance
• Heart-and-lung transplant performed with organs of the wrong blood type.
• Surgery on the wrong side of the body: distressingly frequent, but preventable.
• Bombing of the Chinese Embassy in Belgrade, attributed to out-of-date target information.
• Mars Climate Orbiter lost through confusion between imperial and metric units.
• Fidelity mutual fund forced to withdraw an announced dividend after a calculation error.
• Porter County, Indiana: an erroneous tax bill led to a budget shortfall.

Horror Stories – Rating/Pricing
• Exposure recorded in units of $10,000 instead of $1,000.
• A large insurer reported personal auto data as miscellaneous, so it was missed from ratemaking calculations.
• One company reported all of its Florida property losses as fire, including in hurricane years.
• Mismatched coding between policy and claims data.

Horror Stories – Reserving
• NAIC concerns over non-US country data.
• The Canadian federal regulator uncovered:
  - inaccurate accident year allocation,
  - double-counted IBNR, and
  - claims notified but not properly recorded.

Horror Stories – Reserving
In June 2001, Independent Insurance went into liquidation, becoming the U.K.'s largest general insurance failure. A year earlier, its market valuation had reached £1 billion. Independent's collapse came after an attempt to raise £180 million in fresh cash by issuing new shares failed because of revelations that the company faced unquantifiable losses.
The insurer had received claims from its customers that had not been entered into its accounting system, which contributed to the difficulty in estimating the company's liabilities.

Horror Stories – Katrina
• US weather models underestimated the costs of Katrina by approximately 50% (Westfall, 2005).
• A 2004 RMS study highlighted exposure data that was out of date, incomplete, and miscoded.
• Many flood victims had no flood insurance after being told by agents that they were not in flood risk areas.

Data Quality Survey of Actuaries
Purpose: assess the impact of data quality issues on the work of general insurance actuaries.
Two questions:
• What percentage of your time is spent on data quality issues?
• What proportion of your projects is adversely affected by such issues?

Results of Survey
Percentage of time spent on data quality issues:

  Employer            Responses   Mean    Median   Minimum   Maximum
  Insurer/Reinsurer       40      25.0%   20.0%     2.0%     75.0%
  Consultancy             17      26.9%   25.0%     5.0%     75.0%
  Other                   17      29.6%   25.0%     1.0%     80.0%
  All                     74      26.5%   25.0%     1.0%     80.0%

Proportion of projects adversely affected by data quality issues:

  Employer            Responses   Mean    Median   Minimum   Maximum
  Insurer/Reinsurer       40      32.5%   20.0%     3.5%    100.0%
  Consultancy             17      37.6%   30.0%     5.0%    100.0%
  Other                   17      35.4%   25.0%     1.0%    100.0%
  All                     74      34.3%   25.0%     1.0%    100.0%

Survey Conclusions
• Data quality issues have a significant impact on the work of general insurance actuaries:
  - about a quarter of their time is spent on such issues, and
  - about a third of their projects are adversely affected.
• The impact varies widely between actuaries, even those working in similar organizations.
• There is limited evidence to suggest that the impact is more significant for consultants.

Hypothesis
The uncertainty of actuarial estimates of ultimate incurred losses based on poor-quality data is significantly greater than that of estimates based on better-quality data.

Data Quality Experiment
• Examine the impact of incomplete and/or erroneous data on the actuarial estimates of ultimate losses and loss reserves.
• Use real data with simulated limitations and/or errors, and observe the potential error in the actuarial estimates.

Data Used in Experiment
• Real data for primary private passenger bodily injury liability business for a single no-fault state.
• Eighteen (18) accident years of fully developed data; thus, the true ultimate losses are known.

Actuarial Methods Used
• Paid chain ladder models
• Bornhuetter-Ferguson
• Berquist-Sherman closing rate adjustment
• Incurred chain ladder models
• Inverse power curve for tail factors
No judgment was used in applying the methods. (The chain ladder and Bornhuetter-Ferguson calculations are sketched in code below.)

Completeness of Data Experiments
Vary the size of the sample; that is:
1) use all years,
2) use only 6 accident years, or
3) use only the last 3 diagonals.

Data Error Experiments
Simulated data quality issues:
1. misclassification of losses by accident year,
2. late processing of financial information,
3. overstatements followed by corrections in the following period,
4. a change in the definition of reported claims, and
5. early years unavailable.

Measure Impact of Data Quality
• Compare estimated ultimates to actual ultimates.
• Use bootstrapping to evaluate the effect of different random samples on the results (also sketched below).
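For readers who want to reproduce the mechanics, the following is a minimal sketch of the paid chain ladder and Bornhuetter-Ferguson calculations named above. The triangle, premiums, and expected loss ratio are illustrative assumptions, not the working party's data.

```python
import numpy as np

def chain_ladder_ultimates(triangle):
    """Paid chain ladder: estimate ultimates from a cumulative paid triangle.

    triangle: 2-D array, rows = accident years, columns = development
    periods; unobserved cells hold np.nan.
    """
    n = triangle.shape[0]
    # Volume-weighted age-to-age development factors.
    factors = []
    for j in range(triangle.shape[1] - 1):
        both = ~np.isnan(triangle[:, j]) & ~np.isnan(triangle[:, j + 1])
        factors.append(triangle[both, j + 1].sum() / triangle[both, j].sum())
    # Project each accident year's latest observed value to ultimate.
    ultimates = np.empty(n)
    for i in range(n):
        j = np.max(np.where(~np.isnan(triangle[i]))[0])  # latest observed period
        ultimates[i] = triangle[i, j] * np.prod(factors[j:])
    return np.array(factors), ultimates

def bornhuetter_ferguson(triangle, premium, expected_loss_ratio):
    """Bornhuetter-Ferguson: blend paid-to-date with an a priori estimate."""
    factors, _ = chain_ladder_ultimates(triangle)
    n = triangle.shape[0]
    ultimates = np.empty(n)
    for i in range(n):
        j = np.max(np.where(~np.isnan(triangle[i]))[0])
        pct_developed = 1.0 / np.prod(factors[j:])       # expected % paid to date
        a_priori = premium[i] * expected_loss_ratio      # a priori ultimate
        ultimates[i] = triangle[i, j] + a_priori * (1.0 - pct_developed)
    return ultimates

# Illustrative 4-year cumulative paid triangle (hypothetical values).
tri = np.array([[1000, 1800, 2100, 2200],
                [1100, 2000, 2300, np.nan],
                [1200, 2200, np.nan, np.nan],
                [1300, np.nan, np.nan, np.nan]])
premium = np.array([3000.0, 3200.0, 3400.0, 3600.0])
print(chain_ladder_ultimates(tri)[1])
print(bornhuetter_ferguson(tri, premium, expected_loss_ratio=0.70))
```

Note that the Bornhuetter-Ferguson method uses more information than the paid chain ladder (premium and an expected loss ratio), which is why the experiment groups it with the methods "requiring more information."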
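The bootstrapping step can be sketched in the same spirit. The resampling scheme below (drawing the observed age-to-age link ratios with replacement) is one simple variant chosen for illustration; it is not the working party's exact procedure, and the triangle is again hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def bootstrap_ultimates(triangle, n_sims=1000):
    """Naive bootstrap of a paid chain ladder: for each development period,
    resample the observed accident-year link ratios with replacement,
    then reproject the latest diagonal to ultimate."""
    n_rows, n_cols = triangle.shape
    # Observed link ratios for each development period.
    links = []
    for j in range(n_cols - 1):
        both = ~np.isnan(triangle[:, j]) & ~np.isnan(triangle[:, j + 1])
        links.append(triangle[both, j + 1] / triangle[both, j])
    latest = [np.max(np.where(~np.isnan(triangle[i]))[0]) for i in range(n_rows)]
    sims = np.empty((n_sims, n_rows))
    for s in range(n_sims):
        # One resampled development factor per period (mean of a resample).
        factors = np.array([rng.choice(l, size=len(l)).mean() for l in links])
        for i in range(n_rows):
            sims[s, i] = triangle[i, latest[i]] * np.prod(factors[latest[i]:])
    return sims

tri = np.array([[1000, 1800, 2100, 2200],
                [1100, 2000, 2300, np.nan],
                [1200, 2200, np.nan, np.nan],
                [1300, np.nan, np.nan, np.nan]])
total = bootstrap_ultimates(tri).sum(axis=1)
print("mean total ultimate:", total.mean())
print("std of total ultimate:", total.std())  # wider spread signals noisier data
```

The experiment's comparison works by running this kind of simulation on both the clean and the error-modified triangles and comparing the spread of the resulting reserve estimates.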
Estimated Ultimates Based on Paid Losses
[Chart: actual ultimate losses for accident years 13-18 compared with paid chain ladder and Bornhuetter-Ferguson estimates based on 3 years of data, years 1986-1991, and all years.]

Estimated Ultimates Based on Adjusted Paid Losses
[Chart.]

Estimated Ultimates Based on Incurred Losses
[Chart.]

Results of Adjusting the Experience Period
• The adjusted paid and the incurred methods produce reasonable estimates for all but the most immature points. However, these points contribute the most dollars to the reserve estimate.
• The paid chain ladder method, which is based on less information (no case reserves, claim data, or exposure information), produces worse estimates than the methods based on the incurred data or the adjusted paid data.
• Methods requiring more information, such as Bornhuetter-Ferguson and Berquist-Sherman, performed better.
• It is not clear from this analysis that datasets with more historical years of experience produce better estimates than datasets with fewer years of experience.

Experiment Part 2
Next, we introduced three changes to simulate errors and tested how they affected the estimates:
1. Losses from accident years 1983 and 1984 were misclassified as 1982 and 1983 respectively.
2. Approximately half of the financial movements from 1987 were processed late, in 1988.
3. Incremental paid losses for accident year 1988, development period 12-24, were overstated by a multiple of 10 and corrected in the following development period; in addition, an outstanding reserve for a claim in accident year 1985 at the end of development month 60 was overstated by a multiple of 100 and corrected in the following period.

Comparison of Actual and Estimated Ultimate Losses Based on Error-Modified Incurred Loss Data
[Chart.]

Standard Errors for Adjusted Paid Loss Data Modified to Incorporate Errors
[Chart.]

Results of Introducing Errors
• The error-modified data reflecting all the changes results in estimates with higher standard errors than those based on the clean data.
• For incurred ultimate losses, the clean data has the lowest bias and the lowest standard error.
• Error-modified data produced more bias in the estimates and more volatile estimates.

Distribution of Reserve Errors
[Chart: bootstrapped differences from true reserves (paid, all years) by percentile, for the clean data and for each error change individually and all changes combined.]

Results of Bootstrapping
• There is less dispersion in the results for error-free data.
• The standard deviation of the estimated ultimate losses is greater for the modified data (the data with errors).
• This confirms the original hypothesis: errors increase the uncertainty of estimates.

Conclusions Resulting from Experiment
• There is generally greater accuracy and less variability in actuarial estimates when:
  - quality data is used, and
  - a greater number of accident years is used.
• Data quality issues can erode or even reverse the gains of increased volumes of data: if errors are significant, more data may worsen estimates because certain projection methods propagate the errors.
• There is significant uncertainty in the results when data is incomplete or contains errors.

Actions – What can we do?
• Data quality advocacy
• Data screening, or exploratory data analysis (EDA)

Data Quality Advocacy
• Data quality measurement
• Data quality management issues

DQ – Measurement
• Redman advocates a sampling approach to measurement; the appropriate approach depends on how advanced data quality currently is at the company.
• Others advocate automated validation/accuracy techniques.

DQ – Management Issues
Manage the information chain (from Data Quality: The Field Guide by Redman):
• establish management responsibilities,
• describe the information chain,
• understand customer needs,
• establish a measurement system,
• establish control and check performance,
• identify improvement opportunities, and
• make improvements.

Data Screening
• Visual: histograms, box and whisker plots, stem and leaf plots.
• Statistical: descriptive statistics, multivariate screening.
• Software: Excel, R, SAS, etc.

Example Data
Texas Workers Compensation closed claims. Some of the variables are:
• incurred losses
• paid losses
• attorney involvement
• injury type
• cause type
• county

Box Plot
[Chart.]

Box and Whisker Plot
[Chart.]

Histogram/Frequency Polygon
[Chart.]

Categorical Data – Bar Plots
[Chart.]

Descriptive Statistics
The year of the Date Licensed variable has a minimum and a maximum that are impossible (see the screening sketch below):

  Variable        N        Valid N   Minimum   Maximum   Mean    Std. Deviation
  License Year    30,250   30,250    490       2,049     1,990   16.3
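As an illustration of this kind of screening, here is a minimal sketch in Python (pandas) that computes descriptive statistics like those above and flags impossible values. The file name, column names, and plausible-year bounds are assumptions for the example, not the working party's actual dataset layout.

```python
import pandas as pd
import matplotlib.pyplot as plt

# File and column names are hypothetical.
claims = pd.read_csv("tx_wc_closed_claims.csv")

# Descriptive statistics: impossible minima and maxima jump out immediately,
# as with the license-year range of 490 to 2,049 above.
print(claims["license_year"].describe())

# Flag records outside a plausible range of licensing years.
bad = claims[(claims["license_year"] < 1900) | (claims["license_year"] > 2008)]
print(f"{len(bad)} records with impossible license years")

# Visual screening: histograms expose spikes at default or miscoded values,
# and box plots by category expose outliers.
claims["license_year"].plot.hist(bins=50)
plt.show()
claims.boxplot(column="incurred_losses", by="attorney_involvement")
plt.show()
```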
Conclusions
• Anecdotal horror stories illustrate the potentially dramatic impact of data quality problems.
• The data quality survey suggests that data quality issues impose a significant cost on actuarial work.
• The data quality experiment shows that data quality issues have a significant effect on the accuracy of results.
• The Data Quality Working Party urges actuaries to become data quality advocates within their organizations.
• Techniques from exploratory data analysis can be used to detect data anomalies.

Dirty Data on Both Sides of the Pond: Results of the GIRO Working Party on Data Quality
Questions?