Dirty Data on Both Sides of the
Pond: Results of the GIRO
Working Party on Data Quality
2008 CAS Ratemaking Seminar
Boston, MA
Data Quality Working Party
Members
 Robert Campbell
 Louise Francis (chair)
 Virginia R. Prevosto
 Mark Rothwell
 Simon Sheaf
Agenda
 Literature Review
 Horror Stories
 Survey
 Experiment
 Actions
 Conclusions
Literature Review
Data quality is maintained and improved by good
data management practices. While the vast majority
of the literature is directed towards the I.T. industry,
the paper highlights the following more actuary- or
insurance-specific information:
Actuarial Standard of Practice #23: Data Quality
Casualty Actuarial Society White Paper on Data Quality
Insurance Data Management Association (IDMA)
Data Management Educational Materials Working Party
Actuarial Standard of Practice #23
Provides descriptive standards for:
• selecting data,
• relying on data supplied by others,
• reviewing and using data, and
• making disclosures about data quality
 http://www.actuarialstandardsboard.org/pdf/asops/asop023_097.pdf
Insurance Data Management
Association
The IDMA is an American organization which
promotes professionalism in the Data Management
discipline through education, certification and
discussion forums
The IDMA web site:
• Suggests publications on data quality,
• Describes a data certification model, and
• Contains Data Management Value Propositions which document the value to various insurance industry stakeholders of investing in data quality
 http://www.idma.org
Cost of Poor Data
 Olson – 15%–20% of operating profits
 IDMA – costs the U.S. economy $600 billion a year
 The IDMA believes that the true cost is higher than these figures reflect, as they do not depict “opportunity costs of wasteful use of corporate assets.” (IDMA Value Proposition – General Information)
CAS Data Management Educational
Materials Working Party
 Reviewed a shortlist of texts recommended by
the IDMA for actuaries (9 in total)
 Publishing a review of each text in the CAS
Actuarial Review (starting with the current
(August) issue)
 Paper published in the Winter 2007 CAS
Forum combines and compares reviews
 “Actuarial IQ (Information Quality)” published
in the Winter 2008 CAS Forum
Agenda
 Literature Review
 Horror Stories
 Survey
 Experiment
 Actions
 Conclusions
Horror Stories – Non-Insurance
 Heart-and-lung transplant – wrong blood type
 Surgery on the wrong side – frequent, but preventable
 Bombing of the Chinese Embassy in Belgrade
 Mars Climate Orbiter – confusion between imperial and metric units
 Fidelity Mutual Fund – withdrawal of dividend
 Porter County, Indiana – tax bill and budget shortfall
Horror Stories – Rating/Pricing
 Exposure recorded in units of $10,000
instead of $1,000
 Large insurer reporting personal auto data as
miscellaneous and hence missed from
ratemaking calculations
 One company reporting all its Florida property
losses as fire (including hurricane years)
 Mismatched coding for policy and claims data
Horror Stories - Reserving
 NAIC concerns over non-US country data
 Canadian federal regulator uncovered:
 Inaccurate accident year allocation
 Double-counted IBNR
 Claims notified but not properly recorded
Horror Stories - Reserving
In June 2001, Independent Insurance went into liquidation, becoming the U.K.’s largest general insurance failure.
 A year earlier, its market valuation had reached £1B.
 Independent’s collapse came after an attempt to raise £180M in fresh cash by issuing new shares failed because of revelations that the company faced unquantifiable losses.
 The insurer had received claims from its customers that had not been entered into its accounting system, which contributed to the difficulty in estimating the company’s liabilities.
Horror Stories - Katrina
 US Weather models underestimated costs for
Katrina by approx. 50% (Westfall, 2005)
 2004 RMS study highlighted exposure data that was:
   Out-of-date
   Incomplete
   Mis-coded
 Many flood victims had no flood insurance after being
told by agents that they were not in flood risk areas.
Agenda
 Literature Review
 Horror Stories
 Survey
 Experiment
 Actions
 Conclusions
Data Quality Survey of Actuaries
 Purpose: Assess the impact of data quality
issues on the work of general insurance
actuaries
 Two questions:
   percentage of time spent on data quality issues
   proportion of projects adversely affected by such issues
Results of Survey
Percentage of time spent on data quality issues:

Employer            Responses   Mean    Median   Minimum   Maximum
Insurer/Reinsurer       40      25.0%   20.0%     2.0%     75.0%
Consultancy             17      26.9%   25.0%     5.0%     75.0%
Other                   17      29.6%   25.0%     1.0%     80.0%
All                     74      26.5%   25.0%     1.0%     80.0%

Proportion of projects adversely affected by data quality issues:

Employer            Responses   Mean    Median   Minimum   Maximum
Insurer/Reinsurer       40      32.5%   20.0%     3.5%    100.0%
Consultancy             17      37.6%   30.0%     5.0%    100.0%
Other                   17      35.4%   25.0%     1.0%    100.0%
All                     74      34.3%   25.0%     1.0%    100.0%
Survey Conclusions
 Data quality issues have a significant impact
on the work of general insurance actuaries
   about a quarter of their time is spent on such issues
   about a third of projects are adversely affected
 The impact varies widely between different
actuaries, even those working in similar
organizations
 Limited evidence to suggest that the impact is
more significant for consultants
Agenda
 Literature Review
 Horror Stories
 Survey
 Experiment
 Actions
 Conclusions
Hypothesis
The uncertainty of actuarial estimates of ultimate incurred losses based on poor-quality data is significantly greater than that of estimates based on better-quality data.
Data Quality Experiment
 Examine the impact of incomplete and/or
erroneous data on the actuarial estimate of
ultimate losses and the loss reserves
 Use real data with simulated limitations
and/or errors and observe the potential error
in the actuarial estimates
Data Used in Experiment
 Real data for primary private passenger bodily injury liability business for a single no-fault state
 Eighteen (18) accident years of fully developed data; thus, true ultimate losses are known
Actuarial Methods Used
 Paid chain ladder models (see the sketch below)
   Bornhuetter-Ferguson
   Berquist-Sherman closing rate adjustment
 Incurred chain ladder models
 Inverse power curve for tail factors
 No judgment used in applying methods
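Below is a minimal sketch, in R (one of the tools named on the Data Screening slide), of the basic all-year volume-weighted chain ladder. The function name and triangle layout (a matrix with accident years in rows, development ages in columns, and NA below the latest diagonal) are illustrative assumptions, not the working party's code; tail factors and the Bornhuetter-Ferguson and Berquist-Sherman variants are omitted.

# Sketch only: all-year volume-weighted chain ladder on a cumulative
# loss triangle 'tri' (matrix; rows = accident years, columns =
# development ages, NA below the latest diagonal).
chain_ladder_ultimates <- function(tri) {
  n <- ncol(tri)
  # Volume-weighted age-to-age factors using all available years
  f <- sapply(seq_len(n - 1), function(j) {
    both <- !is.na(tri[, j]) & !is.na(tri[, j + 1])
    sum(tri[both, j + 1]) / sum(tri[both, j])
  })
  # Roll each accident year forward from its latest observed age
  sapply(seq_len(nrow(tri)), function(i) {
    latest <- max(which(!is.na(tri[i, ])))
    if (latest < n) tri[i, latest] * prod(f[latest:(n - 1)]) else tri[i, n]
  })
}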
Completeness of Data Experiments
Vary size of the sample; that is,
1) All years
2) Use only 6 accident years
3) Use only the last 3 diagonals (see the sketch below)
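As a sketch of sample 3, the hypothetical helper below blanks every cell older than the last k calendar-year diagonals of the triangle (same assumed layout as the earlier sketch); the working party's actual sample preparation may differ.

# Sketch: restrict a cumulative triangle to its last k diagonals.
last_k_diagonals <- function(tri, k = 3) {
  # Latest observed development age for each accident year
  latest <- apply(tri, 1, function(r) max(which(!is.na(r))))
  for (i in seq_len(nrow(tri)))
    for (j in seq_len(ncol(tri)))
      if (!is.na(tri[i, j]) && j <= latest[i] - k) tri[i, j] <- NA  # too old
  tri
}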
Data Error Experiments
Simulated data quality issues:
1. Misclassification of losses by accident year
2. Late processing of financial information
3. Overstatements followed by corrections in the following period (see the sketch below)
4. Definition of reported claims changed
5. Early years unavailable
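Issue 3, for example, can be simulated on a cumulative triangle roughly as below (hypothetical helper; the working party's implementation may differ). Because the triangle is cumulative, leaving the later cells at their true amounts is exactly the correction in the following period.

# Sketch: overstate one incremental amount by 'factor' and let the
# next development period correct it ('ay' = accident-year row,
# 'age' = development column of the overstated increment, age >= 2).
inject_overstatement <- function(tri, ay, age, factor = 10) {
  inc <- tri[ay, age] - tri[ay, age - 1]            # true increment
  tri[ay, age] <- tri[ay, age - 1] + factor * inc   # overstated period
  tri                                               # later cells unchanged
}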
Measure Impact of Data Quality
• Compare estimated to actual ultimates
• Use bootstrapping to evaluate the effect of different random samples on results (see the sketch below)
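One simple bootstrap variant is sketched below; the working party's paper describes the procedure actually used. It resamples the individual age-to-age factors within each development column, recomputes the ultimates, and collects the implied total reserve. The triangle layout is the same assumption as in the earlier sketches.

# Bootstrap sketch (illustrative variant, not the paper's exact method)
bootstrap_reserves <- function(tri, n_boot = 1000) {
  n <- ncol(tri)
  latest <- apply(tri, 1, function(r) max(which(!is.na(r))))
  diag_val <- tri[cbind(seq_len(nrow(tri)), latest)]  # latest diagonal
  replicate(n_boot, {
    f <- sapply(seq_len(n - 1), function(j) {
      both <- !is.na(tri[, j]) & !is.na(tri[, j + 1])
      ratios <- tri[both, j + 1] / tri[both, j]
      # index-based resampling avoids R's scalar sample() surprise
      mean(ratios[sample.int(length(ratios), replace = TRUE)])
    })
    ult <- sapply(seq_along(latest), function(i) {
      if (latest[i] < n) diag_val[i] * prod(f[latest[i]:(n - 1)])
      else diag_val[i]
    })
    sum(ult) - sum(diag_val)  # one bootstrap estimate of the reserve
  })
}

Running this on the clean triangle and on each error-modified triangle, then comparing percentiles of the results, gives the kind of comparison shown on the Distribution of Reserve Errors slide.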
Estimated Ultimates based on
Paid Losses
[Chart: actual vs. estimated ultimate losses by accident year, comparing all-year, 3-year, and 86–91 estimates with and without Bornhuetter-Ferguson (BF)]
Est. Ults. based on Adjusted Paid
[Chart]
Est. Ults. based on Incurred Losses
[Chart]
Results of Adjusting Experience Period
 The adjusted paid and the incurred methods produce reasonable estimates for all but the most immature points. However, these points contribute the most dollars to the reserve estimate.
 The paid chain ladder method, which is based on less information (no case reserves, claim data or exposure information), produces worse estimates than the methods based on the incurred data or the adjusted paid data.
 Methods requiring more information, such as Bornhuetter-Ferguson and Berquist-Sherman, performed better.
 It is not clear from this analysis that datasets with more historical years of experience produce better estimates than datasets with fewer years of experience.
Experiment Part 2
Next, we introduced three changes to simulate errors
and test how they affected estimates:
1. Losses from accident years 1983 and 1984 have
been misclassified as 1982 and 1983 respectively.
2. Approximately half of the financial movements from
1987 were processed late in 1988.
3. Incremental paid losses for accident year 1988
development period 12-24 overstated by a multiple
of 10. This was corrected in the following
development period. An outstanding reserve for a
claim in accident year 1985 at the end of
development month 60 was overstated by a multiple
of 100 and was corrected in the following period.
Comparison of Actual and Estimated
Ultimate Losses
Based on Error-Modified Incurred Loss Data
[Chart]
Standard Errors for Adjusted Paid Loss
Data
Modified to Incorporate Errors
[Chart]
Results of Introducing Errors
 The error-modified data reflecting all changes
results in estimates having higher standard
errors than those based on the clean data
 For incurred ultimate losses, the clean data
has the lowest bias and lowest standard error
 Error-modified data produced:
   More bias in estimates
   More volatile estimates
Distribution of Reserve Errors
[Chart: bootstrapped difference from true reserves (paid, all years) by percentile, comparing clean data with Change 1, Change 2, Change 3, and all changes combined]
Results of Bootstrapping
 Less dispersion in results for error free data
 Standard deviation of estimated ultimate
losses greater for the modified data (data with
errors)
 Confirms original hypothesis:
Errors increase the uncertainty of estimates
Conclusions Resulting from
Experiment
 Generally greater accuracy and less variability in actuarial estimates when:
   Quality data used
   Greater number of accident years used
 Data quality issues can erode or even reverse the gains of increased volumes of data:
   If errors are significant, more data may worsen estimates due to the propagation of errors for certain projection methods
 Significant uncertainty in results when:
   Data is incomplete
   Data has errors
Agenda
 Literature Review
 Horror Stories
 Survey
 Experiment
 Actions
 Conclusions
Actions – What can we do?
 Data Quality Advocacy
 Data Screening or Exploratory Data Analysis
(EDA)
Data Quality Advocacy
 Data quality – measurement
 Data quality – management issues
DQ - Measurement
 Redman advocates a sampling approach to measurement; the right approach depends on how advanced data quality at the company currently is
 Others advocate automated validation/accuracy techniques
DQ - Management Issues
 Manage the Information Chain (from Data Quality: The Field Guide by Redman)
   establish management responsibilities
   describe the information chain
   understand customer needs
   establish measurement system
   establish control and check performance
   identify improvement opportunities
   make improvements
Data Screening
 Visual
   Histograms
   Box and whisker plots
   Stem and leaf plots
 Statistical
   Descriptive statistics
   Multivariate screening
 Software – Excel, R, SAS, etc. (see the sketch below)
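A base-R screening sketch follows; the data frame claims and its column names (paid, injury_type, license_year) are assumed for illustration and are not the Texas dataset's actual field names.

# Quick visual and statistical screening in base R (names assumed)
summary(claims$paid)                         # descriptive statistics
hist(log10(claims$paid + 1),                 # histogram on a log scale
     main = "Paid losses", xlab = "log10(paid + 1)")
boxplot(paid ~ injury_type, data = claims)   # box and whisker by group
stem(claims$license_year)                    # stem and leaf plot
barplot(table(claims$injury_type))           # bar plot of categorical codes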
Example Data
 Texas Workers Compensation Closed Claims
 Some variables are:
 Incurred Losses
 Paid Losses
 Attorney Involvement
 Injury Type
 Cause Type
 County
Box Plot
[Chart]
Box and Whisker Plot
[Chart]
Histogram/Frequency Polygon
[Chart]
Categorical Data – Bar Plots
[Chart]
Descriptive Statistics
 The year of the date licensed has a minimum and maximum that are impossible:

                N        Valid N   Minimum   Maximum   Mean    Std. Deviation
License Year    30,250   30,250    490       2,049     1,990   16.3
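An automated range check of the kind that would flag this is sketched below; the column name and the plausible bounds are assumptions for illustration.

# Flag license years outside a plausible range (bounds assumed)
bad <- !is.na(claims$license_year) &
  (claims$license_year < 1900 | claims$license_year > 2008)
cat(sum(bad), "records with impossible license years\n")
summary(claims$license_year[!bad])           # recheck after screening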
Agenda
 Literature Review
 Horror Stories
 Survey
 Experiment
 Actions
 Conclusions
Conclusions
 Anecdotal horror stories illustrate the possible dramatic impact of data quality problems
 Data quality survey suggests data quality issues impose a significant cost on actuarial work
 Data quality experiment shows data quality issues have a significant effect on the accuracy of results
 Data Quality Working Party urges actuaries to become data quality advocates within their organizations
 Techniques from Exploratory Data Analysis can be used to detect data anomalies
Dirty Data on Both Sides of the
Pond: Results of the GIRO
Working Party on Data Quality
Questions?