If They Cheat, Can We Catch Them With Predictive Modeling?
Richard A. Derrig, PhD, CFE
President, Opal Consulting, LLC
Senior Consultant, Insurance Fraud Bureau of Massachusetts
CAS Predictive Modeling, October 11-12, 2007

Insurance Fraud - The Problem
- ISO/IRC 2001 study: auto and workers' compensation fraud rated a big problem by 27% of insurers.
- CAIF: estimates (too large).
- Mass. IFB: 1,500 referrals annually for auto, WC, and other P-L lines.

Fraud Definition - PRINCIPLES
- Clear and willful act
- Proscribed by law
- Obtaining money or value
- Under false pretenses
Abuse and unethical behavior: fails one or more of the principles.

HOW MUCH CLAIM FRAUD? (CRIMINAL OR CIVIL?)
- Roughly 10% fraud.

REAL PROBLEM - CLAIM FRAUD
- Classify all claims.
- Identify valid classes: pay the claim, no hassle (the Visa example).
- Identify (possible) fraud: investigation needed.
- Identify "gray" classes: minimize with "learning" algorithms.

Company Automation - Data Mining
- Data mining/predictive modeling automates record reviews.
- No data mining without good clean data (90% of the solution).
- Insurance policy and claim data; business and demographic data.
- Data warehouse/data mart.
- Data manipulation: simple first; complex algorithms when needed.

DATA + Computers Advance FRAUD IDENTIFICATION
- Experience and judgment
- Artificial intelligence systems
- Regression and tree models
- Fuzzy clusters
- Neural networks
- Expert systems
- Genetic algorithms
- All of the above

MATHEMATICAL MODELS
- Databases: vector spaces, topological spaces, stochastic processes.
- Scoring: mappings to R, linear functionals.

DM: databases -> scoring functions -> graded output
- Non-suspicious claims -> routine claims handling.
- Suspicious claims -> complicated claims handling.

DM: databases -> scoring functions -> graded output
- Non-suspicious risks -> routine underwriting.
- Suspicious risks -> non-routine underwriting.

POTENTIAL VALUE OF A PREDICTIVE MODELING SCORING SYSTEM
- Screening to detect fraud early.
- Auditing of closed claims to measure fraud, both kinds.
- Sorting to select efficiently among Special Investigative Unit referrals.
- Providing evidence to support a denial.
- Protecting against bad faith.

PREDICTIVE MODELING - SOME PUBLIC TECHNIQUES
- Fuzzy logic and controllers
- Regression scoring systems
- Unsupervised techniques: Kohonen and PRIDIT
- EM algorithm (medical bills)
- Tree-based methods

FUZZY SETS COMPARED WITH PROBABILITY
- Probability: measures randomness; measures whether or not an event occurs; randomness dissipates over time or with further knowledge.
- Fuzziness: measures vagueness in language; measures the extent to which an event occurs; vagueness does not dissipate with time or further knowledge.

Fuzzy Clustering & Detection
- k-means clustering
- Fuzzy logic: true, false, uncertain
- Fuzzy numbers: membership value
- Fuzzy clusters: partial membership
- Application 1: suspicion of fraud
- Application 2: town (ZIP) rating classes
- REF: Derrig-Ostaszewski 1995

FUZZY SETS - TOWN RATING CLASSIFICATION
When is one town "near" another for auto insurance rating?
- Geographic proximity (traditional)
- Overall cost index (Massachusetts)
- Geo-smoothing (experimental)
Geographically close towns do not have the same expected losses. Clusters by cost produce border problems: nearby towns can land in different rating territories. Fuzzy clusters acknowledge the borders. Are the overall clusters correct for each individual coverage? Fuzzy clustering on five auto coverage indices is better and demonstrates a weakness of overall crisp clustering.
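The partial-membership idea above can be sketched in a few lines of code. Below is a minimal fuzzy c-means example in Python/numpy; the simulated data (towns described by five coverage cost indices), the number of clusters, and the 0.2 membership cut are illustrative stand-ins, not a reproduction of the Derrig-Ostaszewski analysis.

```python
import numpy as np

def fuzzy_cmeans(X, n_clusters=6, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns cluster centers and a membership
    matrix U whose rows sum to 1 across clusters (partial membership)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # random initial memberships
    for _ in range(n_iter):
        W = U ** m                                  # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))            # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Hypothetical data: towns described by five auto coverage cost indices
towns = np.random.default_rng(1).random((350, 5))
centers, U = fuzzy_cmeans(towns, n_clusters=6)

alpha = 0.2                                         # membership cut, as in the fraud-study figure
members = U >= alpha                                # a town can be a partial member of several clusters
border_towns = members.sum(axis=1) > 1
print("towns with partial membership in more than one cluster:", border_towns.sum())
```

Towns (or claims) whose largest membership value is well below 1 are exactly the border cases that a crisp clustering would force into a single territory.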
Fuzzy Clustering of Fraud Study Claims by Assessment Data
[Figure: claims plotted by feature-vector ID as full or partial cluster members, membership value cut at alpha = .2. Six final clusters with suspicion centers on the dimensions (A, C, Is, Ij, T, LN), including (0,0,0,0,0,0), (0,1,0,1,3,0), (1,4,0,4,6,0), (1,7,0,7,7,0), and (7,8,7,8,8,0); cluster labels range from Valid and Build-up (injury suspicion level below, or at or above, 5) to Opportunistic Fraud and Planned Fraud.]

AIB FRAUD INDICATORS
- Accident characteristics (19): e.g., no report by police officer at scene; no witnesses to accident.
- Claimant characteristics (11): e.g., retained an attorney very quickly; had a history of previous claims.
- Insured driver characteristics (8): e.g., had a history of previous claims; gave address as hotel or P.O. box.

Supervised Models - Regression
- Fraud indicators serve as independent dummy variables.
- Expert evaluation categories serve as the dependent target.
- Regression scoring systems.
- REF1: Weisberg-Derrig, 1998. REF2: Viaene et al., 2002.

Unsupervised Models - Kohonen Self-Organizing Features
- Fraud indicators serve as the independent "features."
- Expert evaluation categories can serve as the dependent target in a second phase.

Self-Organizing Feature Maps (T. Kohonen, 1982-1990, Cybernetics)
- Reference vectors map to the output format in a topologically faithful way. Example: map onto a 40x40 two-dimensional square.
- An iterative process adjusts all reference vectors in a "neighborhood" of the nearest one.
- The neighborhood size shrinks over the iterations.
[Diagram: mapping of patterns to map units]

[Figure: Kohonen feature map of suspicion levels, map units S1-S16, shaded in bands 0-1 through 4-5]

[Figure: feature map similarity of a claim, same map units and suspicion-level bands]

DATA MODELING EXAMPLE: CLUSTERING
- Data on 16,000 Medicaid providers analyzed by an unsupervised neural net.
- The neural network clustered Medicaid providers based on 100+ features.
- Investigators validated a small set of known fraudulent providers.
- A visualization tool displays the clustering, showing known fraud and abuse (cube size proportional to annual Medicaid revenues).
- A subset of 100 providers with similar patterns was investigated: hit rate > 70%.
(© 1999 Intelligent Technologies Corporation)

SELF-ORGANIZING MAP
- Binary features.
- Given a claim's feature vector c = {features}, the claim maps to its nearest reference vector m_c.
- d(m_c, 0) = suspicion level: "guilt by association."

PRIDIT - Unique Embedded Score
- Data: features have no natural metric scale.
- Model: the stochastic process has no parametric form.
- Classification: inverse image of a one-dimensional scoring function and decision rule.
- Feature value: identify which features are "important."

PRIDIT METHOD OVERVIEW
1. DATA: N claims, T features, K_t responses per feature, ordered monotone in "fraud."
2. RIDIT-score each possible response: proportion below minus proportion above, so scores are centered at zero.
3. RESPONSE WEIGHTS: first principal component of the claims x features matrix with the RIDITs in the cells (SPSS, SAS, or S-Plus software).
4. SCORE: sum of weights x claim RIDIT scores.
5. PARTITION: above and below zero.

PRIDIT METHOD RESULTS
1. DATA: N claims clustered in varying degrees of "fraud."
2. FEATURES: each feature has an importance weight.
3. CLUSTERS: cluster size can change with experience.
4. SCORE: can be checked on a new database.
5. DECISIONS: scores near zero = too little information.

TABLE 2
Weights for Treatment Variables

Variable   PRIDIT Weight W(∞)   Regression Weight
TRT1             .30                 .32***
TRT2             .19                 .19***
TRT3             .53                 .22***
TRT4             .38                 .07
TRT5             .02                 .08*
TRT6             .70                -.01
TRT7             .82                 .03
TRT8             .37                 .18***
TRT9            -.13                 .24**

Regression significance shown at the 1% (***), 5% (**), or 10% (*) level.
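To make the five-step PRIDIT recipe concrete, here is a minimal Python/numpy sketch in place of the SPSS/SAS/S-Plus runs mentioned above. The binary red-flag matrix, the number of features, and the sign convention used to orient the first principal component are illustrative assumptions, not the settings of the Brockett et al. (2002) study.

```python
import numpy as np

def ridit_scores(col):
    """RIDIT-score one feature: for each response category (ordered so that
    higher = more suspicious), proportion of claims below minus proportion
    above, so the scores are centered at zero (steps 1-2 above)."""
    cats, counts = np.unique(col, return_counts=True)
    p = counts / counts.sum()
    below = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    above = 1.0 - below - p
    lookup = dict(zip(cats, below - above))
    return np.array([lookup[v] for v in col])

def pridit(X):
    """PRIDIT sketch: RIDIT each feature, take the first principal component
    of the claims-by-features RIDIT matrix as the weights (step 3), and
    score each claim as the weighted sum of its RIDITs (step 4)."""
    R = np.column_stack([ridit_scores(X[:, t]) for t in range(X.shape[1])])
    _, _, Vt = np.linalg.svd(R - R.mean(axis=0), full_matrices=False)
    w = Vt[0]
    if w.sum() < 0:          # sign orientation is an assumption of this sketch
        w = -w
    return R @ w, w

# Hypothetical binary red-flag data (1 = indicator present), e.g. TRT1..TRT9
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 9))
scores, weights = pridit(X)
suspect = scores > 0         # step 5: partition above/below zero
print("indicator weights:", np.round(weights, 2))
print("claims in the suspicious partition:", suspect.sum())
```

Claims scoring above zero form the more suspicious partition, and the magnitude of each weight indicates how much information the corresponding indicator carries, which is what Tables 2 and 3 report for the treatment variables.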
TABLE 3
PRIDIT Transformed Indicators, Scores and Classes

Claim   TRT1    TRT2    TRT3    TRT4    TRT5    TRT6    TRT7    TRT8    TRT9    Score   Class
  1     0.44    0.12    0.08    0.2     0.31    0.09    0.24    0.11    0.04     .07      2
  2     0.44    0.12    0.08    0.2    -0.69    0.09    0.24    0.11    0.04     .07      2
  3     0.44   -0.88   -0.92    0.2     0.31   -0.91   -0.76    0.11    0.04    -.25      1
  4    -0.56    0.12    0.08    0.2     0.31    0.09    0.24    0.11    0.04     .04      2
  5    -0.56   -0.88    0.08    0.2     0.31    0.09    0.24    0.11    0.04     .02      2
  6     0.44    0.12    0.08    0.2     0.31    0.09    0.24    0.11    0.04     .07      2
  7    -0.56    0.12    0.08    0.2     0.31    0.09   -0.76   -0.89    0.04    -.10      1
  8    -0.44    0.12    0.08    0.2    -0.69    0.09    0.24    0.11    0.04     .02      2
  9    -0.56   -0.88    0.08   -0.8     0.31    0.09    0.24    0.11   -0.96     .05      2
 10    -0.56    0.12    0.08    0.2     0.31    0.09    0.24    0.11    0.04     .04      2

TABLE 7
AIB Fraud and Suspicion Score Data: Top 10 Fraud Indicators by Weight

Rank   PRIDIT   Adj. Reg. Score   Inv. Reg. Score
  1    ACC3     ACC1              ACC11
  2    ACC4     ACC9              CLT4
  3    ACC15    ACC10             CLT7
  4    CLT11    ACC19             CLT11
  5    INJ1     CLT11             INJ1
  6    INJ2     INS6              INJ3
  7    INJ5     INJ2              INJ8
  8    INJ6     INJ9              INJ11
  9    INS8     TRT1              TRT1
 10    TRT1     LW6               TRT9

EM Algorithm - Hidden Exposures Overview
- Model hidden risk exposures as additional dimension(s) of the loss severity distribution via the EM (Expectation-Maximization) algorithm.
- Treat mixtures of probability distributions as the model for losses affected by hidden exposures, with some parameters of the mixtures considered missing (i.e., unobservable in practice).
- The approach is feasible thanks to advances in computer-driven methodologies for partially hidden or incomplete data models.
- Empirical data imputation has become more sophisticated, and ever faster computing power has made it increasingly practical to solve these problems with iterative algorithms.

Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B compared with that submitted by provider A. Left panel: frequency histograms (provider A's histogram in filled bars). Right panel: density estimators (provider A's density in dashed line). Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala and Richard A. Derrig, p. 9, 11/18/02.

Figure 2: EM fit. Left panel: mixture of normal distributions fitted via the EM algorithm to the BI data. Right panel: the three normal components of the mixture. Source: same paper, p. 13, 11/18/02.
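To make the EM approach concrete, here is a minimal Python/numpy sketch that fits a two-component normal mixture to simulated bill amounts, treating the component labels as the missing data. The simulated amounts and the two-component choice are illustrative assumptions only; the Rempala-Derrig paper fits a three-component normal mixture to the actual BI data (Figure 2).

```python
import numpy as np

def em_normal_mixture(x, n_iter=200, seed=0):
    """Minimal EM for a two-component normal mixture: one component for
    routine bills, one for a hidden exposure (e.g., an inflating provider)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, 2)                       # initial component means
    sigma = np.array([x.std(), x.std()])        # initial standard deviations
    pi = np.array([0.5, 0.5])                   # initial mixing weights
    for _ in range(n_iter):
        # E-step: posterior probability that each bill belongs to each component
        dens = np.array([pi[k] / (sigma[k] * np.sqrt(2 * np.pi))
                         * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
                         for k in range(2)])
        resp = dens / (dens.sum(axis=0) + 1e-300)
        # M-step: update the mixing weights, means, and standard deviations
        nk = resp.sum(axis=1)
        pi = nk / len(x)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
    return pi, mu, sigma

# Hypothetical BI medical bill amounts: a routine component plus an inflated one
rng = np.random.default_rng(1)
bills = np.concatenate([rng.normal(800, 200, 300), rng.normal(2500, 400, 48)])
pi, mu, sigma = em_normal_mixture(bills)
print("mixing weights:", np.round(pi, 2), " component means:", np.round(mu, 0))
```

Bills with a high posterior probability of coming from the higher-mean component point to the kind of hidden exposure the figures above illustrate for provider A.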
Decision Trees
In decision theory (for example, risk management), a decision tree is a graph of decisions and their possible consequences (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed to help with making decisions. A decision tree is a special form of tree structure. (Source: www.wikipedia.org)

Different Kinds of Decision Trees
- Single trees (CART, CHAID).
- Ensemble trees, a more recent development (TREENET, RANDOM FOREST): a composite or weighted average of many trees (perhaps 100 or more).
- There are many methods to fit the trees and prevent overfitting.
- Boosting: Iminer Ensemble and TREENET.
- Bagging: Random Forest.

The Methods and Software Evaluated
1) TREENET
2) Iminer Tree
3) S-PLUS Tree
4) CART
5) Iminer Ensemble
6) Random Forest
7) Naive Bayes (baseline)
8) Logistic (baseline)

[Figure: ensemble prediction of IME requested - fitted probability of an IME (roughly 0.30 to 0.90) plotted against the Provider 2 bill amount]

[Figure: TREENET ROC curve for IME; AUROC = 0.701]

Ranking of Methods/Software - First Two Surrogates

Ranking of Methods by AUROC - Decision

Method                SIU AUROC   SIU Rank   IME Rank   IME AUROC
Random Forest           0.645        1          1         0.703
TREENET                 0.643        2          2         0.701
S-PLUS Tree             0.616        3          3         0.688
Iminer Naive Bayes      0.615        4          5         0.676
Logistic                0.612        5          4         0.677
CART Tree               0.607        6          6         0.669
Iminer Tree             0.565        7          8         0.629
Iminer Ensemble         0.539        8          7         0.649

(A small illustrative sketch of fitting such models and ranking them by AUROC appears after the detection plan at the end.)

Implementation outline included at the end.

NON-CRIMINAL FRAUD?
- General deterrence: the insurance system, the insurance department plus the DIA, medical, and other government oversight.
- Specific deterrence: company SIU, auditors, data, and predictive modeling for claims and underwriting.

FRAUD INDICATORS - VALIDATION PROCEDURES
- Canadian Coalition Against Insurance Fraud (1997): 305 fraud indicators (45 for vehicle theft).
- "No one indicator by itself is necessarily suspicious."
- Problem: how to validate the systematic use of fraud indicators?
- Solution 1: Kohonen Self-Organizing Feature Map (Brockett et al., 1998).
- Solution 2: Logistic Regression (Viaene et al., 2002).
- Solution 3: PRIDIT Method (Brockett et al., 2002).
- Solution 4: Regression Tree Methods (Derrig and Francis, 2007).

REFERENCES
Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert and Alpert, Mark (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.
Brockett, Patrick L., Xia, Xiaohua and Derrig, Richard A. (1998), Using Kohonen's Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65, 245-274.
Derrig, Richard A. (2002), Insurance Fraud, Journal of Risk and Insurance, 69:3, 271-289.
Derrig, Richard A. and Ostaszewski, K. (1995), Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification, Journal of Risk and Insurance, 62:3, 147-182.
Francis, Louise and Derrig, Richard A. (2007), Distinguishing the Forest from the TREES, Working Paper.
Rempala, G.A. and Derrig, Richard A. (2003), Modeling Hidden Exposures in Claim Severity via the EM Algorithm, North American Actuarial Journal, 9(2), 108-128.
Viaene, S., Derrig, Richard A., et al. (2002), A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Fraud Detection, Journal of Risk and Insurance, 69:3, 373-423.
Weisberg, H.I. and Derrig, R.A. (1998), Quantitative Methods for Detecting Fraudulent Automobile Bodily Injury Claims, RISQUES, 35, 75-101, July-September (in French; English available).

Claim Fraud Detection Plan
STEP 1: SAMPLE: Systematic benchmark of a random sample of claims.
STEP 2: FEATURES: Isolate red flags and other sorting characteristics.
STEP 3: FEATURE SELECTION: Separate features into objective and subjective; early, middle, and late arriving; acquisition cost levels; and other practical considerations.
STEP 4: CLUSTER: Apply unsupervised algorithms (Kohonen, PRIDIT, fuzzy clustering) to cluster claims; examine the clusters for the needed homogeneity.
STEP 5: ASSESSMENT: Externally classify claims according to the objectives for sorting.
STEP 6: MODEL: Build supervised models relating the selected features to the objectives (logistic regression, Naive Bayes, neural networks, CART, MARS).
STEP 7: STATIC TESTING: Compare model output with expert assessment, and model output with cluster homogeneity (PRIDIT scores), on one or more samples.
STEP 8: DYNAMIC TESTING: Run an acceptable model in real time, record outcomes, and repeat steps 1-7 as needed to fine-tune the model and its parameters. Use PRIDIT to show the gain or loss of feature power and changing data patterns, and tune investigative proportions to optimize detection and deterrence of fraud and abuse.
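To tie the earlier comparison of tree methods to the plan's supervised-modeling and testing steps, here is a minimal scikit-learn sketch that fits a few of the model types named above and ranks them by out-of-sample AUROC, as in the decision table earlier. The simulated red-flag matrix and the surrogate target (standing in for "IME requested" or an SIU referral) are hypothetical, and GradientBoostingClassifier is used only as an open-source stand-in for TREENET-style boosting.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical data: 20 binary red-flag indicators and a surrogate target
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 20)).astype(float)
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 5000) > 3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Random Forest (bagging)": RandomForestClassifier(n_estimators=200, random_state=0),
    "Boosted trees (TREENET-style)": GradientBoostingClassifier(random_state=0),
    "Single tree (CART-style)": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Naive Bayes (baseline)": GaussianNB(),
    "Logistic (baseline)": LogisticRegression(max_iter=1000),
}

# Fit each model and rank by out-of-sample AUROC, as in the static-testing step
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:<32s} AUROC = {auc:.3f}")
```

The same loop can be repeated on new samples in the dynamic-testing step to watch for gains or losses in discriminatory power as the data change.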