CANE 2007 Spring Meeting
Visualizing Predictive Modeling Results
Chuck Boucek
(312) 879-3859

Agenda
• Data Validation
• Hypothesis Building
• Model Building
• Model Testing
• Monitoring
• Visualization as a Diagnostic Tool

Data Validation
• Goals
  – Validate reasonableness of data
  – Understand key patterns in data
  – Understand changes in data and underlying business through time
• Histogram is a simple tool for reasonability testing of the modeling database
  [Figure: histogram; x-axis: Number of Times Renewed (0 to 10); y-axis: Density]
• Mosaic Plot shows the distribution of predictors in two dimensions
  [Figure: mosaic plot of NBR.TIME.RENEW by CLASS.GROUP]
• Missing Data plot shows the relationship of missing data elements
  [Figure: missing data plot; x-axis: Observations (reordered); rows: Internal1 to Internal8, Demographic1 to Demographic3, Credit1 to Credit3]
• Time series plots identify consistency of data over time
  [Figure: time series of Claims Match to Exposure (0.0 to 1.0) for Company 1, Company 2 and Company 3, 1993 to 2004]

Hypothesis Building
• Goals
  – Perform initial analysis of potential predictor variables
  – Limit the list of predictor variables to be employed in subsequent phases of model building
  – Further reasonability testing of data
  [Figure: summary panels for Demographic Variable 1: Premium ($MM), Exposure ($MM), Severity, Frequency, Loss Ratio, Pure Premium]
• Quantile-Quantile plots help identify needed transformations of data
  [Figure: QQNORM Plot of DEMOGRAPHIC 2 and QQNORM Plot of log(DEMOGRAPHIC 2); x-axis: Quantiles of Standard Normal]
• Correlation Web concisely summarizes a correlation matrix
  [Figure: correlation web of Credit1 to Credit10; legend values: 0.05, 0.37, 0.69]

Model Building
• Model building is an iterative process
• Understanding patterns and relationships throughout this process is critical
• Partial Plots are a key tool to visualize predictor variables throughout the model building process
• What is a “Partial Plot”?
  $\text{Linear Predictor} = k + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$
  $\text{Predicted value} = e^{k} \times e^{b_1 X_1} \times e^{b_2 X_2} \times e^{b_3 X_3} \times e^{b_4 X_4}$
• A Partial Plot demonstrates an individual predictor variable's contribution to the final prediction
  [Figure: two partial plot panels; x-axis: 0 to 30]
• Partial Plot with modified scatter plot of variable
  [Figure: partial plot with scatter overlay; x-axis: 0 to 30]
• Time Consistency plot is a critical tool for numeric predictors
  [Figure: time consistency plot with curves for 1997, 1998, 1999 and 2000]
• Partial Plot for a factor variable
  [Figure: partial plot for factor levels Credit Level 1 and Credit Level 2; y-axis: 0.70 to 1.30]
  [Figure: summary panels for Credit Variable 1 (levels Yes, No): Premium ($MM), Exposure (Pred. Count), Severity, Frequency, Loss Ratio, Pure Premium]

Model Testing
• Likely the most critical visualizations in predictive modeling work
  – Management’s perception of a project’s success will likely depend on these visualizations
• Holdout tests
• Cross validation tests
• Lift Chart shows overall model performance
  [Figure: Loss Ratio Lift Chart - Holdout Sample; series: Predicted, Actual]
• ROC Curve shows overall model performance
  [Figure: Holdout Sample ROC Curve; legend: Null, 0; Perfect, 1; prem, 0.51; pred.loss, 0.56]
• Classical Cross Validation exhibit
  [Figure: Prediction Error against Number of Predictors (5 to 25); curves: In Sample Error, Out of Sample Error; marker: Number of variables in final model]

Monitoring Model Results
• The work does not end when the lift chart looks good
• Monitoring tools
  – Decile management
  – Exception analysis
  – Model vs. Actual Results
• Decile Management
  – Retention
  – Loss Ratio
  – Rate Action
  – Tier/Schedule Mod
  [Figure: Retention by Decile; y-axis: 0.70 to 0.95]
• Average score over time
  [Figure: Average Score by Quarter; x-axis: 0401 to 0602; y-axis: 600 to 800]
• Loss ratio of model exceptions
  [Figure: Loss Ratio of Underwriting Exceptions; categories: Guideline, Exception; y-axis: 0.0 to 0.7]

Visualization as Diagnostic Tool
• Frequency and severity models have been developed
• Model is underperforming in predicting loss ratio
• Likely cause of underperformance is severity model
  [Figure: Severity Variable #1, two panels; x-axis: 0 to 10^6]
  [Figure: Severity Variable #1 partial plot on a linear x-axis (0 to 10^6); y-axis: 0.8 to 1.4]
  [Figure: Severity Variable #1 partial plot on a logarithmic x-axis (10^2 to 10^6); y-axis: 0.8 to 1.0]
• Two different visualizations of the same model tell a very different story!
  [Figure: Severity Variable #1 partial plot shown side by side on a linear x-axis (0 to 10^6) and a logarithmic x-axis (10^2 to 10^6)]
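
The contrast between the linear-axis and log-axis views of Severity Variable #1 can be illustrated with a small sketch. It assumes the multiplicative model form from the Partial Plot definition (predicted value as e^k times one factor e^(b·X) per predictor). The constants K, COEFS and XS and all function names are hypothetical, chosen only for illustration; they are not taken from the presentation's model. The sketch shows that a partial factor is a single term of the product, and that an evenly spaced grid on 100 to 10^6 places almost no points below 50,000, whereas a log-spaced grid spreads points evenly across each power of ten.

```python
import math

# Hypothetical intercept, coefficients and predictor values (illustration only).
K = 6.0
COEFS = [0.02, -0.10, 0.30, -4e-7]
XS = [10.0, 2.0, 1.0, 250000.0]


def partial_factor(b, x):
    """Multiplicative contribution e^(b*x) of one predictor to the prediction."""
    return math.exp(b * x)


def predict(k, coefs, xs):
    """Predicted value e^(k + sum of b_i * x_i) for a log-link model."""
    return math.exp(k + sum(b * x for b, x in zip(coefs, xs)))


def linear_grid(lo, hi, n):
    """n points evenly spaced between lo and hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def log_grid(lo, hi, n):
    """n points evenly spaced in log10 between lo and hi (inclusive)."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (n - 1)
    return [10 ** (log_lo + i * step) for i in range(n)]


def count_below(grid, cutoff):
    """Number of grid points strictly below the cutoff."""
    return sum(1 for x in grid if x < cutoff)


if __name__ == "__main__":
    # The product of e^K and the partial factors reproduces the prediction.
    factors = [partial_factor(b, x) for b, x in zip(COEFS, XS)]
    print(math.exp(K) * math.prod(factors), predict(K, COEFS, XS))

    # Compare how each grid covers the low end of the predictor's range.
    lin = linear_grid(100.0, 1e6, 11)
    lg = log_grid(100.0, 1e6, 9)
    print("linear grid points below 50,000:", count_below(lin, 5e4))
    print("log grid points below 50,000:", count_below(lg, 5e4))

    # Partial factors for the fourth predictor along the log-spaced grid.
    for x in lg:
        print(f"{x:12.1f}  {partial_factor(COEFS[3], x):.4f}")
```

Under these assumptions, the same fitted curve is simply sampled and displayed differently; which view is appropriate depends on where the exposure for the variable is concentrated.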