Health Insurance Conference 2012
Predictive Modelling
“GLMs and beyond GLMs”
Singapore – May 2012
Xavier Conort

AGENDA
1. GLMs: The Good, the Bad and the Ugly
2. Trees, or how to detect interactions
3. GLM(ixed)M, or how to handle variables with a large number of categories
4. Regularized GLMs, or how to handle texts or data with a large number of predictors
5. The PRIDIT method, or how to cope with little or no information on the response

Gear Analytics

GLMs are a standard, but be aware of their limitations

GOOD
• Recognized as a standard in the insurance industry
• Accommodate responses with skewed distributions
• Lots of literature and readily available software solutions
• Simple mathematical formula, easy to implement
• Diagnostic tools and confidence intervals
• Parametric models (good when you know your data well)

BAD
• Need to pre-process data (missing values, outliers, dimension reduction)
• The assumptions underlying GLMs may not hold:
  – Independence of observations
  – Appropriateness of the link function
  – Appropriateness of the error function
• Risk of relying too much on diagnostic tools. Need to test on unseen cases.
• The iterative modelling process is time-consuming and complex

UGLY
• GLMs do not automatically account for interactions if you don’t specify them in the model structure. Think about latitude and longitude: is it correct to assume that the North-East effect = North effect × East effect?
• GLMs will provide you with estimates even if the standard errors are unreasonably high
• GLMs (and other supervised learning techniques) work only if you have reliable information on the response. This information is not always available. Think fraud detection!

How smart actuaries detect potential interactions
• luck
• intuition
• descriptive analysis
• experience
• market practices
• Machine Learning techniques based on decision trees

Regression trees are known to detect interactions
… but usually have lower predictive power than GLMs and are unstable.
By construction, regression trees partition the feature space into a set of rectangles and thereby produce a multitude of local interactions.

Random Forest will provide you with higher predictive power …
… but less interpretability …
A Random Forest is:
• a collection of individually weak decision trees, grown independently of one another, such that each tree is trained on a bootstrapped dataset with a random selection of predictors (think about the wisdom of crowds)

Boosted Regression Trees, or how to learn slowly, step by step
• BRTs (also called Gradient Boosting Machines) combine boosting and decision tree techniques:
  – The boosting algorithm gradually increases the emphasis on poorly modelled observations. It minimizes a loss function (the deviance, as in GLMs) by adding, at each step, a new simple tree fitted only to the residuals
  – The contribution of each tree is shrunk by a small learning rate (well below 1) to give more stable fitted values for the final model
  – To further improve predictive performance, the process fits each new tree on a random subset of the data (in the spirit of bagging)

Why do I love BRTs?
• BRTs can be fitted to a variety of response types (Gaussian, Poisson, Binomial)
• The best BRT fit (interactions included) is detected automatically by the machine
• BRTs learn non-linear functions without the need to specify them
• BRT outputs have some GLM flavour and provide insight into the relationship between the response and the predictors
• BRTs require little data cleaning because of their
  – ability to accommodate missing values
  – immunity to monotone transformations of predictors, extreme outliers and irrelevant predictors

BRTs’ partial dependence plots
• Partial dependence plots represent the effect of each predictor after accounting for the effects of the other predictors
• Non-linear relationships are detected automatically

Plot of interactions fitted by BRTs

BRTs’ prediction formula
Let’s consider 1 numerical predictor Xnum and 1 categorical predictor Xcat (with two levels)
• GLMs’ prediction formula will be, with g the link function:

$$\hat{Y} = g^{-1}\big(\beta_0 + \beta_{num} X_{num} + \beta_{cat}\, I(X_{cat} = 1)\big)$$

• BRTs’ prediction formula is more complex and less easy to implement:

$$\hat{Y} = g^{-1}\big(\beta_0 + \beta_{num,1}\, I(X_{num} < \alpha_1) + \beta_{num,2}\, I(X_{num} < \alpha_2) + \dots + \beta_{cat}\, I(X_{cat} = 1) + \beta_{int,1}\, I(X_{num} < \gamma_1 \text{ and } X_{cat} = 0) + \beta_{int,2}\, I(\gamma_2 < X_{num} < \gamma_3 \text{ and } X_{cat} = 1) + \dots\big)$$

How smart actuaries handle factors with a large number of categories
• In GLMs, predictors with many levels (e.g. territory, car models) and little statistical material aren’t credibility adjusted.
• GLM diagnostics will only alert you: relativities of levels with little exposure sit squarely in the middle of wide confidence intervals driven by large standard errors.
• In practice, actuaries apply ad hoc credibility adjustments before deploying the model.
• Generalized Linear Mixed Models (GLMMs) can accomplish this credibility adjustment by modelling both fixed and random effects, and provide credibility estimates automatically.
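
As a rough illustration of the credibility adjustment described above, the sketch below shrinks each level's observed mean toward the overall mean with a Bühlmann-style weight Z = n / (n + k). This is the closed form for a Gaussian random-intercept model with known variances, not a GLMM fit. The function name `credibility_estimates`, the constant `k` (ratio of within-level to between-level variance), the use of the overall mean as a stand-in for the fixed effect, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def credibility_estimates(levels, values, k):
    """Shrink each level's mean toward the overall mean.

    k is an ASSUMED ratio of within-level to between-level variance;
    a fitted GLMM would estimate the variance components from the data.
    """
    overall_mean = sum(values) / len(values)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, value in zip(levels, values):
        sums[level] += value
        counts[level] += 1

    estimates = {}
    for level in sums:
        n = counts[level]
        z = n / (n + k)  # credibility weight: approaches 1 as exposure grows
        level_mean = sums[level] / n
        estimates[level] = z * level_mean + (1 - z) * overall_mean
    return estimates


# Hypothetical toy data: territory "A" has nine observations, territory "B" only one.
levels = ["A"] * 9 + ["B"]
values = [100.0] * 9 + [200.0]
estimates = credibility_estimates(levels, values, k=4.0)
```

With these toy numbers the sparsely observed territory "B" is pulled most of the way back toward the overall mean, while territory "A" stays close to its own experience, which is the behaviour the ad hoc adjustments try to reproduce by hand.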

How smart actuaries handle data with a large number of variables
• GLMs are sensitive to multicollinearity and provide you with estimates for every single predictor, which leads to over-fitting and results that are not robust
• By fitting Regularized GLMs, you can automatically select the most relevant predictors and accommodate multicollinearity by introducing a penalty in the loss function (the deviance) to be minimized. Here, for a Gaussian error: [equation not preserved in the source]

The penalty effect in a regularized GLM

How to make use of texts
• Usually, punctuation and numbers are removed and words are stemmed, but this varies with the application
• Rare and very frequent words are discarded
• A document-term matrix is produced:
  – Incidence or frequency matrix
• The matrix is sometimes scaled to put more emphasis on rare but predictive words
• Regularized GLMs are applied to the matrix (with sometimes 5000 columns!)
• Alternative: Support Vector Machine

How smart insurance companies handle fraud
• GLMs are sometimes presented as a potential technique for fraud detection but, in practice, they fail because:
  – the history of fraud cases is insufficient and incomplete
  – they do not detect previously undetected fraud cases
• In practice, companies use a series of red flags (based on categorical and numerical variables) but fail to have a single indicator
• Numerous actuarial articles in recent years have presented an unsupervised technique (no labels needed for training) called PRIDIT as an efficient actuarial way to make use of those operational red flags

PRIDIT technique: basic ideas
• Transform all numerical and ordinal red flags to the same scale (values between -1 and 1) using RIDIT statistics (based on the cumulative distribution)

Level   Cumulative distribution   Ridit score
1       0.2                       -0.8
2       0.4                       -0.4
3       0.6                        0
4       0.8                        0.4
5       1                          0.8

• Apply Principal Component Analysis (PCA) to the RIDIT scores to produce a single indicator

But what is PCA’s basic idea?
Maximize the variance of the projected data on a few axes

Example (1/4)
• Suppose we want to combine all the information related to fraud
• We compute the ridit scores of the variables to put them all on the same scale (including numeric variables)
From Fraud Classification Using PCA of Ridits – The Journal of Risk and Insurance, 2002, Vol. 69, No. 3, 341-371

Example (2/4)
• We apply PCA to replace many variables with a score
  – We look for the factor that explains the most variance (captures most of the correlation) for the set of variables
  – The extracted factor will be a weighted average of the variables (a score)
• That score can be used to sort claims
  – More effort can be spent on claims more likely to be fraudulent or abusive

Example (3/4)
One can decide to investigate claim 3 first, claim 7 next, and pay the rest of the 10 claims (or, if resources allow, investigate in increasing PRIDIT score order until resources are exhausted).

Example (4/4)
Factor loadings are also sometimes used to explain the importance of variables

Component Matrix (a)
                 Component 1
SIU              .248
Police Report    .220
At Fault         .709
Legal Rep        .752
Medical Audit    .341
Prior Claim      .406
Extraction Method: Principal Component Analysis.
a. 1 component extracted.

Does it work?
• Based on the papers by US actuaries: yes!
• There appears to be a strong relationship between the PRIDIT score and the suspicion that a claim is fraudulent or abusive
• The Society of Actuaries even funded a study which extends the use of the PRIDIT technique to the measurement of hospital quality
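
The two PRIDIT steps described in the slides (RIDIT scoring, then a first principal component used as a single indicator) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the procedure of the cited paper: the function names, the power-iteration shortcut for the leading eigenvector and the toy red flags are all hypothetical, and published implementations may normalise the weights differently. The RIDIT score of a level is taken as P(value < level) - P(value > level), which reproduces the five-level table shown earlier.

```python
from collections import Counter


def ridit_scores(column):
    """Map each value to P(value below its level) - P(value above its level)."""
    counts = Counter(column)
    total = len(column)
    score_by_level = {}
    below = 0.0  # proportion of observations strictly below the current level
    for level in sorted(counts):
        p = counts[level] / total
        above = 1.0 - below - p
        score_by_level[level] = below - above
        below += p
    return [score_by_level[v] for v in column]


def first_component(rows, iterations=200):
    """Leading eigenvector of X'X by power iteration.

    Assumes mean-zero columns (true for RIDIT scores) and at least one
    non-constant column; a sketch, not a robust PCA routine.
    """
    n_cols = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n_cols)]
            for i in range(n_cols)]
    w = [1.0] * n_cols
    for _ in range(iterations):
        w = [sum(gram[i][j] * w[j] for j in range(n_cols)) for i in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w


def pridit_scores(flag_columns):
    """Return one score per claim plus the weight given to each red flag."""
    ridit_columns = [ridit_scores(col) for col in flag_columns]
    rows = list(zip(*ridit_columns))
    weights = first_component(rows)
    scores = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    return scores, weights


# Five equally frequent levels, as in the RIDIT table of the slides.
example = ridit_scores([1, 2, 3, 4, 5])

# Hypothetical toy data: three binary red flags observed on six claims.
toy_flags = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
scores, weights = pridit_scores(toy_flags)
```

The sign and ordering of the resulting scores depend on how the red flags are coded, so whether claims should be reviewed in increasing or decreasing score order has to be checked against the coding used in practice.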