Comptroller of the Currency
Administrator of National Banks

Managing Model Risk in Retail Scoring

Dennis Glennon
Credit Risk Analysis Division
Office of the Comptroller of the Currency
September 28, 2012

The opinions expressed in this paper are those of the authors and do not necessarily reflect those of the Office of the Comptroller of the Currency. All errors are the responsibility of the authors.

Agenda
- Introduction to Model Risk
- Managing Model Risk
- What is it?
- Why is it relevant?
- Overview of Sound Model Development and Validation Procedures
- Emerging Issues Related to Model Risk

Model Risk: What is it?
Model Risk: potential for adverse consequences from decisions based on incorrect or misused model outputs
- Model errors that produce inaccurate outputs
- Model may be used incorrectly or inappropriately (i.e., using a model outside the environment for which it was designed).
Model risk emerges from the process used to develop models for measuring credit risk.
- The process introduces a secondary loss exposure beyond that of credit risk alone, e.g., poor underwriting decisions based on erroneous models or overly broad interpretations of model results.

Model Risk: What is it?
Credit Risk: The risk to earnings or capital from an obligor's failure to meet the terms of any contract with the bank or otherwise to perform as agreed.
- A conceptually distinct exposure to loss.
There are many reasons for poor model-based results, including:
- Poor modeling (i.e., inadequate understanding of the business)
- Poor model selection (i.e., overfitting)
- Inadequate understanding of model use
- Changing conditions in the market

Managing Model Risk
The goal of model-risk analysis is to isolate the effect of a bank's choice of risk-management strategies from the effects of incorrect or misused model output.
Model validation is an essential component of a sound model-risk management process.
- Validate at time of model development/implementation
- Ongoing monitoring
- Re-validate

Model Risk
Model validation can be costly.
However, using unvalidated models to underwrite, price, and/or manage risk is potentially an unsafe and unsound practice.
The best defense against model risk is the implementation of formal, prudent, and comprehensive model-validation procedures.

Model Risk: Sound Modeling Practices
Sound modeling practices
- In many cases, there are generally accepted methods of building and validating models.
- These methods incorporate procedures developed in the finance, statistics, econometrics, and information theory literature.
Although these methods are valid, they may not be appropriate in all applications.
- A model selected for its ability to discriminate between high and low risk may perform poorly at predicting the likelihood of default.

Models as Decision Tools
Two primary modeling objectives
- Classification: The model is used to rank credits by their expected relative performance
- Prediction: The model is used to accurately predict the probability of the outcome
Modelers typically have one of these objectives in mind when developing and validating their models

Model Selection: Which model is better?
[Figure: two panels, Model 1 and Model 2, plotting observed goods (G, y=0) and observed bads (B, y=1) against score (quintiles, 0 to 100), with the bad rate #B / (#G + #B) shown for each quintile. Bad rates as extracted: Model 1: 0.1, 0.3, 0.5, 0.7, 0.9; Model 2: 0.08, 0.45, 0.44, 0.67, 0.92.]

Models as Decision Tools
A comparison of models: visual summary
[Figure: two panels, "Reliable and Accurate" and "Reliable, but not Accurate", each plotting development log odds and actual log odds against score. Annotations: Odds 33:1, Bad % 3.0%; Odds 12.2:1, Bad % 7.6%; Score 253.]

Illustrative Example
Risk-Rating Model
[Figure: ln(good/bad) by score band (644 to 753) for the Development sample (K-S = 32.1) and the Validation sample (K-S = 34.3). Reference levels: ln(20/1) = 3.0, bad rate = 5%; ln(4/1) = 1.4, bad rate = 20%.]

Models as Decision Tools
The model design should reflect how the model will be used. As such, the choices of:
- sample design
- modeling technique
- validation procedures
should reflect the intended purpose for which the model will ultimately be used.
To effectively manage model risk, the right tools must be used.

Models as Decision Tools
Models are developed for different purposes, i.e., classification or prediction. As such, the choices of:
- sample design
- modeling technique
- validation procedures
are driven by the intended purpose for which the model will ultimately be used.

Model Validation
The classification objective is the weaker of two conditions.
There are well-developed methods outlined in the literature and accepted by the industry that are used to assess the validity of models developed under that objective.
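One widely used separation measure under the classification objective is the Kolmogorov-Smirnov (KS) statistic, the maximum gap between the cumulative score distributions of bad and good accounts. The following is a minimal sketch, assuming that lower scores indicate higher risk; the function name `ks_statistic` and the sample scores are hypothetical and are not taken from the slides.

```python
def ks_statistic(good_scores, bad_scores):
    """Maximum gap between the empirical score CDFs of bads and goods (0 to 1)."""
    if not good_scores or not bad_scores:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap can only change at an observed score, so check each cutoff.
    for cutoff in sorted(set(good_scores) | set(bad_scores)):
        cum_good = sum(s <= cutoff for s in good_scores) / len(good_scores)
        cum_bad = sum(s <= cutoff for s in bad_scores) / len(bad_scores)
        max_gap = max(max_gap, abs(cum_bad - cum_good))
    return max_gap


# Hypothetical scores: bads cluster at the low end, goods at the high end.
goods = [660, 680, 700, 720, 740, 760]
bads = [620, 640, 660, 700]
print(f"KS = {100 * ks_statistic(goods, bads):.1f}")
```

The result is multiplied by 100 because the slides report KS on a 0 to 100 scale (for example, K-S = 32.1). The quadratic loop is kept for readability; a production version would sort once and sweep.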
In practice, we see:
Development
- KS / Gini used as the primary model selection tool
- These are evaluated on the development, hold-out, and out-of-time samples
Validation
- KS / ΔKS
- Stability test (e.g., PSI, characteristic analysis)
- Backtesting analysis

Model Validation
Almost all scoring models generate KS values that reject the null that the distribution of good accounts is equal to the distribution of bads.
KS is also used to identify a specific model with the maximum separation across alternative models.
- In practice, however, the difference between the maxKS and those of alternative models is never tested using statistical methods (although there are tests outlined in the literature, e.g., Krzanowski and Hand, 2011).
More importantly, once a model is selected, few modelers apply a statistical test to determine whether the KS has changed significantly over time, which would indicate that the model is no longer working as expected.

Model Validation
The tests that have been developed, however, tend to be sensitive to sample size.
- Given the size of development and validation samples, very small changes may be statistically significant.
OPEN ISSUE 1: Are there tests banks can use to test for statistical significance that are not overly sensitive to sample size?

Model Validation
Predictive models are developed under a model accuracy objective. As a result, a goodness-of-fit test is required for model selection.
Common performance measures used to evaluate predictive models:
- Interval Test
- Chi-Square Test
- Hosmer-Lemeshow (H-L) Test
Unfortunately, the goodness-of-fit tests assume defaults are independent events.
- If the events are dependent, the tests will reject the null too frequently.

Model Validation
The Vasicek Test is an alternative test of accuracy that allows for dependence.
The Vasicek Test is designed to capture the effect of dependence on the size of the confidence bands.
Formula used to derive the confidence bands:

$$V_{int} = \Phi\left(\frac{\Phi^{-1}(PD) + \sqrt{\rho}\, Z_{.95}}{\sqrt{1 - \rho}}\right)$$

where $V_{int}$ is the width of the interval; $\Phi \sim N(0,1)$; $Z_{.95} = 1.64$; and $\rho$ is the correlation.

Vasicek Test: An Example
Vasicek Test Analysis

                                            Upper Bound 95% CI
                                            Vasicek
Segment  Accounts  Estimated PD  Actual PD  ρ = 0.15  ρ = 0.05  ρ = 0.015  (label not recovered)
1        1000      0.00000       0.00200    0.000003  0.00000   0.00000    0.00005
2        1000      0.00001       0.00000    0.000058  0.00004   0.00003    0.00024
3        1000      0.00008       0.00000    0.000323  0.00023   0.00015    0.00062
4        1000      0.00031       0.00100    0.001272  0.00087   0.00059    0.00141
5        1000      0.00102       0.00400    0.003957  0.00265   0.00183    0.00299
6        1000      0.00313       0.00800    0.011466  0.00760   0.00536    0.00659
7        1000      0.01003       0.01900    0.033541  0.02230   0.01618    0.01620
8        1000      0.03767       0.06300    0.107877  0.07392   0.05605    0.04948
9        1000      0.18798       0.26700    0.393836  0.29771   0.24538    0.21220
10       1000      0.75928       0.54900    0.927103  0.86425   0.81919    0.78578

Model Validation: Vasicek Test
If ρ is too high, the bands are too wide: too many models would pass the test.
ρ is not known and has to be estimated.
- For point-in-time based models, ρ can be very small
- For through-the-cycle based models, ρ can be large
In practice, we often see models fail the interval/Chi-square test, but pass the Vasicek test (especially when samples are large).
Open Issue 2: How do we resolve the inconsistency?

Sensitivity of Validation Test to Sample Size
Accuracy tests tend to reject models that discriminate well and are consistent with the expectations of the line of business (LOB).
- Measurement can be so precise that even a small, non-relevant difference in point estimates can be considered statistically significant.
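To make the contrast between the two kinds of bound concrete, the sketch below computes a one-sided Vasicek upper bound, Φ((Φ⁻¹(PD) + √ρ·z) / √(1 − ρ)), next to a normal-approximation binomial bound, PD + z·√(PD(1 − PD)/n). The function names are hypothetical. Both the Vasicek form and the use of z = 1.96 for the binomial bound are assumptions, chosen because they appear consistent with the figures in the example table. The slide rounds the 95% quantile to 1.64, whereas the code uses the unrounded value.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, N(0, 1)


def vasicek_upper(pd_est, rho, level=0.95):
    """One-sided Vasicek upper bound on the default rate (assumed form)."""
    z = STD_NORMAL.inv_cdf(level)
    x = (STD_NORMAL.inv_cdf(pd_est) + sqrt(rho) * z) / sqrt(1.0 - rho)
    return STD_NORMAL.cdf(x)


def binomial_upper(pd_est, n, z=1.96):
    """Normal-approximation upper bound, assuming independent defaults."""
    return pd_est + z * sqrt(pd_est * (1.0 - pd_est) / n)


pd_est = 0.03767  # estimated PD of segment 8 in the example table
for n in (1_000, 10_000, 100_000):
    # The binomial bound shrinks toward pd_est as n grows; the Vasicek bound
    # has no n in it and stays put.
    print(n, round(binomial_upper(pd_est, n), 5), round(vasicek_upper(pd_est, 0.05), 5))
```

For segment 8, the observed rate of 0.063 lies above the binomial bound but below the Vasicek bound at ρ = 0.05. Because only the binomial bound tightens with sample size, this kind of disagreement becomes more likely as samples grow.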

Illustrative Example
Default Rates
[Figure: actual and predicted default rate (%) by score range (1 to 20); the plotted values are the Actual and Predicted columns of the table that follows.]

Illustrative Example

                                   Default Rate (%)
Seg  Default  Non-Default  Total   Actual  Predicted  p-value (cv - 5%)  HL stat
1    4027     7780         11807   34.11   34.56      0.3039             1.0572
2    2992     10158        13150   22.75   22.55      0.5832             0.3011
3    1847     9568         11415   16.18   16.59      0.2390             1.3867
4    1184     7505         8689    13.63   13.27      0.3226             0.9787
5    878      6795         7673    11.44   11.48      0.9125             0.0121
6    1007     9223         10230   9.84    9.86       0.9459             0.0046
7    598      5996         6594    9.07    8.65       0.2250             1.4722
8    536      6512         7048    7.60    7.90       0.3506             0.8713
9    474      5973         6447    7.35    7.16       0.5541             0.3500
10   507      6913         7420    6.83    6.54       0.3124             1.0205
11   459      6752         7211    6.37    5.97       0.1516             2.0568
12   373      6150         6523    5.72    5.50       0.4357             0.6076
13   380      6647         7027    5.41    5.04       0.1562             2.0109
14   339      7214         7553    4.49    4.64       0.5354             0.3842
15   355      7698         8053    4.41    4.14       0.2238             1.4799
16   244      6584         6828    3.57    3.76       0.4094             0.6806
17   239      6712         6951    3.44    3.38       0.7819             0.0767
18   246      8145         8391    2.93    2.96       0.8712             0.0263
19   217      9360         9577    2.27    2.46       0.2296             1.4432
20   208      17978        18186   1.14    1.43       0.0010             10.8227

HL = 27.0433; p-value = 0.0782

Illustrative Example
p-values
[Figure: p-values by score range (1 to 20) for the n-sample and the 3n-sample, with the critical value (c-value) marked.]

Interval Tests with Large Samples
Conclusion:
- Statistical difference: significant
- Economic difference: insignificant
Solutions?
- Reduce the number of observations using a sample: less powerful test
- Redefine the test
  - Interval test
  - Focus on capital

Interval Tests with Large Samples
[Figure: five cases, labeled (1) through (5), plotted against an axis marked -1%, 0, +1%.]

Interval Test
Restate the null as an interval defined over an economically acceptable range.
- If the confidence interval CI(1-α) around the point estimate lies within that interval, conclude no economically significant difference.
May want to reformulate the interval test in terms of an acceptable economic bias in the calculation of regulatory capital.
Open Issue 3: How do we reconcile business and statistical significance?

Conclusion
Active management of model risk
- Sound model development, implementation, and use of models are vital elements, and
- Rigorous model validation is critical to effective model risk management.
Model risk should be managed like other risks
- Identify the source
- Manage it properly
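
As a closing illustration, the interval test described earlier can be sketched in a few lines. The example uses the segment 20 counts from the illustrative table (208 defaults in 18,186 accounts, predicted rate 1.43%). The ±1 percentage point tolerance, the normal-approximation confidence interval, and the function name `interval_test` are assumptions made for illustration, not a prescribed method.

```python
from math import sqrt
from statistics import NormalDist


def interval_test(defaults, total, predicted, tolerance=0.01, alpha=0.05):
    """Compare the CI of (actual - predicted) with an economic tolerance band."""
    actual = defaults / total
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(actual * (1.0 - actual) / total)
    low = actual - predicted - half_width
    high = actual - predicted + half_width
    return {
        "ci": (low, high),
        # Conventional test: is zero outside the confidence interval?
        "statistically_significant": not (low <= 0.0 <= high),
        # Interval test: is the whole CI inside the tolerance band?
        "economically_insignificant": -tolerance <= low and high <= tolerance,
    }


result = interval_test(defaults=208, total=18186, predicted=0.0143)
print(result)
```

Under these assumptions the segment is flagged as statistically significant yet economically insignificant, which is the situation the interval test is meant to separate. The choice of tolerance is a business judgment and would need to be justified, for example in terms of its effect on regulatory capital.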