Managing model risk in retail scoring

Comptroller of the Currency
Administrator of National Banks
Managing Model Risk in Retail Scoring
Dennis Glennon
Credit Risk Analysis Division
Office of the Comptroller of the Currency
September 28, 2012
The opinions expressed in this paper are those of the authors and do not necessarily reflect those of
the Office of the Comptroller of the Currency. All errors are the responsibility of the authors.
Agenda

Introduction to Model Risk
  What is it?
  Why is it relevant?
Managing Model Risk
  Overview of Sound Model Development and Validation Procedures
  Emerging Issues Related to Model Risk
Model Risk: What is it?

Model Risk – Potential for adverse consequences
from decisions based on incorrect or misused model
outputs




Model errors that produce inaccurate outputs
Model may be used incorrectly or inappropriately (i.e.,
using a model outside the environment for which it was
designed).
Model risk emerges from the process used to
develop models for measuring credit risk.
The process introduces a secondary loss exposure
beyond that of credit risk alone

e.g., poor underwriting decisions based on erroneous
models or overly broad interpretations of model results.
Model Risk: What is it?

Credit Risk: The risk to earnings or capital from an
obligor's failure to meet the terms of any contract with
the bank or otherwise to perform as agreed.


A conceptually distinct exposure to loss.
There are many reasons for poor model-based results
including:




Poor modeling (i.e., inadequate understanding of
the business)
Poor model selection (i.e., overfitting)
Inadequate understanding of model use
Changing conditions in the market
Managing Model Risk


The goal of model-risk analysis is to isolate the
effect of a bank's choice of risk-management
strategies from those associated with incorrect
or misused model output.
Model Validation is an essential component of
a sound model-risk management process.



Validate at time of model
development/implementation
Ongoing monitoring
Re-validate
Model Risk



Model validation can be costly.
However, using unvalidated models to
underwrite, price, and/or manage risk is
potentially an unsafe and unsound practice.
The best defense against model risk is the
implementation of formal, prudent, and
comprehensive model-validation procedures.
Model Risk: Sound Modeling Practices

Sound modeling practices



In many cases, there are generally accepted
methods of building and validating models.
These methods incorporate procedures developed
in the finance, statistics, econometrics, and
information theory literature.
Although these methods are valid, they may not
be appropriate in all applications.

A model selected for its ability to discriminate between
high and low risk may perform poorly at predicting the
likelihood of default.
Models as Decision Tools

Two primary modeling objectives

Classification: The model is used to rank
credits by their expected relative performance

Prediction: The model is used to accurately
predict the probability of the outcome

Modelers typically have one of these
objectives in mind when developing and
validating their models
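
The difference between the two objectives can be sketched with a small example. The data below are hypothetical, and the helper names `auc` and `calibration_gap` are illustrative choices, not taken from the slides. Here a higher model output means higher risk (the reverse of a typical credit score). Model B ranks accounts exactly as Model A does, so it classifies equally well, but its probabilities are half as large, so it predicts poorly.

```python
def auc(scores, labels):
    """Share of (bad, good) pairs where the bad scores higher; ties count 0.5."""
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    return wins / (len(bads) * len(goods))


def calibration_gap(pds, labels):
    """Mean predicted PD minus observed bad rate (0 means accurate on average)."""
    return sum(pds) / len(pds) - sum(labels) / len(labels)


# Hypothetical data: 10 accounts, 3 of them bad (y = 1).
labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
pd_a = [0.05, 0.10, 0.10, 0.30, 0.20, 0.15, 0.60, 0.25, 0.35, 0.90]
pd_b = [p / 2 for p in pd_a]  # same ordering as Model A, every PD halved

for name, pds in (("A", pd_a), ("B", pd_b)):
    print(name, round(auc(pds, labels), 3), round(calibration_gap(pds, labels), 3))
```

A rank-based measure cannot tell the two models apart, which is why a model chosen on discrimination alone still needs a separate accuracy check before its outputs are used as probabilities.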
Model Selection: Which model is better?
[Figure: two panels, Model 1 and Model 2, each plotting observed goods (G, y=0) and bads (B, y=1) by score (quintiles, 0 to 100), with the bad rate #B / (#G + #B) shown for each quintile.
Model 1 bad rates: 0.1, 0.3, 0.5, 0.7, 0.9
Model 2 bad rates: 0.08, 0.45, 0.44, 0.67, 0.92]
Models as Decision Tools

A comparison of models: visual summary
[Figure: two panels, "Reliable and Accurate" and "Reliable, but not Accurate", each plotting development log odds and actual log odds, log(odds), against score. Annotations: Odds 33:1 (Bad % 3.0%); Odds 12.2:1 (Bad % 7.6%); Score 253.]
Illustrative Example
Risk-Rating Model
[Figure: ln(good/bad) by score band (644 to 753) for the development sample (K-S = 32.1) and the validation sample (K-S = 34.3). Annotations: ln(20/1) = 3.0, bad rate = 5%; ln(4/1) = 1.4, bad rate = 20%.]
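
The odds annotations in the figures above follow from a simple identity: with good-to-bad odds of k:1, the bad rate is 1 / (1 + k), and the log odds is ln(k). A minimal sketch (the function names are illustrative):

```python
import math


def bad_rate_from_odds(good_to_bad_odds):
    """Bad rate implied by good:bad odds of k:1, i.e. 1 / (1 + k)."""
    return 1.0 / (1.0 + good_to_bad_odds)


def log_odds(good_to_bad_odds):
    """Natural log of the good:bad odds, the vertical axis of the charts."""
    return math.log(good_to_bad_odds)


# Odds quoted on the slides: 20:1, 4:1, 33:1 and 12.2:1.
for odds in (20.0, 4.0, 33.0, 12.2):
    print(f"odds {odds}:1  ln(odds) = {log_odds(odds):.2f}  "
          f"bad rate = {100 * bad_rate_from_odds(odds):.2f}%")
```

Note that 20:1 odds imply a bad rate of 1/21, or about 4.8%, which the slide rounds to 5%.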
Models as Decision Tools


The model design should reflect how the
model will be used.
As such, the choices of:



sample design
modeling technique
validation procedures
should reflect the intended purpose for which
the model will ultimately be used.

To effectively manage model risk, the right
tools must be used.
Models as Decision Tools

Models are developed for different purposes
– i.e., classification or prediction. As such,
the choices of:



sample design
modeling technique
validation procedures
are driven by the intended purpose for which
the model will ultimately be used.
Model Validation



The classification objective is the weaker of two
conditions.
There are well-developed methods outlined in the
literature and accepted by the industry that are used to
assess the validity of models developed under that
objective.
In practice, we see:


Development

KS / Gini used as the primary model selection tool

These are evaluated on the development, hold-out, and out-of-time samples
Validation

KS / ΔKS

Stability test (e.g., PSI, characteristic analysis, etc.)

Backtesting analysis
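
As an illustration of the stability test listed above, the population stability index (PSI) compares the share of accounts in each score band between two samples. This is a minimal sketch using the standard PSI definition. The band counts are hypothetical, and the function assumes every band is non-empty in both samples.

```python
import math


def psi(expected_counts, actual_counts):
    """Population stability index over matching score bands.

    Sums (actual share - expected share) * ln(actual share / expected share).
    Assumes every band has at least one account in both samples.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = e / e_total
        a_share = a / a_total
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total


# Hypothetical counts per score band: development sample vs. a recent sample.
development = [100, 200, 400, 200, 100]
recent = [120, 220, 380, 180, 100]
print(round(psi(development, recent), 4))
```

The index is zero when the two distributions match and grows as the population shifts across bands.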
Model Validation




Almost all scoring models generate KS values that reject the null
that the distribution of good accounts is equal to the distribution
of bads.
KS is also used to identify a specific model with the maximum
separation across alternative models.
In practice, however, the difference between the maxKS and those
of alternative models is never tested using statistical methods
(although there are tests outlined in the literature – e.g.,
Krzanowski and Hand, 2011).
More importantly, once a model is selected, few modelers apply a
statistical test to determine whether the KS has changed significantly over
time and, on that basis, conclude the model is no longer working as expected.
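
For reference, the KS statistic and the test of the null described above can be sketched as follows. The scores are hypothetical, and the p-value uses the asymptotic Kolmogorov series, which is a large-sample approximation rather than an exact small-sample result. The slides quote KS on a 0 to 100 scale (e.g., 32.1); this sketch returns it on a 0 to 1 scale.

```python
import math


def ks_statistic(good_scores, bad_scores):
    """Largest gap between the empirical score CDFs of goods and bads."""
    cuts = sorted(set(good_scores) | set(bad_scores))
    ks = 0.0
    for c in cuts:
        cdf_good = sum(s <= c for s in good_scores) / len(good_scores)
        cdf_bad = sum(s <= c for s in bad_scores) / len(bad_scores)
        ks = max(ks, abs(cdf_good - cdf_bad))
    return ks


def ks_pvalue(ks, n_good, n_bad, terms=100):
    """Asymptotic two-sample p-value: 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n_good * n_bad / (n_good + n_bad)) * ks
    if lam < 0.2:  # the series converges slowly here and the p-value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))


# Hypothetical scores for 8 good and 6 bad accounts.
goods = [620, 640, 660, 680, 700, 720, 740, 760]
bads = [580, 600, 620, 640, 660, 700]
ks = ks_statistic(goods, bads)
print(round(ks, 4), round(ks_pvalue(ks, len(goods), len(bads)), 3))
# The same separation measured on a much larger (hypothetical) sample.
print(ks_pvalue(ks, 800, 600))
```

With only 14 accounts the null is not rejected, while the same separation on 1,400 accounts rejects it decisively, which is why nearly every production scoring model passes this test.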
Model Validation


The tests that have been developed, however, tend to
be sensitive to sample size. Given the size of
development and validation samples, very small
changes may be statistically significant.
OPEN ISSUE 1: Are there tests banks can use to test
for statistical significance that are not overly sensitive
to sample size?
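
The sample-size effect can be sketched with a simple one-sample z-test on a bad rate. The rates and sample sizes below are hypothetical; the point is that a fixed gap of 0.2 percentage points moves from clearly insignificant to highly significant as n grows.

```python
from statistics import NormalDist


def two_sided_p(observed_rate, expected_rate, n):
    """Two-sided p-value for an observed rate against an expected rate.

    Uses the normal approximation to the binomial with n independent accounts.
    """
    se = (expected_rate * (1.0 - expected_rate) / n) ** 0.5
    z = (observed_rate - expected_rate) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical: expected bad rate 5.0%, observed 5.2%, at growing sample sizes.
p_values = {n: two_sided_p(0.052, 0.050, n)
            for n in (1_000, 10_000, 100_000, 1_000_000)}
for n, p in p_values.items():
    print(f"n = {n:>9,}  p-value = {p:.4f}")
```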
Model Validation




Predictive models are developed under a model
accuracy objective.
As a result, a goodness-of-fit test is required for model
selection.
Common performance measures used to evaluate
predictive models:
  Interval Test
  Chi-Square Test
  Hosmer-Lemeshow (H-L) Test
Unfortunately, the goodness-of-fit tests assume
defaults are independent events. If the events are
dependent, the tests will reject the null too frequently.
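
The over-rejection problem can be sketched with a small simulation. This is an illustration under stated assumptions, not the author's analysis: defaults are generated from a one-factor model in which a single systematic factor shifts every account's default probability, and the PD, correlation, and sample sizes are hypothetical. A two-sided 5% binomial interval test is then applied to a model whose PD is exactly right.

```python
import random
from statistics import NormalDist

N = NormalDist()


def rejection_rate(pd, rho, n_accounts, n_trials, seed=0):
    """Share of trials in which a correct PD fails a 5% binomial interval test."""
    rng = random.Random(seed)
    se = (pd * (1.0 - pd) / n_accounts) ** 0.5
    threshold = N.inv_cdf(pd)
    rejections = 0
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)  # systematic factor shared by all accounts
        cond_pd = N.cdf((threshold - rho ** 0.5 * z) / (1.0 - rho) ** 0.5)
        defaults = sum(rng.random() < cond_pd for _ in range(n_accounts))
        if abs(defaults / n_accounts - pd) > 1.96 * se:
            rejections += 1
    return rejections / n_trials


indep = rejection_rate(pd=0.05, rho=0.0, n_accounts=500, n_trials=1000)
dep = rejection_rate(pd=0.05, rho=0.05, n_accounts=500, n_trials=1000)
print(f"independent defaults: {indep:.3f}   correlated defaults: {dep:.3f}")
```

With independent defaults the rejection rate stays near the nominal 5%. With even modest correlation the same correct model is rejected far more often.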
Model Validation



The Vasicek Test is an alternative test of accuracy that
allows for dependence.
The Vasicek Test is designed to capture the effect of
dependence on the size of the confidence bands.
Formula used to derive the confidence bands:

$$V_{int} = \Phi\left(\frac{\Phi^{-1}(PD) + \sqrt{\rho}\,Z_{.95}}{\sqrt{1-\rho}}\right)$$

where $V_{int}$ is the width of the interval; $\Phi \sim N(0,1)$; $Z_{.95} = 1.64$; and $\rho$ is the correlation.
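
A minimal sketch of this calculation, assuming the formula is the standard one-factor Vasicek quantile (the normal CDF applied to the shifted and rescaled inverse-normal PD). The function name is illustrative, and the code uses the unrounded 95% normal quantile (about 1.645) rather than the rounded 1.64 quoted on the slide. The PD of 0.18798 is one of the estimated PDs from the example that follows.

```python
from statistics import NormalDist

N = NormalDist()


def vasicek_upper_bound(pd, rho, confidence=0.95):
    """Upper confidence bound on the default rate under correlation rho.

    pd must lie strictly between 0 and 1, because inv_cdf is undefined
    at the endpoints.
    """
    z = N.inv_cdf(confidence)
    return N.cdf((N.inv_cdf(pd) + rho ** 0.5 * z) / (1.0 - rho) ** 0.5)


# The band widens as the assumed correlation rises.
for rho in (0.15, 0.05, 0.015):
    print(f"rho = {rho}: upper bound = {vasicek_upper_bound(0.18798, rho):.5f}")
```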
Vasicek Test: An Example
Vasicek Test Analysis
                                             Upper Bound
Segment  Accounts  Estimated PD  Actual PD   Vasicek    Vasicek    Vasicek     95% CI
                                             ρ = 0.15   ρ = 0.05   ρ = 0.015
1        1000      0.00000       0.00200     0.000003   0.00000    0.00000     0.00005
2        1000      0.00001       0.00000     0.000058   0.00004    0.00003     0.00024
3        1000      0.00008       0.00000     0.000323   0.00023    0.00015     0.00062
4        1000      0.00031       0.00100     0.001272   0.00087    0.00059     0.00141
5        1000      0.00102       0.00400     0.003957   0.00265    0.00183     0.00299
6        1000      0.00313       0.00800     0.011466   0.00760    0.00536     0.00659
7        1000      0.01003       0.01900     0.033541   0.02230    0.01618     0.01620
8        1000      0.03767       0.06300     0.107877   0.07392    0.05605     0.04948
9        1000      0.18798       0.26700     0.393836   0.29771    0.24538     0.21220
10       1000      0.75928       0.54900     0.927103   0.86425    0.81919     0.78578
Model Validation: Vasicek Test


If ρ is too high the bands are too wide: too
many models would pass the test
ρ is not known and has to be estimated.




For point-in-time based models, ρ can be very small
For through-the-cycle based models, ρ can be large
In practice, we often see models fail the
interval/Chi-square test, but pass the Vasicek
test (especially when samples are large).
Open Issue 2: How do we resolve the
inconsistency?
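
The inconsistency can be reproduced with segment 8 of the example above (estimated PD 0.03767, actual PD 0.063, 1,000 accounts). This is a sketch under two assumptions inferred from the table values rather than stated on the slides: the 95% CI column uses a normal approximation with a multiplier of 1.96, and the Vasicek bounds use the unrounded 95% quantile.

```python
from statistics import NormalDist

N = NormalDist()


def binomial_upper(pd, n, z=1.96):
    """Upper bound of the normal-approximation interval, independent defaults."""
    return pd + z * (pd * (1.0 - pd) / n) ** 0.5


def vasicek_upper(pd, rho, confidence=0.95):
    """Upper bound allowing for default correlation rho."""
    z = N.inv_cdf(confidence)
    return N.cdf((N.inv_cdf(pd) + rho ** 0.5 * z) / (1.0 - rho) ** 0.5)


est_pd, actual_pd, n = 0.03767, 0.06300, 1000

# True means the actual PD falls at or below the bound (the model passes).
results = {"binomial": actual_pd <= binomial_upper(est_pd, n)}
for rho in (0.15, 0.05, 0.015):
    results[f"vasicek rho={rho}"] = actual_pd <= vasicek_upper(est_pd, rho)

for test, passed in results.items():
    print(f"{test}: {'pass' if passed else 'fail'}")
```

The segment fails the interval test, passes the Vasicek test at the two higher correlations, and fails again at the lowest one, so the verdict depends on a value of ρ that is not known and has to be estimated.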
Sensitivity of Validation Test to Sample Size

Accuracy tests tend to reject models that:
  discriminate well
  are consistent with the expectations of the line of business (LOB)
Measurement can be so precise that even a
small, non-relevant difference in point estimates
can be considered statistically significant.
Illustrative Example
[Figure: Default Rates. Actual and predicted default rate (%), on a 0.00 to 40.00 axis, by score range (1 to 20).]
actual    34.11 22.75 16.18 13.63 11.44 9.84 9.07 7.60 7.35 6.83 6.37 5.72 5.41 4.49 4.41 3.57 3.44 2.93 2.27 1.14
predicted 34.56 22.55 16.59 13.27 11.48 9.86 8.65 7.90 7.16 6.54 5.97 5.50 5.04 4.64 4.14 3.76 3.38 2.96 2.46 1.43
Illustrative Example
                                    Default Rate (%)
Seg  Default  Non-Default  Total    Actual  Predicted  p-value (cv = 5%)  HL stat
1    4027     7780         11807    34.11   34.56      0.3039             1.0572
2    2992     10158        13150    22.75   22.55      0.5832             0.3011
3    1847     9568         11415    16.18   16.59      0.2390             1.3867
4    1184     7505         8689     13.63   13.27      0.3226             0.9787
5    878      6795         7673     11.44   11.48      0.9125             0.0121
6    1007     9223         10230    9.84    9.86       0.9459             0.0046
7    598      5996         6594     9.07    8.65       0.2250             1.4722
8    536      6512         7048     7.60    7.90       0.3506             0.8713
9    474      5973         6447     7.35    7.16       0.5541             0.3500
10   507      6913         7420     6.83    6.54       0.3124             1.0205
11   459      6752         7211     6.37    5.97       0.1516             2.0568
12   373      6150         6523     5.72    5.50       0.4357             0.6076
13   380      6647         7027     5.41    5.04       0.1562             2.0109
14   339      7214         7553     4.49    4.64       0.5354             0.3842
15   355      7698         8053     4.41    4.14       0.2238             1.4799
16   244      6584         6828     3.57    3.76       0.4094             0.6806
17   239      6712         6951     3.44    3.38       0.7819             0.0767
18   246      8145         8391     2.93    2.96       0.8712             0.0263
19   217      9360         9577     2.27    2.46       0.2296             1.4432
20   208      17978        18186    1.14    1.43       0.0010             10.8227

HL = 27.0433, p-value = 0.0782
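
The table can be approximately reproduced as sketched below. Two points are assumptions: each segment's HL component is taken to be (defaults - n·p)² / (n·p·(1 - p)), and the overall p-value is taken to use 18 degrees of freedom (segments minus 2), which is what the reported 0.0782 implies. Because the predicted rates are rounded to two decimals, the recomputed statistics differ slightly from those shown. The tripled sample is purely hypothetical and shows the effect of sample size on the same percentage gaps.

```python
import math

defaults = [4027, 2992, 1847, 1184, 878, 1007, 598, 536, 474, 507,
            459, 373, 380, 339, 355, 244, 239, 246, 217, 208]
totals = [11807, 13150, 11415, 8689, 7673, 10230, 6594, 7048, 6447, 7420,
          7211, 6523, 7027, 7553, 8053, 6828, 6951, 8391, 9577, 18186]
predicted_pct = [34.56, 22.55, 16.59, 13.27, 11.48, 9.86, 8.65, 7.90, 7.16, 6.54,
                 5.97, 5.50, 5.04, 4.64, 4.14, 3.76, 3.38, 2.96, 2.46, 1.43]


def hl_components(defaults, totals, predicted_pct):
    """Per-segment chi-square contributions (1 degree of freedom each)."""
    out = []
    for d, n, pct in zip(defaults, totals, predicted_pct):
        p = pct / 100.0
        expected = n * p
        out.append((d - expected) ** 2 / (expected * (1.0 - p)))
    return out


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


components = hl_components(defaults, totals, predicted_pct)
hl = sum(components)
p_value = chi2_sf(hl, len(defaults) - 2)
print(f"HL = {hl:.2f}, p-value = {p_value:.4f}")

# Hypothetical: the same rates observed on three times as many accounts.
hl_3n = sum(hl_components([3 * d for d in defaults],
                          [3 * n for n in totals], predicted_pct))
print(f"3n sample: HL = {hl_3n:.2f}, p-value = {chi2_sf(hl_3n, 18):.2e}")
```

Tripling every count triples the statistic while the actual and predicted rates stay the same, so a model that passes at the 5% level on the original sample is rejected outright on the larger one.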
Illustrative Example
[Figure: p-values (0.00 to 1.00) by score range for the n-sample and the 3n-sample, with the critical value (c-value) marked.]
Interval Tests with Large Samples


Conclusion:
  Statistical difference: significant
  Economic difference: insignificant
Solutions?
  Reduce the number of observations by sampling: a less powerful test
  Redefine the test
    Interval test
    Focus on capital
Interval Tests with Large Samples
[Figure: five confidence intervals, labeled (1) to (5), plotted against an acceptance range running from -1% to +1% around 0.]
Interval Test

Restate the null as an interval defined over an
economically acceptable range



If the (1-α) confidence interval around the point estimate lies within the
interval, conclude no economically significant
difference
May want to reformulate the interval test in terms of
an acceptable economic bias in the calculation of
regulatory capital
Open Issue 3: How do we reconcile business
and statistical significance?
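
One way to sketch the restated test: build the (1-α) confidence interval for the gap between the actual and predicted default rates, and accept the model if the whole interval lies inside an economically acceptable range. The ±1 percentage point tolerance below is an assumption chosen for illustration, and the counts are those of segment 20 in the illustrative example (208 defaults among 18,186 accounts, predicted rate 1.43%).

```python
from statistics import NormalDist


def interval_test(defaults, n, predicted_pd, tolerance, alpha=0.05):
    """Return the CI for (actual - predicted) and whether it lies inside
    the acceptable range [-tolerance, +tolerance]."""
    observed = defaults / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = (observed * (1.0 - observed) / n) ** 0.5
    lower = observed - predicted_pd - z * se
    upper = observed - predicted_pd + z * se
    return (lower, upper), (-tolerance <= lower and upper <= tolerance)


interval, acceptable = interval_test(208, 18186, 0.0143, tolerance=0.01)
print(f"CI for actual - predicted: [{interval[0]:.4f}, {interval[1]:.4f}]")
print("excludes zero (statistically significant):",
      interval[1] < 0 or interval[0] > 0)
print("inside +/-1% (economically acceptable):", acceptable)
```

For this segment the interval excludes zero, so a conventional test rejects the model, yet it sits well inside the ±1% range, so the restated test does not.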
Conclusion

Active management of model risk



Sound model development, implementation, and use
of models are vital elements, and
Rigorous model validation is critical to effective
model risk management.
Model Risk should be managed like other risks


Identify the source
Manage it properly