Validation of Rating System
Datamining-Solutions
Outline
• What is validation of rating systems?
• Two components of validation of rating systems:
    • discrimination,
    • calibration.
• Discrimination methods
• Calibration methods
What is validation of rating systems?
Having set up a rating system, we naturally want to assess its quality.
Validating a rating system means applying a range of different methods to assess that quality.
Rating systems are commonly assessed along two dimensions: discrimination and calibration.
Components of validation of rating systems
Discrimination:
• In checking discrimination, we ask: how well does a rating system rank borrowers according to their probability of default (PD)?
Calibration:
• When examining calibration, we ask: how well do estimated PDs match true PDs?
Discrimination methods
According to the Basel Committee on Banking Supervision, the following statistical methodologies can be used to assess discriminatory power:
• Cumulative Accuracy Profile (CAP),
• Accuracy Ratio (AR),
• Receiver Operating Characteristic (ROC),
• ROC measure (AUC), approximated by the Mann-Whitney statistic,
• Pietra Index, approximated by the Kolmogorov-Smirnov statistic,
• Conditional entropy, Kullback-Leibler distance,
• Conditional Information Entropy Ratio (CIER),
• Information value (divergence, stability index),
• Bayesian error rate,
• Kendall’s τ and Somers’ D (for shadow ratings),
• Brier Score.
Receiver Operating Characteristic (ROC)
Confusion matrix, as a function of the cut-off:

                         actual default    actual nondefault
predicted default        TP                FP
predicted nondefault     FN                TN
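The dependence of the four cells on the cut-off can be sketched in Python (not the deck's Matlab code). The function names, the toy data and the convention that a score at or above the cut-off is predicted to default are assumptions made for illustration.

```python
def confusion_counts(scores, defaults, cutoff):
    """Count TP, FP, FN, TN when scores >= cutoff are predicted to default.

    Assumption: a higher score means a riskier borrower.
    """
    tp = fp = fn = tn = 0
    for score, is_default in zip(scores, defaults):
        predicted_default = score >= cutoff
        if predicted_default and is_default:
            tp += 1
        elif predicted_default:
            fp += 1
        elif is_default:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def hit_and_false_alarm_rate(scores, defaults, cutoff):
    """Hit rate TP/(TP+FN) and false alarm rate FP/(FP+TN) at one cut-off."""
    tp, fp, fn, tn = confusion_counts(scores, defaults, cutoff)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical toy data: six borrowers, three of whom defaulted.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
defaults = [1, 1, 0, 1, 0, 0]
print(confusion_counts(scores, defaults, 0.5))  # (2, 1, 1, 2)
```

Moving the cut-off from the highest to the lowest score and recording the (false alarm rate, hit rate) pair at each step traces out the ROC curve.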
Example data: columns PD and is_default (the first rows shown have PD = 0, 0.1, 0.2, 0.3, all with is_default = 0); avg_PD = 0.16.

PD      nondefault  default  cum_total  cum_nondef  cum_def  KS      AUC     AR
1.0     3           4        7%         4%          25%      21%     0.004   0.009
0.9     7           4        18%        12%         50%      38%     0.036   0.050
0.8     5           2        25%        18%         63%      45%     0.069   0.089
0.7     9           3        37%        29%         81%      53%     0.146   0.176
0.6     10          2        49%        40%         94%      53%     0.250   0.281
0.5     8           1        58%        50%         100%     50%     0.343   0.368
0.4     7           0        65%        58%         100%     42%     0.426   0.438
0.3     9           0        74%        69%         100%     31%     0.533   0.528
0.2     13          0        87%        85%         100%     15%     0.688   0.658
0.1     11          0        98%        98%         100%     2%      0.819   0.768
0.0     2           0        100%       100%        100%     0%      0.843   0.788
total   84          16                                       53.27%  84.26%  68.53%
Receiver Operating Characteristic (ROC) (the first approach)
The area under the ROC curve (AUC) is 0.5 for a random model without discriminative power and 1.0 for a perfect model. In practice it lies between 0.5 and 1.0 for any reasonable rating model.
Matlab file: ROC.m
Call:
[xy,AUC]=ROC(ratings, defaults)
Input:
ratings – column vector of rating values (a higher value denotes higher risk)
defaults – 1 denotes default, 0 denotes non-default
Output:
xy – collection of points for plotting the ROC curve
AUC – Area Under Curve
Matlab data example file: capexample.mat
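As a rough illustration of what such a routine computes, here is a Python sketch; it is not the content of ROC.m, which is not shown in the deck. It builds one ROC point per distinct rating and integrates with the trapezoidal rule. The function names are hypothetical, and the bucket counts are taken from the ROC example table (nondefault and default counts per PD value).

```python
def roc_points(ratings, defaults):
    """Return ROC points (false alarm rate, hit rate), one per distinct rating.

    Assumption: a higher rating value means a riskier borrower.
    """
    n_def = sum(defaults)
    n_non = len(defaults) - n_def
    # Group counts by rating so that tied ratings produce a single point.
    counts = {}
    for rating, is_default in zip(ratings, defaults):
        non, dfl = counts.get(rating, (0, 0))
        counts[rating] = (non + (1 - is_default), dfl + is_default)
    points = [(0.0, 0.0)]
    cum_non = cum_def = 0
    for rating in sorted(counts, reverse=True):  # riskiest ratings first
        non, dfl = counts[rating]
        cum_non += non
        cum_def += dfl
        points.append((cum_non / n_non, cum_def / n_def))
    return points


def area_under_curve(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# (PD, nondefault count, default count) per bucket, from the example table.
buckets = [(1.0, 3, 4), (0.9, 7, 4), (0.8, 5, 2), (0.7, 9, 3), (0.6, 10, 2),
           (0.5, 8, 1), (0.4, 7, 0), (0.3, 9, 0), (0.2, 13, 0), (0.1, 11, 0),
           (0.0, 2, 0)]
ratings, defaults = [], []
for pd_value, non, dfl in buckets:
    ratings += [pd_value] * (non + dfl)
    defaults += [0] * non + [1] * dfl

auc = area_under_curve(roc_points(ratings, defaults))
print(round(auc, 4))  # 0.8426
```

The result agrees with the 84.26% reported in the totals row of the example table.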
Receiver Operating Characteristic (ROC) (the second approach)
Matlab file: groc.m
Call:
[AUC,AUC_m]=groc(pd,is_default,spos,is_pic)
Input:
pd – column vector of rating values
is_default – 1 denotes default, 0 denotes non-default
spos – selects one of three approaches
is_pic – whether to show the curve
Output:
AUC – Area Under Curve using empirical data
AUC_m – Area Under Curve using an approximation
ROC measure (AUC) approximated by Mann-Whitney statistic
Assumption: the scores of the defaulters and the non-defaulters can be interpreted as realisations of two independent continuous random variables $S_D$ and $S_{ND}$.
The area under the ROC curve (AUC) is equal to the probability that $S_D$ produces a smaller rating score than $S_{ND}$:
\[ \mathrm{AUC} = P(S_D < S_{ND}) \]
This interpretation relates to the Mann-Whitney U-test. We split the observations by default status into two vectors, defaulters and non-defaulters, and calculate the Mann-Whitney test statistic $\hat{U}$, which is defined as
\[ \hat{U} = \frac{1}{N_D N_{ND}} \sum_{(D,ND)} u_{D,ND} \]
where $u_{D,ND}$ is defined as
\[ u_{D,ND} = \begin{cases} 1, & \text{if } S_D < S_{ND} \\ 0, & \text{if } S_D \ge S_{ND} \end{cases} \]
Matlab file: AUC_Whitney.m
Call:
[U]=AUC_Whitney(ratings, defaults)
Input:
ratings – column vector of rating values (a higher value denotes higher risk)
defaults – 1 denotes default, 0 denotes non-default
Output:
U – the Mann-Whitney test statistic Û
Matlab data example file: capexample.mat
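The pairwise definition of Û can be written out directly. The Python sketch below is an illustration under stated assumptions, not the code of AUC_Whitney.m: it follows the formula's convention that defaulters tend to have the smaller score, and it gives tied pairs half credit, which goes beyond the continuous-score assumption of the formula. With ratings where a higher value denotes higher risk, the scores would have to be negated first.

```python
def mann_whitney_auc(scores_def, scores_non):
    """Share of (defaulter, non-defaulter) pairs with S_D < S_ND.

    Assumption: tied pairs count 1/2, so that discrete scores are handled too.
    """
    wins = 0.0
    for s_d in scores_def:
        for s_nd in scores_non:
            if s_d < s_nd:
                wins += 1.0
            elif s_d == s_nd:
                wins += 0.5
    return wins / (len(scores_def) * len(scores_non))


# Hypothetical toy scores (lower score = riskier).
scores_def = [1, 2]
scores_non = [2, 3, 4]
u_hat = mann_whitney_auc(scores_def, scores_non)
print(u_hat)  # 5.5 of 6 pairs, about 0.917
```

The double loop costs N_D × N_ND comparisons; for large portfolios a rank-based formulation would be preferable.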
Pietra Index – approximated by Kolmogorov-Smirnov statistic
The Pietra Index can be interpreted as the maximum difference between the cumulative frequency distributions of good and bad cases:
\[ \text{Pietra index} = \max \left| F^{\text{good}}_{\text{cum}} - F^{\text{bad}}_{\text{cum}} \right| \]
This interpretation makes it possible to test statistically for differences between the score distributions of good and bad cases, using the Kolmogorov-Smirnov test (KS test) for two independent samples.
The null hypothesis tested is: the score distributions of good and bad cases are identical. Rejecting the null hypothesis therefore means that significant differences exist between the rating values of good and bad cases (discrimination).
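A minimal Python sketch of the maximum-gap calculation follows; it computes only the statistic, not the KS test decision, and it is not the deck's pietra.m. The function name is hypothetical and the bucket counts are those of the ROC example table.

```python
def pietra_ks(ratings, defaults):
    """Maximum absolute gap between the cumulative distributions of
    good (non-default) and bad (default) cases over the rating values."""
    n_bad = sum(defaults)
    n_good = len(defaults) - n_bad
    counts = {}
    for rating, is_default in zip(ratings, defaults):
        good, bad = counts.get(rating, (0, 0))
        counts[rating] = (good + (1 - is_default), bad + is_default)
    max_gap = 0.0
    cum_good = cum_bad = 0
    for rating in sorted(counts):  # ascending rating values
        good, bad = counts[rating]
        cum_good += good
        cum_bad += bad
        max_gap = max(max_gap, abs(cum_good / n_good - cum_bad / n_bad))
    return max_gap


# (PD, nondefault count, default count) per bucket, from the example table.
buckets = [(1.0, 3, 4), (0.9, 7, 4), (0.8, 5, 2), (0.7, 9, 3), (0.6, 10, 2),
           (0.5, 8, 1), (0.4, 7, 0), (0.3, 9, 0), (0.2, 13, 0), (0.1, 11, 0),
           (0.0, 2, 0)]
ratings, defaults = [], []
for pd_value, non, dfl in buckets:
    ratings += [pd_value] * (non + dfl)
    defaults += [0] * non + [1] * dfl

ks = pietra_ks(ratings, defaults)
print(round(ks, 4))  # 0.5327
```

The value matches the 53.27% shown in the KS column of the example table.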
ROC curve model
Y = normcdf(a + b * norminv(x))
where normcdf is the standard normal cumulative distribution function and norminv is its inverse.
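The model curve can be evaluated in Python with the standard library's NormalDist in place of normcdf and norminv. This is a sketch under assumptions: the function names are hypothetical, and the closed-form area Φ(a / sqrt(1 + b²)) is the standard result for this two-parameter (binormal) curve, which the slide itself does not state.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def roc_model(x, a, b):
    """Model ROC curve y = Phi(a + b * Phi^{-1}(x)) for 0 < x < 1."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(x))


def roc_model_auc(a, b):
    """Area under the model curve, Phi(a / sqrt(1 + b^2))."""
    return _STD_NORMAL.cdf(a / (1.0 + b * b) ** 0.5)


# With a = 0 and b = 1 the curve is the diagonal of a random model.
print(roc_model(0.3, 0.0, 1.0))
print(roc_model_auc(0.0, 1.0))
```

Larger values of a push the curve towards the upper-left corner, that is, towards higher discriminatory power.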
Pietra Index – approximated by Kolmogorov-Smirnov statistic
Matlab file: pietra.m
Call:
[pietra_index,H,confidence_level]=pietra(rating,is_default,q)
Input:
rating – column vector of rating values (a higher value denotes higher risk)
is_default – 1 denotes default, 0 denotes non-default
q – significance level (default q=0.05)
Output:
pietra_index – asymptotic p-value
H:
1 – the null hypothesis was rejected (discrimination)
0 – the null hypothesis was not rejected
confidence_level – 1-q (default 95%)
Cumulative Accuracy Profile (CAP) and Accuracy Ratio (AR)
We present two ways of calculating the Cumulative Accuracy Profile and the Accuracy Ratio:
• (1) the first approach: calculation based on sorted rating grades, e.g. (AA, B, ...),
• (2) the second approach: calculation based on a numeric rating that is not mapped to well-known rating grades such as (AA, B, ...).
Characteristics of the first approach (discrete approach):
• the calculation uses each rating grade, so the curve has as many points as there are rating grades,
• advantage: it is less risky because it avoids the sorting problem,
• disadvantage: the curve is less accurate (number of points = number of distinct ratings).
Characteristics of the second approach (continuous approach):
• the calculation uses every observation, so the curve has as many points as there are observations,
• advantage: the curve is more accurate than in the first approach,
• disadvantage: it is more risky because a sorting problem can occur when several observations share the same rating value.
Cumulative Accuracy Profile (CAP) (the first approach)
The cumulative accuracy profile (CAP) provides a way of visualizing discriminatory power.
The key idea is the following: if a rating system discriminates well, defaults should occur mainly among borrowers with a bad rating.
To graph a CAP, we need historical data on ratings and default behavior.
[Figure: example of a CAP curve]
Accuracy Ratio (AR) (the first approach)
An accuracy ratio (AR) condenses the information contained in CAP curves into a
single number. It can be obtained by relating the area under the CAP but above the
diagonal to the maximum area the CAP can enclose above the diagonal. Thus, the
maximum accuracy ratio is 1.
We compute the accuracy ratio as A/B, where A is the area pertaining to the
rating system under analysis, and B is the one pertaining to the ‘perfect’ rating
system.
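The ratio A/B can be sketched in Python for the discrete (first) approach, with one CAP point per distinct rating. This is an illustration, not the deck's CAP.m; the function names are hypothetical, a higher rating value is assumed to mean higher risk, and the bucket counts come from the ROC example table.

```python
def cap_points(ratings, defaults):
    """CAP points (share of all borrowers, share of defaulters captured),
    one per distinct rating, starting with the riskiest rating."""
    n_all = len(defaults)
    n_def = sum(defaults)
    counts = {}
    for rating, is_default in zip(ratings, defaults):
        total, dfl = counts.get(rating, (0, 0))
        counts[rating] = (total + 1, dfl + is_default)
    points = [(0.0, 0.0)]
    cum_all = cum_def = 0
    for rating in sorted(counts, reverse=True):
        total, dfl = counts[rating]
        cum_all += total
        cum_def += dfl
        points.append((cum_all / n_all, cum_def / n_def))
    return points


def accuracy_ratio(ratings, defaults):
    """AR = A / B, both areas measured above the diagonal."""
    points = cap_points(ratings, defaults)
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    default_rate = sum(defaults) / len(defaults)
    area_model = area - 0.5                      # A: model CAP minus diagonal
    area_perfect = 0.5 * (1.0 - default_rate)    # B: perfect CAP minus diagonal
    return area_model / area_perfect


# (PD, nondefault count, default count) per bucket, from the example table.
buckets = [(1.0, 3, 4), (0.9, 7, 4), (0.8, 5, 2), (0.7, 9, 3), (0.6, 10, 2),
           (0.5, 8, 1), (0.4, 7, 0), (0.3, 9, 0), (0.2, 13, 0), (0.1, 11, 0),
           (0.0, 2, 0)]
ratings, defaults = [], []
for pd_value, non, dfl in buckets:
    ratings += [pd_value] * (non + dfl)
    defaults += [0] * non + [1] * dfl

ar = accuracy_ratio(ratings, defaults)
print(round(ar, 4))  # 0.6853
```

The result matches the 68.53% in the example table, and it is consistent with AR = 2 × AUC − 1 for the table's AUC of 84.26%.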
Cumulative Accuracy Profile (CAP) (the first approach)
Matlab file: CAP.m
Call:
[xy,AR]=CAP(ratings, defaults)
Input:
ratings – column vector of rating values (a higher value denotes higher risk)
defaults – 1 denotes default, 0 denotes non-default
Output:
xy – collection of points for plotting the CAP
AR – Accuracy Ratio
Matlab data example file: capexample.mat
Cumulative Accuracy Profile (CAP) (the second approach)
Matlab file: cap_continuous.m
Call:
[AR]=cap_continuous(pd,is_def)
Input:
pd – column vector of rating values
is_def – 1 denotes default, 0 denotes non-default
Output:
AR – Accuracy Ratio
Conditional entropy, Kullback-Leibler distance
Consider a rating system that, applied to an obligor, produces a random score $S$. If $D$ denotes the event “obligor defaults” and $\bar{D}$ denotes the complementary event “obligor does not default”, we can apply the information entropy $H$ to $P(D \mid S)$, the conditional probability of default given the rating score $S$. The result of this operation can be considered a conditional information entropy of the default event,
\[ H(P(D \mid S)) = -\left( P(D \mid S) \log P(D \mid S) + P(\bar{D} \mid S) \log P(\bar{D} \mid S) \right) \]
and as such is a random variable whose expectation can be calculated. This expectation is called the Conditional Entropy of the default event (with respect to the rating score $S$), and can formally be written as
\[ H_S = E\left[ H(P(D \mid S)) \right] = -E\left[ P(D \mid S) \log P(D \mid S) + P(\bar{D} \mid S) \log P(\bar{D} \mid S) \right] \]
The Conditional Entropy of the default event is at most as large as the unconditional Information Entropy of the default event, i.e. $H_S \le H(p)$, where for the empirical default rate $p = P(D)$:
\[ H(p) = H(P(D)) = -\left( P(D) \log P(D) + P(\bar{D}) \log P(\bar{D}) \right) \]
Receiver Operating Characteristic (ROC)
An analytic tool that is closely related to the Cumulative Accuracy Profile is the
Receiver Operating Characteristic (ROC). The ROC can be obtained by plotting the
fraction of defaulters against the fraction of non-defaulters. The two graphs thus
differ in the definition of the x-axis.
A summary statistic of a ROC analysis is the area under the ROC curve (AUC).
Reflecting the fact that the CAP is very similar to the ROC, there is an exact linear
relationship between the accuracy ratio and the area under the curve:
\[ \mathrm{AR} = 2 \cdot \mathrm{AUC} - 1 \]
Conditional entropy, Kullback-Leibler distance
$H$ can be interpreted as a measure of disorder (uncertainty). The difference between $H(p)$ and $H_S$ should be as large as possible, because the gain in information from applying the rating scores is then at its maximum:
\[ H_{PS} = H(p) - H_S \to \max \]
Normalising this difference gives the Conditional Information Entropy Ratio (CIER):
\[ \mathrm{CIER}_S = \frac{H(p) - H_S}{H(p)} \]
The more information about the default event is contained in the rating scores $S$, the closer the value of CIER is to one.
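For grouped scores, the expectation in H_S becomes a weighted average over the score buckets. The Python sketch below illustrates this under assumptions: it is not the deck's CIER.m, the function names are hypothetical, logarithms are taken to base 2 (the ratio itself does not depend on the base), and the portfolio must contain both defaulters and non-defaulters so that H(p) is positive.

```python
import math


def binary_entropy(p):
    """Entropy in bits of an event with probability p; 0 if p is 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def cier(buckets):
    """Return (CIER, H(p) - H_S) for a list of (nondefaults, defaults)
    counts, one pair per score value."""
    n_all = sum(non + dfl for non, dfl in buckets)
    n_def = sum(dfl for _, dfl in buckets)
    h_p = binary_entropy(n_def / n_all)
    # H_S: bucket entropies weighted by the share of obligors in each bucket.
    h_s = sum((non + dfl) / n_all * binary_entropy(dfl / (non + dfl))
              for non, dfl in buckets if non + dfl > 0)
    return (h_p - h_s) / h_p, h_p - h_s


# (nondefault count, default count) per PD bucket, from the ROC example table.
example = [(3, 4), (7, 4), (5, 2), (9, 3), (10, 2), (8, 1),
           (7, 0), (9, 0), (13, 0), (11, 0), (2, 0)]
ratio, gain = cier(example)
print(0.0 < ratio < 1.0)  # True
```

A score that separates defaulters perfectly gives a CIER of 1, while a single bucket containing everyone gives 0.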
Conditional Information Entropy Ratio (CIER)
Matlab file: CIER.m
Call:
[wsk, HPS]=cier(pd,is_default,spos)
Input:
pd – column vector of rating values (a higher value denotes higher risk)
is_default – 1 denotes default, 0 denotes non-default
spos – selects one of several approaches
Output:
wsk – CIER, the normalised form of the conditional entropy
HPS – unnormalised form of the conditional entropy
Kullback-Leibler distance
The Kullback-Leibler divergence can be computed as follows:
\[ D(\text{def} \,\|\, \text{nondef}) = \sum_s P(s \mid \text{def}) \log_2 \frac{P(s \mid \text{def})}{P(s \mid \text{nondef})} \]
This expression can be interpreted as an information divergence (information gain, relative entropy) between the score densities of the default and non-default populations.
Because it is asymmetric, it is more convenient to use a symmetric form, the System Stability Index (SSI):
\[ \mathrm{SSI} = D(\text{def} \,\|\, \text{nondef}) + D(\text{nondef} \,\|\, \text{def}) \]
Both of these values should be as large as possible.
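The two sums can be sketched in Python for score distributions given as probability vectors over the same bins. This is an illustration, not the deck's kullback_leibler.m; the function names and the toy distributions are hypothetical, and the handling of empty bins (a zero term when P is 0, an infinite divergence when only Q is 0) is the usual convention, stated here as an assumption.

```python
import math


def kl_divergence(p, q):
    """D(p || q) in bits for two discrete distributions over the same bins."""
    total = 0.0
    for p_s, q_s in zip(p, q):
        if p_s == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if q_s == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_s * math.log2(p_s / q_s)
    return total


def stability_index(p_def, p_nondef):
    """SSI = D(def || nondef) + D(nondef || def)."""
    return kl_divergence(p_def, p_nondef) + kl_divergence(p_nondef, p_def)


# Hypothetical score distributions over two bins.
p_def = [0.5, 0.5]
p_nondef = [0.25, 0.75]
print(round(kl_divergence(p_def, p_nondef), 4))   # 0.2075
print(round(kl_divergence(p_nondef, p_def), 4))   # 0.1887
print(round(stability_index(p_def, p_nondef), 4))
```

The two directed divergences differ, which illustrates the asymmetry that the symmetric SSI removes.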
Conditional entropy, Kullback-Leibler distance
Matlab file: kullback_leibler.m
Call:
[D_DN, D_ND, SSI]=kullback_leibler(pd,is_default,spos)
Input:
pd – column vector of rating values (a higher value denotes higher risk)
is_default – 1 denotes default, 0 denotes non-default
spos – selects one of several approaches
Output:
D_DN – distance from f(D) to f(ND)
D_ND – distance from f(ND) to f(D)
SSI – System Stability Index
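Bayesian error rate
Denote by $p_D$ the rate of defaulters in the portfolio, and let HR(C) be the hit rate (the share of defaulters correctly classified as defaulters at cut-off C) and FAR(C) the false alarm rate (the share of non-defaulters wrongly classified as defaulters at cut-off C). In the case of a concave ROC curve, the Bayesian error rate can then be calculated via
\[ \text{Error rate} = \min_C \left( p_D \, (1 - HR(C)) + (1 - p_D) \, FAR(C) \right) \]
In the special case $p_D = 50\%$, this becomes $\frac{1}{2} - \frac{1}{2} \max_C \left( HR(C) - FAR(C) \right)$, so the error rate is then equivalent to the Pietra Index and the Kolmogorov-Smirnov statistic.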
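The minimisation over cut-offs can be sketched in Python as follows. This is an illustration, not the deck's bayesian_error_rate.m; the function name is hypothetical, a higher rating value is assumed to mean higher risk, and the data are the bucket counts of the ROC example table.

```python
def bayesian_error_rate(ratings, defaults):
    """Minimum over cut-offs of p_D * (1 - HR(C)) + (1 - p_D) * FAR(C).

    A borrower is classified as a defaulter when rating >= C.
    """
    n_all = len(defaults)
    n_def = sum(defaults)
    n_non = n_all - n_def
    p_d = n_def / n_all
    best = p_d  # cut-off above every rating: HR = 0 and FAR = 0
    for cutoff in set(ratings):
        tp = sum(1 for r, d in zip(ratings, defaults) if r >= cutoff and d == 1)
        fp = sum(1 for r, d in zip(ratings, defaults) if r >= cutoff and d == 0)
        hit_rate = tp / n_def
        false_alarm_rate = fp / n_non
        error = p_d * (1.0 - hit_rate) + (1.0 - p_d) * false_alarm_rate
        best = min(best, error)
    return best


# (PD, nondefault count, default count) per bucket, from the example table.
buckets = [(1.0, 3, 4), (0.9, 7, 4), (0.8, 5, 2), (0.7, 9, 3), (0.6, 10, 2),
           (0.5, 8, 1), (0.4, 7, 0), (0.3, 9, 0), (0.2, 13, 0), (0.1, 11, 0),
           (0.0, 2, 0)]
ratings, defaults = [], []
for pd_value, non, dfl in buckets:
    ratings += [pd_value] * (non + dfl)
    defaults += [0] * non + [1] * dfl

ber = bayesian_error_rate(ratings, defaults)
print(round(ber, 4))  # 0.15
```

With a default rate of 16%, the best cut-off flags only the riskiest bucket: 12 missed defaulters plus 3 false alarms out of 100 obligors.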
Matlab file: bayesian_error_rate.m
Call:
[ber]=bayesian_error_rate(rating,is_default,spos)
Input:
rating – column vector of rating values (a higher value denotes higher risk)
is_default – 1 denotes default, 0 denotes non-default
spos – selects one of several approaches
Output:
ber – Bayesian error rate
Kendall’s τ and Somers’ D
Kendall’s τ and Somers’ D are so-called rank order statistics, and as such measure the degree of comonotonic dependence of two random variables. The notion of comonotonic dependence generalises linear dependence that is expressed via (linear) correlation. In particular, any pair of random variables with correlation 1 (i.e. any linearly dependent pair of random variables) is comonotonically dependent. But in addition, as soon as one of the variables can be expressed as any kind of increasing transformation of the other, the two variables are comonotonic. In the actuarial literature, comonotonic dependence is considered the strongest form of dependence of random variables.
Kendall compared the number of concordances minus the number of discordances with the total number of pairs, n(n-1)/2. This statistic is Kendall's tau-a:
\[ \tau_a = \frac{C - D}{n(n-1)/2} \]
and Somers’ D:
\[ \text{Somers' } D = \frac{C - D}{C + D + T} \]
where C is the number of concordant pairs, D the number of discordant pairs and T the number of tied pairs.
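Counting the pairs directly gives a small Python sketch of both statistics. It is not the deck's tau_somersd.m; the function name and toy data are hypothetical, and T is taken to be the number of pairs tied on the score but not on the outcome, which is one common convention and an assumption here. The sketch also assumes at least one concordant or discordant pair.

```python
def rank_statistics(pd, is_default):
    """Return (tau_a, somers_d, gamma) from pairwise comparisons."""
    n = len(pd)
    concordant = discordant = tied = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = pd[i] - pd[j]
            dy = is_default[i] - is_default[j]
            if dx == 0 and dy != 0:
                tied += 1            # tied on the score only
            elif dx * dy > 0:
                concordant += 1      # score and outcome ordered the same way
            elif dx * dy < 0:
                discordant += 1      # score and outcome ordered oppositely
    diff = concordant - discordant
    tau_a = diff / (n * (n - 1) / 2)
    somers_d = diff / (concordant + discordant + tied)
    gamma = diff / (concordant + discordant)
    return tau_a, somers_d, gamma


# Hypothetical toy data with one pair tied on the score.
pd = [0.1, 0.2, 0.2, 0.4]
is_default = [0, 0, 1, 1]
tau_a, somers_d, gamma = rank_statistics(pd, is_default)
print(tau_a, somers_d, gamma)  # 0.5 0.75 1.0
```

Without ties on the score, T is 0 and Somers' D coincides with gamma.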
Matlab file: tau_somersd.m
Call:
[Tau,SomersD,Tau_a,Gamma]=tau_somersd(pd, is_default)
Input:
pd – vector of default probabilities estimated by the model
is_default – vector of observed (realised) outcomes
Output:
Tau – tau calculated using the Matlab function corr.m
SomersD – value of Somers’ D
Tau_a – tau calculated in the traditional way
Gamma – the same value as SomersD when no two pd values are equal (no ties T)
Brier Score
The Brier score is a method for evaluating the quality of a probability forecast. It has its origins in weather forecasting, but it is straightforward to apply the concept to rating models. The Brier score is defined as the mean squared error (MSE) of the PD forecasts:
\[ \text{Brier score} = \frac{1}{N} \sum_{i=1}^{N} (d_i - PD_i)^2 \]
where i indexes the N observations, $d_i$ is an indicator variable that takes the value 1 if borrower i defaulted (0 otherwise), and $PD_i$ is the estimated probability of default of borrower i.
• the Brier score lies between 0 and 1,
• better default probability forecasts are associated with lower score values.
Matlab file: Brier.m
Call:
[out]=Brier(PDs, defaults)
Input:
PDs – vector of default probabilities estimated by the model
defaults – vector of observed outcomes (1 denotes default, 0 denotes non-default)
Output:
out – Brier Score
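For comparison, the Brier score is a one-line mean of squared differences in Python. This is a minimal sketch rather than the deck's Brier.m; the function name and the toy forecasts are hypothetical.

```python
def brier_score(pds, defaults):
    """Mean squared difference between default indicators and estimated PDs."""
    return sum((d - pd) ** 2 for pd, d in zip(pds, defaults)) / len(pds)


# Hypothetical forecasts for four borrowers, the last of whom defaulted.
pds = [0.1, 0.2, 0.1, 0.8]
defaults = [0, 0, 0, 1]
print(round(brier_score(pds, defaults), 4))  # 0.025
```

A forecast of exactly 0 or 1 for every borrower that always turns out right would score 0, the best possible value.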
Calibration methods
According to the Basel Committee on Banking Supervision, the following methodologies can be used to assess the quality of the PD estimates:
• Binomial test,
• Normal test,
• Normal test with asset correlation,
• Traffic lights approach,
• Chi-square test (Hosmer-Lemeshow).
Binomial test
In many rating systems used by financial institutions, obligors are grouped into
rating categories. The default probability of a rating category can then be
estimated in different ways.
Regardless of the way in which a default probability for a rating grade was
estimated, we may want to test whether it is in line with observed default rates.
From the perspective of risk management and supervisors, it is often crucial to
detect whether default probability estimates are too low.
To begin with, we assume that defaults are independent (so default correlation is zero). The number of defaults $D_{kt}$ in a given year t and grade k then follows a binomial distribution. The number of trials is $N_{kt}$, the number of obligors in grade k at the start of year t; the success probability is $PD_{kt}$, the default probability estimated at the start of year t.
At a significance level α (e.g. α = 1%), we can reject the hypothesis that the default probability is not underestimated if:
\[ 1 - \mathrm{BINOM}(D_{kt} - 1, N_{kt}, PD_{kt}) \le \alpha \]
where BINOM(x, N, q) denotes the cumulative binomial probability of observing at most x successes out of N trials with success probability q. If the above condition holds, we need to assume an unlikely scenario to explain the actual default count $D_{kt}$ (or a higher one). This would lead us to conclude that the PD has underestimated the true default probability.
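The left-hand side of the condition is the probability of observing at least D defaults, which can be sketched in Python with exact binomial terms. This is an illustration, not the deck's Binomial.m; the function names and the example portfolio (500 obligors, PD of 1%, 10 defaults) are hypothetical.

```python
from math import comb


def binomial_cdf(x, n, q):
    """Probability of at most x successes in n trials with success probability q."""
    return sum(comb(n, k) * q ** k * (1.0 - q) ** (n - k) for k in range(x + 1))


def binomial_test_pvalue(defaults, obligors, pd):
    """1 - BINOM(D - 1, N, PD): probability of D or more defaults under the PD."""
    return 1.0 - binomial_cdf(defaults - 1, obligors, pd)


# Hypothetical grade: 500 obligors, estimated PD of 1%, 10 observed defaults.
p_value = binomial_test_pvalue(defaults=10, obligors=500, pd=0.01)
print(p_value <= 0.01)  # False: no rejection at the 1% level
print(p_value <= 0.05)  # True: rejection at the 5% level
```

The example shows how the conclusion depends on the chosen significance level.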
Normal test
For large N, the binomial distribution converges to the normal distribution, so we can also use a normal approximation to the binomial test condition.
If defaults follow a binomial distribution with default probability $PD_{kt}$, the default count $D_{kt}$ has standard deviation
\[ \sigma = \sqrt{PD_{kt} (1 - PD_{kt}) N_{kt}} \]
and mean
\[ \mu = PD_{kt} N_{kt} \]
Instead of the condition based on the binomial distribution, we can now examine:
\[ 1 - \Phi\left( \frac{D_{kt} - 0.5 - PD_{kt} N_{kt}}{\sqrt{PD_{kt} (1 - PD_{kt}) N_{kt}}} \right) \le \alpha \]
where Φ denotes the cumulative standard normal distribution function.
If the above condition holds, we need to assume an unlikely scenario to explain the actual default count $D_{kt}$ (or a higher one). This would lead us to conclude that the PD has underestimated the true default probability.
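The normal approximation translates into a few lines of Python using the standard library's NormalDist. The sketch is illustrative, not the deck's Matlab code; the function name and the example portfolio are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_test_pvalue(defaults, obligors, pd):
    """1 - Phi((D - 0.5 - PD * N) / sqrt(PD * (1 - PD) * N))."""
    mean = pd * obligors
    std = sqrt(pd * (1.0 - pd) * obligors)
    z = (defaults - 0.5 - mean) / std  # 0.5 is the continuity correction
    return 1.0 - NormalDist().cdf(z)


# Hypothetical grade: 500 obligors, estimated PD of 1%, 10 observed defaults.
p_value = normal_test_pvalue(defaults=10, obligors=500, pd=0.01)
print(p_value <= 0.05)  # True: rejection at the 5% level
```

For this example the approximate p-value is roughly 0.02, somewhat below the exact binomial value, which suggests the approximation is coarse when the expected default count is as small as 5.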
Normal test with asset correlation
When defaults cannot be assumed independent (so default correlation is not zero), we have to introduce an asset correlation ρ.
We then examine the following condition, which uses the asset correlation ρ:
\[ \Phi\left( \frac{\Phi^{-1}(PD_{kt}) - \sqrt{1 - \rho} \, \Phi^{-1}(D_{kt} / N_{kt})}{\sqrt{\rho}} \right) \le \alpha \]
where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function.
If the above condition holds, we conclude that the PD estimate was too low, given the asset correlation ρ.
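A Python sketch of the left-hand side of this condition is given below, again with NormalDist standing in for the normal CDF and its inverse. It is not the deck's Binomial.m; the function name and example values are hypothetical, and the observed default rate must lie strictly between 0 and 1 for the inverse CDF to be defined.

```python
from math import sqrt
from statistics import NormalDist


def correlated_normal_test_pvalue(defaults, obligors, pd, rho):
    """Phi((Phi^-1(PD) - sqrt(1 - rho) * Phi^-1(D / N)) / sqrt(rho)).

    Requires 0 < D / N < 1 and 0 < rho < 1.
    """
    std_normal = NormalDist()
    default_rate = defaults / obligors
    z = (std_normal.inv_cdf(pd)
         - sqrt(1.0 - rho) * std_normal.inv_cdf(default_rate)) / sqrt(rho)
    return std_normal.cdf(z)


# Hypothetical grade: 500 obligors, PD of 1%, 10 defaults, rho = 0.07.
p_value = correlated_normal_test_pvalue(defaults=10, obligors=500, pd=0.01, rho=0.07)
print(p_value <= 0.05)  # False: no rejection once correlation is allowed for
```

With correlated defaults, a default rate twice the estimated PD is much less surprising than under independence, so the same data no longer lead to rejection at the 5% level.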
Matlab file: Binomial.m
Call:
[ALLTest]=Binomial(PD_est,PD,N,alpha,p)
Input:
PD_est – historical default rates (e.g. for the years 1990-2004) for every rating category (grade)
PD – probability of default of every rating grade
N – number of trials in every rating grade
alpha – significance level, e.g. alpha=1%
p – asset correlation, e.g. p=0.07
Output:
ALLTest – matrix of results:
first column – binomial test
second column – normal test
third column – normal test with correlation p
Matlab data example file: binDataExample.mat
Traffic lights approach
Decisions on significance levels are somewhat arbitrary. In a traffic lights approach, we choose two or more significance levels rather than one.
If the p-value of a test is below the red threshold, we assign the observation to the red zone, meaning that an underestimation of the default probability is very likely. If the p-value is above the red threshold but below the orange one, we interpret the result as a very important warning that the PD might be an underestimate (orange zone). If the p-value is above the orange threshold but below the yellow one, we interpret the result as a warning that the PD might be an underestimate (yellow zone). Otherwise, we assign it to the green zone.
For example, we assume the following significance levels for the traffic lights approach:
red: p-value ≤ 0.01, orange: 0.01 < p-value ≤ 0.05, yellow: 0.05 < p-value ≤ 0.07, green: p-value > 0.07
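The zone assignment amounts to a chain of threshold comparisons, sketched here in Python. The function name is hypothetical, and the thresholds 0.01, 0.05 and 0.07 are the example levels above, with green read as every p-value above the yellow threshold, as the "otherwise" in the description implies.

```python
def traffic_light(p_value, red=0.01, orange=0.05, yellow=0.07):
    """Map the p-value of a calibration test to a traffic light zone."""
    if p_value <= red:
        return "red"      # underestimation of the PD is very likely
    if p_value <= orange:
        return "orange"   # very important warning
    if p_value <= yellow:
        return "yellow"   # warning
    return "green"


for p in (0.005, 0.03, 0.06, 0.5):
    print(p, traffic_light(p))
```

Keeping the thresholds as parameters makes it easy to apply stricter or looser zones to different portfolios.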
Matlab file: TrafficLight.m
Call:
[ALLTraficResult]=TrafficLight(PD_est,PD,N,p)
Input:
PD_est – historical default rates (e.g. for the years 1990-2004) for every rating category (grade)
PD – probability of default of every rating grade
N – number of trials in every rating grade
p – asset correlation, e.g. p=0.07
The file contains the following significance levels for the traffic lights:
red: p-value ≤ 0.01, orange: 0.01 < p-value ≤ 0.05, yellow: 0.05 < p-value ≤ 0.07, green: p-value > 0.07
Output:
ALLTraficResult – matrix of results:
4 denotes red light, 3 denotes orange light, 2 denotes yellow light, 1 denotes green light
first column – binomial test, second column – normal test, third column – normal test with correlation
Matlab data example file: binDataExample.mat
Chi-square test (Hosmer-Lemeshow)
Let $p_0, \ldots, p_k$ denote the forecasted default probabilities of debtors in the rating categories 0, 1, …, k. Define the statistic
\[ T_k = \sum_{i=0}^{k} \frac{(n_i p_i - \theta_i)^2}{n_i p_i (1 - p_i)} \]
with $n_i$ = number of debtors with rating i and $\theta_i$ = number of defaulted debtors with rating i. By the central limit theorem, when $n_i \to \infty$ simultaneously for all i, the distribution of $T_k$ converges towards a $\chi^2_{k+1}$-distribution if all the $p_i$ are the true default probabilities.
The p-value of a $\chi^2_{k+1}$-test can serve as a measure of the accuracy of the estimated default probabilities: the closer the p-value is to zero, the worse the estimation is.
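The statistic itself is a short sum, sketched below in Python. This is not the deck's hosmer_lemeshow.m: the function name and the two-grade example are hypothetical, and the sketch stops at the statistic, because the p-value additionally needs the chi-square distribution function with k + 1 degrees of freedom, which the Python standard library does not provide.

```python
def hosmer_lemeshow_statistic(obligors, pds, defaults):
    """T_k: sum over grades of (n_i * p_i - theta_i)^2 / (n_i * p_i * (1 - p_i))."""
    return sum((n * p - theta) ** 2 / (n * p * (1.0 - p))
               for n, p, theta in zip(obligors, pds, defaults))


# Hypothetical example with two grades of 100 obligors each.
obligors = [100, 100]
pds = [0.1, 0.2]
defaults = [13, 20]
t_stat = hosmer_lemeshow_statistic(obligors, pds, defaults)
print(round(t_stat, 6))  # 1.0
```

Only the first grade contributes here: 13 defaults against 10 expected, with a variance of 9, gives 9 / 9 = 1.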
Matlab file: hosmer_lemeshow.m
Call:
p=hosmer_lemeshow(pd,is_default)
Input:
pd – vector of default probabilities estimated by the model
is_default – vector of observed outcomes
Output:
p – the p-value of the $\chi^2_{k+1}$-test