Stat503
Dr. Cook
Enhua Ma
Spring, 1999
Credit Evaluation
 Course Project
Credit evaluation is concerned with the process of assigning a score to an existing or prospective
loan based on the characteristics of the applicant. In general, financial ratios are widely used to
evaluate a client's credit level. Eight key ratios, mainly related to risk analysis, are used in our study.
Eight measures are available.
Group: Dummy variable (1: good credit, 2: bad credit)
X1: Ratio of current assets to current liabilities (C. ASSETS / C. LIABILITIES)
X2: Proportion of working capital in total assets (W. CAPITAL / T. ASSETS)
X3: Ratio of current liabilities to total assets (C. LIABILITIES / T. ASSETS)
X4: Proportion of current assets in total assets (C. ASSETS / T. ASSETS)
X5: Ratio of total liabilities to total assets (T. LIABILITIES / T. ASSETS)
X6: Ratio of net worth change to total assets (CH. NETWORTH / T. ASSETS)
X7: Ratio of farmland value to total assets (FARMLAND VALUE / T. ASSETS)
X8: Ratio of total liabilities to net worth (T. LIABILITIES / NETWORTH)
The primary questions are:
1. How do we distinguish the clients' financial credit using the combinations of their asset and liability information?
2. Can we do a good prediction job with these classification rules?
2. Suggested Approaches
Approach: Data restructuring (variable transformations are needed)
Reason: Some of the ratios show high skewness and kurtosis; assessments of normality, variances, and outliers show that data restructuring is needed.

Approach: Create different colors and glyphs for the different groups
Reason: To allow us to inspect the group differences visually.

Approach: Summary statistics
Reason: To obtain basic numerical information about the variables.
Question addressed: "What are the basic structures of the variables?"

Approach: Visual inspection (under XGobi)
Box plots: "Which variables might be useful discriminators of the client's credit?"
Bivariate plots: "Which pairs of variables can do a better job in the client's credit classification?"
Rotation and grand tour: "What kinds of combinations of variables might be useful for the two-group classification?"

Approach: Numerical analysis (linear discriminant analysis, classification and regression trees, feed-forward neural network, backpropagation net (BPN), self-organizing maps (SOM))
Questions addressed: "Can we find a good classification rule for the bank credits based on the information provided?" and "What are the advantages and disadvantages of statistical methods and machine learning methods, and how do their predictive powers compare?"
3. Actual Approaches
3.1 Data Restructuring
In order to satisfy the linear discriminant analysis assumptions, the variables are transformed as follows:
X1 → ln(X1)
X2 → ln(X2 + 1)
X3 → X3^0.5
X5 → ln(X5)
X6 → ln(X6 + 1)
X8 → X8^0.25
The univariate plots used to inspect the distributions and the transformations are presented in Appendix 1.
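A minimal sketch of these transformations in S-Plus/R, assuming the raw ratios sit in a data frame named loans2d with columns x1 through x8 (the name used in the tree fit later in this report):

# Transform the raw ratios in place to meet the LDA assumptions.
loans2d$x1 <- log(loans2d$x1)        # X1 -> ln(X1)
loans2d$x2 <- log(loans2d$x2 + 1)    # X2 -> ln(X2 + 1)
loans2d$x3 <- sqrt(loans2d$x3)       # X3 -> X3^0.5
loans2d$x5 <- log(loans2d$x5)        # X5 -> ln(X5)
loans2d$x6 <- log(loans2d$x6 + 1)    # X6 -> ln(X6 + 1)
loans2d$x8 <- loans2d$x8^0.25        # X8 -> X8^0.25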
3.2 Summary Statistics for Transformed Measurements
Table 1. Overall summaries of variables for the training and testing groups
Number of observations = 68, number of variables = 8
____________________________________________________________________________
Training      X1      X2      X3      X4      X5      X6      X7      X8
Mean          0.67    0.08    0.34    0.40   -1.64    0.04    0.47    0.91
Std. Dev.     0.95    0.11    0.14    0.18    0.55    0.09    0.24    0.19
Min.         -0.93   -0.21    0.00    0.02   -3.00   -0.17    0.00    0.38
1st Qu.       0.06    0.01    0.26    0.28   -1.97    0.00    0.36    0.79
Median        0.50    0.07    0.34    0.40   -1.71    0.00    0.52    0.90
3rd Qu.       1.22    0.14    0.42    0.53   -1.20    0.10    0.64    1.03
Max.          3.51    0.36    0.65    0.83   -0.58    0.47    0.82    1.47
Skewness      1.09    0.26   -0.02    0.07   -0.12    1.61   -0.75    0.15
Kurtosis      1.19    0.28    0.04   -0.42   -0.13    5.84   -0.31    0.79
____________________________________________________________________________
Table 2. Variable summary statistics by group (training data)
_____________________________________________________________________________
        Group 1 (n = 34)                   Group 2 (n = 34)
        Min.    Max.    Mean   Std.Dev.    Min.    Max.    Mean   Std.Dev.
X1       0.07    3.51    1.25   0.93       -0.93    1.39    0.09   0.53
X2       0.01    0.36    0.14   0.09       -0.21    0.20    0.01   0.08
X3       0.00    0.65    0.28   0.14        0.17    0.65    0.40   0.12
X4       0.02    0.83    0.32   0.18        0.28    0.75    0.49   0.12
X5      -2.41   -0.58   -1.47   0.51       -3.00   -0.78   -1.80   0.54
X6      -0.17    0.47    0.06   0.11       -0.11    0.20    0.02   0.07
X7       0.00    0.82    0.46   0.23        0.00    0.79    0.48   0.25
X8       0.38    1.47    0.82   0.21        0.79    1.31    0.99   0.13
_____________________________________________________________________________
3.3 Bivariate Scatterplot
Figure 1. Bivariate scatterplot (red: group 2, green: group 1)
There is a strong positive correlation between X4 and X8. It appears that X7 and X8 contribute little to discriminating the two groups, while combinations of X1, X2, and the other variables provide good distinction between the two groups.
The scatter plot was made in Xgobi.new (Swayne, Cook & Buja, 1998).
3.4 High-dimensional visual inspection
Visual inspection with the grand tour and 3-D rotation does not give a good separation of the two groups on this data set.
3.5 Linear Discriminant Analysis (LDA)
Figure 2. LDA result for training set
Figure 3. LDA result for testing set
(green for group 1, red for group2)
The linear discriminant analysis solution is given by A = the eigenvectors of W^(-1)B, where W is the within-group and B the between-group covariance matrix. X1bar and X2bar are the means of group 1 and group 2, respectively, and Xbar is the overall mean. Since the prior probabilities are the same for both groups, they are omitted.
X1barT = [1.2476 0.1448 0.2838 0.3179 -1.4689 0.0606 0.4597 0.8201]
X2barT = [0.0929 0.0076 0.4035 0.4903 -1.8018 0.0231 0.4797 0.9910]
XbarT  = [0.6703 0.0762 0.3437 0.4041 -1.6354 0.0419 0.4697 0.9056]
AT = [-0.9500600 -11.1621542 -5.8750227 10.1964130 0.6896711 1.2087291 -2.8772790 -7.2737290]
The classification rule is that X0 belongs to group 1 if
(X1bar - Xbar)T A AT (X0 - Xbar) - (1/2)(X1bar - Xbar)T A AT (X1bar - Xbar) >=
(X2bar - Xbar)T A AT (X0 - Xbar) - (1/2)(X2bar - Xbar)T A AT (X2bar - Xbar),
and X0 belongs to group 2 otherwise, where X0 is the case to be classified.
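As an illustration, this rule can be applied directly in S-Plus/R with the printed vectors; this is a sketch of the stated rule (with equal priors), not the code actually used in the analysis:

# Fisher's rule with equal priors, using the printed group means and the vector A.
x1bar <- c(1.2476, 0.1448, 0.2838, 0.3179, -1.4689, 0.0606, 0.4597, 0.8201)
x2bar <- c(0.0929, 0.0076, 0.4035, 0.4903, -1.8018, 0.0231, 0.4797, 0.9910)
xbar  <- c(0.6703, 0.0762, 0.3437, 0.4041, -1.6354, 0.0419, 0.4697, 0.9056)
A     <- c(-0.9500600, -11.1621542, -5.8750227, 10.1964130,
            0.6896711,   1.2087291,  -2.8772790, -7.2737290)

# Discriminant score for a group mean gbar; the larger score wins.
score <- function(gbar, x0) {
  d <- sum((gbar - xbar) * A)              # (gbar - Xbar)' A
  d * sum(A * (x0 - xbar)) - 0.5 * d^2     # the rule above, with A A' factored
}
classify <- function(x0) ifelse(score(x1bar, x0) >= score(x2bar, x0), 1, 2)

Here classify() takes one transformed observation (X1, ..., X8) and returns the predicted group.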
Linear discriminant analysis misclassifies 8/68, or 12 percent, of the observations in the training set, which is not very satisfactory. Using this classification rule to classify the testing set, the next year's data, we get the same misclassification error rate: four cases in each group are misclassified. Figures 2 and 3 show the corresponding LDA results.
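A similar fit can be obtained with the lda() function in the MASS library of Venables and Ripley; the object names below (loans2d, loans2.group, loans3d, d.loans3) follow those used elsewhere in this report, and the call itself is an assumption rather than the one actually run:

library(MASS)
# LDA on the transformed training data with equal prior probabilities.
loans2.lda <- lda(loans2.group ~ ., data = loans2d, prior = c(0.5, 0.5))
# Training-set confusion table.
table(loans2.group, predict(loans2.lda, loans2d)$class)
# Apply the same rule to the next year's (testing) data.
table(d.loans3[, 2], predict(loans2.lda, loans3d)$class)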
3.6 Classification and Regression Trees (CART)
Figure 4. Classification tree for the training data
Classification tree:
tree(formula = loans2.group ~ ., data = loans2d)
Variables actually used in tree construction:
[1] "x1" "x6" "x4" "x2"
Number of terminal nodes: 7
Residual mean deviance: 0.4568 = 27.86 / 61
Misclassification error rate: 0.1029 = 7 / 68
x1, x2, x4, and x6 are actually used in the CART classification. The misclassification error rate is 10 percent, which is a little better than LDA. The CART classification rule is as follows:
If X1 < 0.06, or (0.06 < X1 < 0.32 and X6 < 0.00990131), or (X1 > 0.32 and X4 > 0.3 and 0.0615694 < X2 < 0.135395), then the case is in group 2 (bad credit).
Otherwise, the case is in group 1.
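A sketch of how this tree could be grown and applied with the tree() function shown in the output above (built into S-Plus; in R it lives in the tree package); loans3d is the third year's data, constructed later in this report:

library(tree)                            # not needed in S-Plus, where tree() is built in
# Grow the classification tree on the transformed training data.
loans2.tree <- tree(loans2.group ~ ., data = loans2d)
summary(loans2.tree)                     # terminal nodes, deviance, misclassification rate
plot(loans2.tree); text(loans2.tree)     # draw the tree with split labels
# Class probabilities for the third year's (testing) data.
predict(loans2.tree, loans3d)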
When the third year's data are applied as a testing set, four points are clearly misclassified. Eight other observations are given equal weight for group 1 and group 2: observations 5, 8, 28, 32, 47, 61, 64, and 68. Further examination of these eight cases is required.
3.7 Feed-Forward Neural Network
We use a network with an 8-3-1 architecture and skip-layer connections (39 weights in total), linear output units, and weight decay equal to 0.001 for this data set. The prediction function is stored in loans2.nn. The feed-forward neural network gives an almost perfect result: only one case is misclassified. This classification probably has a highly non-linear boundary; visual inspection did not show any possibility of obtaining a perfect linear classification. When loans2.nn is used to classify the next year's data set, it does an excellent job too. This may be due to the similarity of the two data sets.
> table(d.loans2[,2],round(predict(loans2.nn,loans2d)))
1 2
1 33 1
2 0 34
>d.loans3_read.table("loans3.dat",col.names=c("ob","group","x1","x2","x3","x4","x5","x6","x7","x8"))
> loans3d_data.frame(d.loans3[,3:10])
> table(d.loans3[,2],round(predict(loans2.nn,loans3d)))
1 2
1 33 1
2 0 34
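The fit stored in loans2.nn is not shown explicitly in this report; a plausible sketch with the nnet library of Venables and Ripley, matching the description above (3 hidden units, skip-layer connections, linear output, decay 0.001), is:

library(nnet)
# Assumed call: 8 inputs, 3 hidden units, skip-layer connections, linear output,
# weight decay 0.001. The response is coded numerically as 1 (good) or 2 (bad).
loans2.nn <- nnet(loans2d, as.numeric(d.loans2[, 2]), size = 3, skip = TRUE,
                  linout = TRUE, decay = 0.001, maxit = 500)
# Training-set confusion table, rounding the continuous output to 1 or 2.
table(d.loans2[, 2], round(predict(loans2.nn, loans2d)))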
3.8 Back Propagation (BP) Nets
Software called PCNeuron is used to carry out the back-propagation and SOM analyses. It is a neural network development shell developed by Professor I-Cheng Yeh at the Chung-Hwa Institute, Taiwan.
BP is one of the most popular neural networks used in business applications. It is widely used in stock price prediction, bank loan evaluation, bankruptcy prediction, and other classification problems. P. Werbos laid out the early framework of BP in 1974, proposing a method of modifying the connection weights for the neurons in the hidden layers. In the mid-1980s, D. Parker and, independently, D. Rumelhart, G. E. Hinton, and R. J. Williams revealed the concepts and computational procedure of BP.
BP uses a simple steepest-descent gradient algorithm, searching for a minimum on a specified error surface using small steps of fixed size (set by the learning rate). The algorithm is as follows:
1. Determine the network structure, the system's parameters, and the initial connection weights.
2. Select a training pair from the training set.
3. Apply the input vector to the network.
4. Calculate the output of the network (forward pass):
   Sj = Σi ai Wji + a0 Wj0 ;   aj = f(Sj)
   Sk = Σj aj Wkj ;   ak = f(Sk)
5. Calculate the error between the calculated output and the desired output (backward pass):
   ek = (tk - ak) f'(Sk)
6. Repeat steps 2-5 for each training pair until the error over the entire set is acceptably low.
For this project, one hidden layer BP is used. There are 8 input units, 4 hidden units, and 2
output units.
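To make the updates concrete, here is a toy single-step sketch of the forward and backward pass for an 8-4-2 net in S-Plus/R style with sigmoid activations; the learning rate and random weights are illustrative assumptions and have nothing to do with the PCNeuron settings:

# One BP iteration for an 8-4-2 network (toy illustration, not the PCNeuron fit).
sigmoid  <- function(s) 1 / (1 + exp(-s))
dsigmoid <- function(a) a * (1 - a)           # sigmoid derivative, in terms of its output
set.seed(1)
W1 <- matrix(runif(4 * 9, -0.5, 0.5), 4, 9)   # hidden-layer weights (first column = bias)
W2 <- matrix(runif(2 * 5, -0.5, 0.5), 2, 5)   # output-layer weights (first column = bias)
x  <- runif(8); tk <- c(1, 0); eta <- 0.1     # one training pair and the learning rate
aj <- sigmoid(W1 %*% c(1, x))                 # step 4: hidden-layer outputs
ak <- sigmoid(W2 %*% c(1, aj))                # step 4: network outputs
dk <- (tk - ak) * dsigmoid(ak)                # step 5: output-layer error terms
dj <- dsigmoid(aj) * (t(W2[, -1]) %*% dk)     # step 5: error propagated to the hidden layer
W2 <- W2 + eta * dk %*% t(c(1, aj))           # gradient-descent weight updates
W1 <- W1 + eta * dj %*% t(c(1, x))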
Figure 5: The structure of the BP net and its fitted connection weights for the input, hidden, and output layers.
Four out of 68 cases, or 6 percent, were misclassified for both the training and the testing sets. This is a fairly good result, although not as good as the feed-forward neural network. When the predictions are put back into the data set, it appears that X1, X2, X5, and X7 contribute most to the discrimination.
3.9 Self-Organizing Maps (SOM)
The self-organizing map (SOM) network was developed by Teuvo Kohonen between 1979 and 1982. It can be used to classify items into appropriate categories of similar objects. The SOM net was inspired by the fact that the relative positions of small groups of neurons in the brain reflect some physical relationship to the sensory signals. The SOM net is one of the most popular unsupervised learning nets. It can not only work stand-alone but also serve as a front end for other networks. The primary use of the SOM is to visualize topologies and hierarchical structures of higher-dimensional input spaces.
The SOM net contains two layers. The first layer, the input layer, uses a linear transfer function; its input signals can be binary or continuous variables. The second layer, a competitive layer, is used to represent clusters of the input data. The two layers are fully connected to each other.
The SOM net uses the "winner-take-all" strategy to update the connection weights. That is, at each iteration, only the output neuron that wins the competition by being closest to the input vector is activated and allowed to modify its connection weights. The learning procedure works as follows:
1. Determine the network's parameters and initialize the weights Wkj between input neuron j and output neuron k to small random values.
2. Present a new input vector x and compute the distance between the input vector and each output neuron k:
   Outk = Σj=1..n (xj - Wkj)^2 ,   k = 1, 2, ..., c.
3. Select the output neuron k* that has the smallest distance:
   Outk* = mink (Outk)
4. Update the activation values of the output neurons based upon the winner-take-all strategy:
   Yk = 1 if k = k*; otherwise Yk = 0.
5. Modify the connection weights for all units k within a specified neighborhood of k*:
   Wkj' = Wkj + η (xj - Wkj) ,   k ∈ NBDk*(t),  1 ≤ j ≤ n.
6. Update the learning rate η.
7. Repeat the above steps until there are no more input vectors.
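The winner-take-all update in steps 2 to 5 can be sketched in a few lines of S-Plus/R; the grid size and learning rate below are illustrative assumptions, and the neighborhood is limited to the winning unit for simplicity (this is not the PCNeuron implementation):

# Toy SOM update for one input vector; W is a c x n matrix of connection weights.
som.step <- function(W, x, eta = 0.1) {
  out   <- rowSums(sweep(W, 2, x)^2)                  # step 2: squared distance to each unit
  kstar <- which.min(out)                             # step 3: winning unit k*
  W[kstar, ] <- W[kstar, ] + eta * (x - W[kstar, ])   # step 5: move the winner toward x
  W
}
set.seed(2)
W <- matrix(runif(4 * 8), 4, 8)                       # a 2-by-2 output grid (4 units), 8 inputs
W <- som.step(W, x = runif(8))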
Figure 6: The basic structure of the SOM net
For this data set, 2-by-2, 3-by-3, and 4-by-4 grids are used. The resulting maps are shown in Figure 7. We can see that SOM did a poor job: almost half of the points are misclassified. Although none of the three maps gives a good result, the more output units there are, the better the data structure is described.
Figure 7. SOM results (red: group 2, green: group 1)
4. Summary of Findings
1. For the credit evaluation problem, neural network methods generally perform better than statistical methods. The feed-forward neural net gives a nearly perfect result. Given the highly non-linear boundary obtained by the neural net methods, I recommend that the CART rule and the feed-forward rule be used together to get better prediction results. The CART classification rule is:
If X1 < 0.06, or (0.06 < X1 < 0.32 and X6 < 0.00990131), or
(X1 > 0.32 and X4 > 0.3 and 0.0615694 < X2 < 0.135395), then the case is in group 2 (bad credit).
Otherwise, the case is in group 1.
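For reference, the CART rule above could be written as a simple S-Plus/R predicate on the transformed variables; this is just a transcription of the printed rule, not code taken from the analysis:

# The CART rule as a function of the transformed variables; returns 2 (bad) or 1 (good).
cart.rule <- function(x1, x2, x4, x6) {
  bad <- (x1 < 0.06) |
         (x1 > 0.06 & x1 < 0.32 & x6 < 0.00990131) |
         (x1 > 0.32 & x4 > 0.3 & x2 > 0.0615694 & x2 < 0.135395)
  ifelse(bad, 2, 1)
}

The function is vectorized, so it can be applied to whole columns of the transformed data frame at once.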
The discriminant rule for the feed-forward method is generated by an S-Plus function and stored in loans2.nn, which can be used to classify new points.
2. We can distinguish the clients' financial credit very well from the combinations of the asset and liability information provided in this data set. Since the cost of classifying a customer with good credit into the bad-credit group is much lower than the cost of classifying a customer with bad credit into the good-credit group, the sheer number or rate of misclassifications should not be the only criterion for judging different classification rules; the direction of misclassification matters in such a situation.
3. Among the measures, X1, X2, X4, and X6 contribute the most to classifying the two groups.
4. Several points (observations 35, 39, 47, 49, and 52) have much higher asset/liability ratios. Data inspection shows that they are not outliers; they are valid data points corresponding to conservative customers with low risk.
References
Kohonen, T. 1990. "The Self-Organizing Map." Proceedings of the IEEE 78(9): 1471-1480.
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, U.K.
Venables, W. N. and Ripley, B. D. 1994. Modern Applied Statistics with S-Plus. Springer-Verlag, New York.