A Data Mining Approach to Credit Risk Evaluation and Behaviour Scoring

Sara C. Madeira1,2, Arlindo L. Oliveira1,3, and Catarina S. Conceição3

1 Inesc-ID/IST, 1049-001 Lisbon, Portugal, aml@inesc-id.pt
2 Universidade da Beira Interior, 6200-001 Covilhã, Portugal, smadeira@di.ubi.pt
3 Link Consulting SA, 1000-138 Lisbon, Portugal, catarina.conceicao@link.pt

Abstract. Behaviour scoring is used by several companies to score customers according to credit risk by analyzing historical data about their past behaviour. In this paper we describe a data mining approach to credit risk evaluation in a Portuguese telecommunications company.

1 Introduction

Mobile telecommunications companies need to evaluate the credit risk of both their current customers and potential new customers. Before accepting a new customer, or when re-calculating the credit limit of an existing customer, it is necessary to estimate his risk class, that is, to classify him into one of the possible risk classes. This scoring process is largely based on scorecards [6], which are obtained from non-exact models and business-specific knowledge, and whose doubtful precision can lead to a high number of misclassifications.

The company where this project was developed was no exception regarding the use of scorecards: a scorecard was used to determine the credit risk of potential customers. Each potential customer was scored using the scorecard, classified into one of several risk classes, and then assigned a reference credit limit and an initial customer segment (“Low”, “Medium” or “High”). Every six months, the customers’ risk class, and consequently their customer segment and credit limit, were re-calculated. This is where behaviour scoring begins. Every customer with a sufficient number of invoices was analyzed using the last N invoices and the delays observed in their payments.
The average payment delay of each customer, calculated as a weighted average of the delays observed in the payment of the invoices considered, was used as the basis for this re-classification process.

The business experts raised several criticisms of the existing credit risk evaluation system. The two main ones were the inability to anticipate risk and the high number of invoices needed to re-classify customers accurately. In this scenario, standard machine learning techniques emerged as an innovative and credible alternative for performing behaviour scoring without a behaviour scorecard. Multiple logistic regression, decision trees and neural networks were used to test the potential explanatory value of hundreds of variables, define the target concept of this real-world machine learning problem, and, finally, construct the inference models needed to implement a credit risk evaluation and behaviour scoring approach based on data mining [2].

Fernando Moura Pires, Salvador Abreu (Eds.): EPIA 2003, LNAI 2902, pp. 184–188, 2003.
© Springer-Verlag Berlin Heidelberg 2003

2 Inference Models: Time Window and Target Concept

The models should infer the customer segment from historical data, anticipating the risk class three months into the future from six months of history. To do this, nine months of historical data should ideally be used to derive the models: the training examples should be obtained using six months of data to characterize the customers’ past behaviour, while the remaining three months are used to calculate the customer segment three months after the eyeball. The eyeball is defined as the date on which the inference model is used to predict the customer segment (see Fig. 1).
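The time window can be illustrated with a minimal sketch of the date arithmetic, assuming whole calendar months: six months of history end at the eyeball, and the segment used as the label is read three months later. The names `add_months`, `TimeWindow` and `training_window`, as well as the example eyeball date, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days_in_month = [31, 29 if leap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return date(year, month, min(d.day, days_in_month[month - 1]))


@dataclass(frozen=True)
class TimeWindow:
    history_start: date  # start of the six-month behaviour window (hypothetical field)
    eyeball: date        # date on which the model predicts the segment
    label_date: date     # date at which the "future" segment is observed


def training_window(eyeball: date, history_months: int = 6, horizon_months: int = 3) -> TimeWindow:
    """Hypothetical helper: derive the nine-month window around a given eyeball."""
    return TimeWindow(
        history_start=add_months(eyeball, -history_months),
        eyeball=eyeball,
        label_date=add_months(eyeball, horizon_months),
    )


w = training_window(date(2002, 7, 1))
print(w.history_start, w.eyeball, w.label_date)  # 2002-01-01 2002-07-01 2002-10-01
```

When building training examples, the eyeball would sit three months before the end of the available data, so that the label date still falls inside it.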
To enable the re-classification of recent customers, the model set should be constructed using examples of customers with a minimum of four invoices. Assuming that the last three months of historical data are used to simulate the customers’ future behaviour, and consequently their future risk, the customer segment can be re-calculated after the due date of the customer’s first invoice, without having to wait six months to re-evaluate their credit risk.

Fig. 1. Time Window

Inferring the customer segment is a concept learning task [1], where the concept is a three-valued function defined over all customers: H when the customer segment is “High”, M when it is “Medium” and L when it is “Low”. The learning problem can be defined as follows. Let $\{X_1, \ldots, X_N\}$ be a set of $N$ data objects, which can be represented as an $N \times n$ data matrix $X$, where $n$ is the number of attributes used to describe each instance $X_k$. Thus $X$ is the set of instances over which the concept is defined, and each customer is an instance represented by the vector $X_k = (x_1, \ldots, x_n)$, where $x_i \in D_i$ and $D_i$ is the domain of attribute $x_i$, which is real for real-valued attributes and discrete for nominal attributes. The target concept, denoted by $c$, is in this case a function defined over the set of instances $X$ that corresponds to the customer segment:

$$c : X \to \{H, M, L\} \qquad (1)$$

3 Using a Proxy to Obtain the Target Concept

Most data mining techniques, and particularly the ones we intended to use, accept a set of training examples (ordered pairs $\langle X_k, c(X_k) \rangle$), each consisting of an instance $X_k$ from $X$ and its target concept value $c(X_k)$. However, the business experts did not provide us with a set of examples of customers already classified as “Low”, “Medium” or “High” from whose behaviour and history the machine learning algorithms could learn.
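The training set of pairs $\langle X_k, c(X_k) \rangle$, combined with the minimum-of-four-invoices rule, can be sketched as below. This is an illustration under stated assumptions: nominal attributes are taken as already numerically encoded, the labels are assumed to come from a proxy such as the one this section goes on to construct, and `Segment`, `build_examples` and the toy customers are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Segment(Enum):
    """Values of the target concept c : X -> {H, M, L}."""
    H = "High"
    M = "Medium"
    L = "Low"


# An instance X_k = (x_1, ..., x_n); nominal attributes are assumed already encoded.
Instance = Tuple[float, ...]
Example = Tuple[Instance, Segment]  # the ordered pair <X_k, c(X_k)>


def build_examples(
    features: Dict[str, Instance],
    labels: Dict[str, Segment],
    invoice_counts: Dict[str, int],
    min_invoices: int = 4,
) -> List[Example]:
    """Pair each eligible customer's attribute vector with its (proxy) segment label."""
    examples: List[Example] = []
    for customer_id, x in features.items():
        # Skip customers with too few invoices or without a label.
        if invoice_counts.get(customer_id, 0) < min_invoices or customer_id not in labels:
            continue
        examples.append((x, labels[customer_id]))
    return examples


# Hypothetical toy customers with two made-up attributes each.
feats = {"c1": (12.0, 3.0), "c2": (0.0, 1.0), "c3": (45.0, 7.0)}
labs = {"c1": Segment.M, "c2": Segment.H, "c3": Segment.L}
counts = {"c1": 9, "c2": 3, "c3": 4}
print(build_examples(feats, labs, counts))  # c2 is dropped: only 3 invoices
```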
They could only identify the maximum payment delay, in days, observed in the payment of the customers’ last invoices as a very informative attribute about the customers’ probability of default. This value could easily quantify the credit risk of a given customer and his probability of suffering extreme dunning actions, such as deactivation. With this in mind, we decided to study the relation between the maximum delay observed in the payment of the customer’s last nine invoices and his probability of default, which is, in this case, synonymous with the probability of dunning deactivation.

The approach used to compute the customer segment was based on the following assumption: a customer whose probability of default is greater than the profit margin of the company should definitely be classified as a “Bad” customer, since he/she will, on average, be a liability for the company. On this basis, a statistical study was carried out to find which maximum payment delay implied a probability of default of approximately 65%, the estimated average margin. The entire population of customers was analyzed to find the probability of default of a customer three months into the future, given his maximum payment delay observed to date. The probability of default, $\mathit{pd}$, associated with a given value of maximum payment delay, $\mathit{mpd}$, was computed as follows:

$$\mathit{pd} = \frac{B}{C} \times 100 \qquad (2)$$

where $B$ is the number of dunning deactivations observed in the future for the group of customers whose maximum payment delay was greater than $\mathit{mpd}$ at time $t$, and $C$ is the number of customers with maximum payment delay greater than $\mathit{mpd}$ at time $t$.

The distinction between the segments “High” and “Medium” was also made using the probability of default. During the statistical study we noticed that below a certain maximum payment delay the probability of default did not change, and was therefore independent of the maximum payment delay.
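Equation (2) can be evaluated directly over a customer population. The sketch below computes $\mathit{pd}$ for a given $\mathit{mpd}$ and then scans candidate delays for the first one whose $\mathit{pd}$ reaches a target such as 65%. The scanning rule (smallest candidate reaching the target) is an assumption, since the paper only says a delay implying approximately 65% was sought; the function names and the toy population are hypothetical. Note that $\mathit{pd}$ as defined need not be monotone in $\mathit{mpd}$, so in practice the whole curve would be inspected.

```python
from typing import Iterable, Optional, Sequence


def probability_of_default(
    mpd: float, max_delays: Sequence[float], deactivated: Sequence[bool]
) -> Optional[float]:
    """Eq. (2): pd = B / C * 100 over customers whose max delay at time t exceeds mpd.

    max_delays[i]  -- maximum payment delay (days) of customer i at time t
    deactivated[i] -- whether customer i suffered dunning deactivation afterwards
    """
    c = 0  # C: customers with maximum payment delay > mpd
    b = 0  # B: deactivations observed among those customers
    for delay, deact in zip(max_delays, deactivated):
        if delay > mpd:
            c += 1
            b += int(deact)
    return None if c == 0 else b / c * 100


def delay_threshold(
    target_pd: float,
    candidates: Iterable[float],
    max_delays: Sequence[float],
    deactivated: Sequence[bool],
) -> Optional[float]:
    """Smallest candidate mpd whose pd reaches target_pd (hypothetical search rule)."""
    for mpd in sorted(candidates):
        pd = probability_of_default(mpd, max_delays, deactivated)
        if pd is not None and pd >= target_pd:
            return mpd
    return None


# Hypothetical toy population (not data from the paper).
DELAYS = [0, 5, 10, 20, 40, 60, 90, 120]
DEACT = [False, False, False, False, True, False, True, True]
print(delay_threshold(65, [0, 10, 20, 30], DELAYS, DEACT))  # 20
```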
This means that the probability of default of a customer who has always paid on time is in fact the same as that of customers whose maximum payment delay has never exceeded a certain number of days. The probability of default of a customer in the segment “High” was set to approximately 2%, and consequently the segment “Medium” included all customers whose probability of default was between 2% and 65%. The probability of default was estimated from the maximum delay observed in the payment history of the customer.

4 A Two Level Approach to Inference of Models

In the company’s information system, the data was organized in several levels. The top two were legal entity and payment responsible. Each legal entity can have several payment responsibles. The great majority of the data, including the potentially explanatory variables, was concentrated at the payment responsible level. This data included all the historical data related to the customers’ payment behaviour. However, the business experts were also interested in evaluating customer risk at the top level: the legal entity. Since aggregating the existing data from the payment responsible level to the top level would not be optimal in terms of model precision, it was decided to use a two level approach to the inference of models. Four models were derived at the payment responsible level, one for each customer class considered. After deriving these models, the predicted customer segments of each payment responsible were used, together with other attributes found relevant at the legal entity level, to construct another data set that was labelled by a business expert. This data set was then used to derive a model at the legal entity level, as shown in Fig. 2.

Fig. 2. Two Level Models.
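The hand-off between the two levels can be sketched as an aggregation of the predicted segments of each legal entity’s payment responsibles into one feature row per legal entity. The paper does not say which legal-entity attributes were derived from the level-one predictions, so the features below (fraction of payment responsibles per segment, their count, and the worst predicted segment) are illustrative assumptions, as is the name `legal_entity_features`. The resulting rows would still be joined with other legal-entity attributes and labelled by an expert before training the level-two model.

```python
from collections import Counter
from typing import Dict, List, Mapping

SEGMENTS = ("L", "M", "H")  # ordered from worst to best credit risk


def legal_entity_features(
    predicted: Mapping[str, str], entity_of: Mapping[str, str]
) -> Dict[str, Dict[str, float]]:
    """Aggregate level-one predicted segments of payment responsibles per legal entity.

    predicted -- payment responsible id -> predicted segment ("L", "M" or "H")
    entity_of -- payment responsible id -> legal entity id
    """
    grouped: Dict[str, List[str]] = {}
    for pr_id, seg in predicted.items():
        grouped.setdefault(entity_of[pr_id], []).append(seg)

    features: Dict[str, Dict[str, float]] = {}
    for entity, segs in grouped.items():
        counts = Counter(segs)
        row = {f"frac_{s}": counts[s] / len(segs) for s in SEGMENTS}
        row["n_payment_responsibles"] = float(len(segs))
        # Rank of the worst segment: 0 = "L", 1 = "M", 2 = "H".
        row["worst_segment_rank"] = float(min(SEGMENTS.index(s) for s in segs))
        features[entity] = row
    return features


# Hypothetical example: E1 has two payment responsibles, E2 has one.
print(legal_entity_features({"pr1": "H", "pr2": "L", "pr3": "M"},
                            {"pr1": "E1", "pr2": "E1", "pr3": "E2"}))
```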
5 Analyzing the Precision of the Inference Models

Several models were derived at the payment responsible level using three well-known data mining techniques: multiple logistic regression, decision trees [3,4] and neural networks [5]. Regression models and neural networks were not competitive with decision trees in terms of precision. Furthermore, human-interpretable models were preferable, and these were easily obtained from the decision trees. The derived decision trees had between 10 and 60 nodes and could therefore be easily converted into understandable if-then rules.

Assuming that the labels computed for each instance (see Sect. 3) model the credit risk of the customers exactly, it is interesting to compare the performance of the data mining approach at the payment responsible level with the base segmentation model used previously. The base segmentation model classifies a customer according to the maximum payment delay observed until the eyeball. To perform this comparison we computed the differences between the classification obtained by the base segmentation model and the true customer segments observed three months later.

Table 1. Confusion Matrices: Base Segmentation Model and Decision Trees.

Base segmentation model
% classified as →   Low     Medium   High
Low                 41.10    3.60     0.16
Medium               8.24   10.84     6.47
High                 1.82    4.81    22.96

Decision trees
% classified as →   Low     Medium   High
Low                 35.41    0.56     0.00
Medium               5.24   22.14     1.16
High                 0.54    2.66    32.29

Table 1 shows that the total error (the sum of the off-diagonal entries) of the base segmentation model was 25.10% on the test set, compared with 10.16% for the decision trees. This represents an improvement of about 15 percentage points (25.10 − 10.16 = 14.94) in the precision of the behaviour scoring approach. These results translate into increased precision at the legal entity level, not reported here for lack of space.
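The total errors quoted above can be checked directly from Table 1 by summing the off-diagonal cells of each confusion matrix. The sketch below does this with the table’s own values; `total_error` is a hypothetical helper name.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def total_error(cm: Matrix) -> float:
    """Sum of the off-diagonal cells of a confusion matrix given in percent."""
    return sum(v for i, row in enumerate(cm) for j, v in enumerate(row) if i != j)


# Values from Table 1 (order of rows and columns: Low, Medium, High).
BASE = [[41.10, 3.60, 0.16], [8.24, 10.84, 6.47], [1.82, 4.81, 22.96]]
TREES = [[35.41, 0.56, 0.00], [5.24, 22.14, 1.16], [0.54, 2.66, 32.29]]

base_err = total_error(BASE)
tree_err = total_error(TREES)
print(f"base {base_err:.2f}%  trees {tree_err:.2f}%  gain {base_err - tree_err:.2f} points")
# base 25.10%  trees 10.16%  gain 14.94 points
```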
6 Conclusions

We presented an approach that uses machine learning techniques to perform behaviour scoring and infer the credit risk of customers. Predictive models were trained to infer the credit risk of customers three months into the future, given six months of historical data. The final models were derived using decision trees, which were chosen for their precision and human interpretability. The ability to anticipate the customer segment three months ahead gives the business experts the possibility to act in advance, revising credit limits in order to substantially decrease the customers’ probability of default. The two level approach gives the company two customer segments, one for the payment responsible and another for its legal entity, providing the flexibility to act at whichever level best suits the specific situation.

References

1. Mitchell, T. M.: Machine Learning. McGraw-Hill International Editions, Computer Science Series, Singapore (1997)
2. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco, U.S.A. (2001)
3. Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J.: Classification and Regression Trees. Wadsworth, Pacific Grove (1984)
4. Quinlan, J. R.: Induction of decision trees. Machine Learning, 1:81–106 (1986)
5. Rumelhart, D. E., McClelland, J. L., PDP Research Group: Parallel Distributed Processing. MIT Press, Cambridge (1986)
6. Banks, W. J., Leonard, K. J.: Credit Scoring and mathematical models. Credit and Financial Management Review, Volume 1 (1995)