A systematic credit scoring model based on heterogeneous classifier ensembles Maher Ala’raj Maysam Abbod Department of Electronic and Computer Engineering Brunel University London Uxbridge, UK maher.ala’raj@brunel.ac.uk Department of Electronic and Computer Engineering Brunel University London Uxbridge, UK maysam.abbod@brunel.ac.uk Abstract— lending loans to borrowers is considered one of the main profit sources for banks and financial institutions. Thus, careful assessment and evaluation should be taken when deciding to grant credit to potential borrowers. With the rapid growth of credit industry and the massive volume of financial data, developing effective credit scoring models is very crucial. The literature in this area is very dense with models that aim to get the best predictive performance. Recent studies stressed on using ensemble models or multiple classifiers over single ones to solve credit scoring problems. Therefore, this study propose to develop and introduce a systematic credit scoring model based on homogenous and heterogeneous classifier ensembles based on three state-of-the art classifiers: logistic regression (LR), artificial neural network (ANN) and support vector machines (SVM). Results revealed that heterogeneous classifier ensembles gives better predictive performance than homogenous and single classifiers in terms of average accuracy. Keywords—credit scoring; LR; ANN; SVM; homogenous ensembles; heterogeneous ensembles; bagging; majority voting I. INTRODUCTION Credit granting is considered a key business activity that generates profits to banks and financial institutions. However, it can also be a great source of risk. The recent financial crises resulted in huge loses globally and increased the attention of banks and financial institutions on credit risk. Therefore, Basel Committee for banking supervision [1], requested all banks and financial institutions that it should implement powerful credit scoring systems to help them in estimating their credit risk levels and different risk exposures, as well as improving capital allocation and credit pricing. As a result, it has become very important for every bank and financial institution to have rigorous evaluation procedures when it comes in making a credit decision to either grant a loan to an applicant or not [2]. Because of the huge amount of consumer data available, banks and financial institutions have currently a need for advanced analytical and systematic tools that support their credit managerial support decisions in order to comply with the Basel regulatory requirements. Credit scoring systems are models aligned with assessing financial risks associated with granting credit for potential applicants [3]. Credit scoring models are developed to classify and assign future loan applicants in to two groups either ‘good’ or ‘bad’ with respect to their characteristics such as age, income and marital condition [4], [5]. In the light of the applicants score that is generated from the model, good or accepted applicants are expected to repay their obligation over the loan period while bad or rejected applicants are expected to fail on their obligation over the same loan period. To build a credit scoring system, data or loans of past granted applicants are collected in order to capture patterns and relationships between the characteristics of the applicants and their known loan status and to identify which are best to distinguish between good and bad applicants [6], [7]. Generally speaking, credit scoring models are used to estimate or predict the new applicant’s credit risk [8] or their probability of default [3] using past loan applicants information. Credit scoring models are typically built using traditional or statistical methods such LR [9], [10], [11] and data mining or machine learning techniques, such as ANN [11], [12], [13], [14], [15], [16], SVM [11], [17], [18], [19], [20]. It is concluded in [9], [21] that data mining techniques are superior to traditional methods in dealing with credit scoring problems, especially for nonlinear relationships between variables and for its generalization abilities. Moreover, Baesens et al., reflected the need of banks and financial institutions in investigating new advanced methods in order to improve their credit granting decisions [11]. Despite of the encouraging results achieved by related studies, and its ability to hold classification problems, there was a direction by academic and practitioners for having more developed models with improved accuracy. However, they have been developing scoring models based on new advanced techniques, such hybrid and ensemble techniques which show superiority over single models [22], [23]. Hybrid methods are mainly based on combining two different methods together. The first method will be responsible for per-processing training data and then the second method will learn from the pre-processed data to output decision result [22]. While ensemble methods are based on training multiple or number of classifiers to solve the same problem, and then their outputs are combined or aggregated by a method(s) into one classifier to give an enhanced single output [23]. According to Lessmanna et al., ensemble classifiers models training can be achieved in two ways [3]: Homogenous ensembles: create multiple base models using the same classification algorithm and combine their prediction in to one single classifier. Heterogeneous ensembles: create base models using different types of classification algorithms and combine their prediction in to one single classifier. In this paper a mixture of individual classifiers, homogenous and heterogeneous classifier ensembles will be built based on three mostly used classification techniques in the literature namely LR, ANN and SVM. However, the systematic proposed model is made up of following phases: Collect the datasets. Splitting data in training, and testing sets. Choose and generate the classification algorithms. Combine the trained classifiers in to an ensemble. Assess the model using performance evaluation measures. In practice, the results of the proposed ensemble classifier will be compared against the result of each individual classifier to prove the superiority of proposed method. Moreover, the performance of proposed model will be compared to individual classifiers and homogenous and heterogonous classifiers ensembles independently. In conclusion there is no overall best classification technique used in building credit scoring model, selecting a model that could classify between classes depends on the nature or the domain of the problem, data structure, variables used and the market or the environment it will be developed in [24], [25], [26] The rest of the paper is organized as follows: section 2 reviews some related literature on ensemble techniques in credit scoring. Section 3 represents the research methodology. Section 4 presents the experimental setup of the experiments. Section 5 demonstrates experimental work. Section 6 carries on discussion and analysis of the results. Finally, section 6 draws the conclusion and future direction of research. II. RELATED LITERATURE Building well-organized and efficient credit scoring models is demanded and it is quite a challenging task, due to the massive growth of credit industry. The number of studies that have been developing credit scoring models is increasing massively. Their research incentive is to improve models that maximize profits, minimize risks and support efficient and reliable decision making. The research tendency is going towards hybrid and ensemble modeling, because of its efficiency in handling the credit scoring problems. As a result, models are getting more complex, besides to higher costs of implementation [2]. Due to the huge volume of current literature in the field of credit scoring, we will focus only on the studies that implemented ensemble classifiers in chronological order. West et al., carried out a study using ANN ensemble over 3 methods (cross-validation, bagging and boosting) to investigate their performance, accuracy and how it reduce the generalization error of ANN models, ANN ensembles showed more accurate results and low generalization error than the single best ANN classifier [27]. Cross -validation and bagging were better than boosting in testing generalization error. Tsai and Wu in their study compared single ANN classifier with multiple and diversified classifiers in terms of accuracy and misclassification cost. Multiple classifiers showed better performance only in one data set [28]. In [29] Nanni and Lumini investigated the use of ANN Levenberg-Marquardt, SVM, and k-nn as base classifiers, and then use them as ensemble classifiers with random subspace, bagging and boosting methods. Results showed superiority of ensemble in terms of classification performance. Yu et al. [30] constructed multi-stage ANN ensemble classifiers based on bagging sampling method, the ensemble method showed better results than SVM and LR. Zhang et al., proposed new vertical bagging ensemble decision tress technique based on attribute reduction, and then used the reduced data for classification [31]. Zhou et al., in [32] developed ensemble models based on LS-SVM in order to reduce bias and improve classification accuracy. Hsieh and Hung built a complete efficient ensemble classifier based on hybrid clustering to reduce data, discretization to rank data, and screen useless ones then use the reduced data to fit the ensemble models of ANN, SVM and Bayesian networks, the classifiers are combined using confidence-weighted voting [33]. Twala [34] investigated ANN, Bayesian networks, decision tress, k-nn and LR classifiers with different noise levels of attributes, and try to improve the accuracy their ensembles, results revealed that ensemble classifiers are more accurate at different noise levels. Yu et al., in another study used ensemble SVM classifiers to increase performance over single SVM classifier [35]. The work in [26] tested LR, decision tress, NN, SVM as single classifiers and ensemble classifiers over ensemble methods bagging, boosting and stacking. Ensemble methods showed improvement over single classifiers. Finlay in [36] empirically investigated several multiple classifiers system using bagging, boosting, new boosting method called Error Trimmed boosting. In their study [37] use d dual ensemble strategies based on bagging and random subspace with decision trees and compare its performance with other ensemble methods and classification techniques. Marques et al [38] explored the behavior of several base classifiers such as ANN, SVM, LR when used as classifier ensembles, several ensemble methods were used to construct them, the aim of the study is to to suggest appropriate classifiers for each ensemble method used. Again, Marques et al in another study [39] proposed a new two-level ensemble for credit scoring by combing different ensembles to obtain better accuracy, both bagging and boosting were used for data resampling and both random subspace and rotation forest for attribute resampling, results were better than traditional single ensembles and individual classifiers. Tsai developed a novel hybrid ensemble classifiers based on clustering, homogenous and heterogeneous ensembles using ANN, LR and decision tress [23]. Results showed best prediction results when using hybrid with homogenous and heterogeneous ensembles classifiers with weighted voting combination. The above summarized studies focused on using ensemble classifiers in order to achieve high predictive accuracy rate. However, it is widely accepted that improvement in prediction accuracy might translate into future significant savings [24]. It can be observed that vast majority of the studies above have used ANN, SVM and LR as their base classifiers. Furthermore, bagging and boosting were the common method used to create classifier ensembles. The ensemble classifiers have been covered in the related literature, but the focus were typically on homogeneous ensembles, while the heterogeneous ensembles are rarely considered. Out of ten studies mentioned seven focused on building homogenous classifier ensembles. Four focused on heterogeneous classifiers, and one study focused on both. For this reason, we purpose a model which combines both homogenous and heterogeneous ensembles using several ensemble methods and combining strategies to exploit the most from the views of different classifiers about the data. Basically, there can be a right model for the right purpose, and any credit scoring model when it is developed, must take in consideration the problem and how to fit it based the data availability, computational capacity and the model should be easy and understandable on how it works, the model should be adaptable with all conditions and through time and it should also extends to be profitable and detect problems [2]. 2) Artificial Neural Networks (ANN) ANN is machine learning technique that mimics the human brain function. It is made up of interconnected neurons that process information concurrently, learning and adapting from historical data [42]. ANN has been comprehensively used in developing credit scoring models as an alternative to the traditional statistical approaches such as LR. As shown in Figure 1 ANN model consists of three layers: input, hidden and output layers [43], [10]. The structure of the ANN model for credit scoring problem starts by passing the attributes of each applicant to input layer to process them, then they are transferred to the hidden layer for further processing, then the values are sent to the output layer which gives the final answer for problem either to give or not to give a loan. The output is calculated by using weights. Weights are assigned to each attribute expressing its relative importance, then all weighted attributes are summed together and fed to a transfer function (i.e. sigmoid, tansig) to create output. Output gives a final result based on adjusting weights iteratively by minimizing the error between the predicted output and the actual targets [15]. III. RESEARCH METHODOLOGY A. Classification techniques 1) Logistic Regression (LR) LR is an extensively used statistical technique which is popular for solving classification and regression problems [16]. LR is used to model a binary outcome variable usually represented by 0 or 1. Thomas Stated that the outcome of the scoring model have to be binary (accept/ good loan, 0 or reject/ bad loan, 1) and this depends on group of independent variables. The LR formula is expressed as [8]: log[p(1-p)]= β₀ +β1X 1 + β2X2 + …. + βnXn. (1) Where p is the probability of outcome of interest (0/1), β₀ is the intercept term, and βi represent the coefficient related to the independent variables Xi (i=1… n), and log [p (1-p)] is the dependent variable which is the logarithm of ratio of two probabilities outcome of interest. The objective of LR in credit scoring is to determine the conditional probability of a specific input (the characteristics of the customer or features) belonging to a certain class [40]. Despite that the widely application of LR its accuracy decreases when the relationship between variables are nonlinear [16]. However, ANN and SVM are introduced to deal with this issue [41], [21]. Figure 1. ANN topology [28] The ANN model tend to find the relationship between an applicant’s probability of default and its given attribute, filter them and choose among the most important attributes. The core advantages of ANN models is that it can handle incomplete, missing and noisy data, no previous assumptions required about the distribution of the inputs; it can recognize complex patterns between variables [10]. In contrast ANN has some disadvantages such as the long training process, lack of theory [12] [10]. Conversely, ANN has been applied in many application domains such as medicine, engineering and education [44]. In this study a feed-forward backpropagation neural network will be employed. 3) Support Vector Machines (SVM) SVM are another machine learning technique used in classification and credit scoring problems. It is been widely used in the area of credit scoring and other fields for its superior results [17]. SVM was first proposed by Vapnik [45]. SVM finds an optimal hyperplane that categorize the training input data in two classes (good and bad loans) figure 5 show the basis of support vector machines [17]. implement it and it gives good performance results. The diversification of data to be trained is achieved by creating different number of bags (n-bags) each bag is filled by data randomly generated from training dataset, with replication. For instance each bag could have data selected more than once or not selected at all, and each bag is trained by only one classifier. Figure 2. The basis of support vector machines [17] As illustrated in Figure 2, the dashed lines parallel to the optimal hyperplane measures the distance to the solid line. The dashed lines are called margin, and the training data that lies on the margins are called support vectors, however, the SVM finds the tries to find the best optimal hyperplane that separate the data correctly, so the margin width between the optimal hyperplane and the support vectors are maximized to fit the data. Based on the features of the support vectors, it can be predictable which applicant belongs to good or bad credit. The strength of the SVM lies in its ability to model non-linear data, which provides high prediction accuracy compared with other techniques. In case if the training data are not linear the SVM use different types of kernels which transform the data in higher dimensional space. Linear, RBF and Polynomial are types of kernels SVM use [32]. Unlike NN, SVM is sensitive towards outliers and noisy data [46]. B. Classifiers ensembles Ensemble classifiers is are a set multiple classifiers are trained to solve the same problem, it comprises a set of individually or single trained classifiers also called the base classifiers whose predictions are combined in some manner, usually by voting or weighted voting, when classifying new instance [47]. in comparison to the individual classifiers, they only learn and trained on one set of data, but ensemble classifiers learn and train on diversified sets of data generated from the original dataset, which in a result will built a set of hypothesis from the data trained, which will lead to a better accuracy . Several strategies have been developed to impose diversity on the classifiers that make up the ensemble [47] Outlined four basic ways to achieve diversity by using Different combination patterns. Different or diversified classification models Different features subsets. Different training sets. In the following sub-section these popular approaches will be briefly described. To ensure diversity of data training amongst classifiers, several ensemble strategies can be used such as crossvalidation, bagging, boosting, random forest and rotation forest. Due to space restriction, only the bagging method will be explained. 1) Bagging It is a shortcut for bootstrap aggregation [38], it is the one of the first ensemble learning algorithms. It is easy to 2) Combining ensembles outputs After training classifiers on diverse data, each one will produce a prediction; these predictions can be combined through several ways: Simple averaging: where the average of all classifiers predictions for each sample are calculated, to give a final prediction. Majority voting: where all classifiers predictions are pooled together and the class with the highest vote for each sample will be chosen as a final output [38]. Bagging usually uses this method IV. EXPERIMENTAL SETUP A. The datasets Two real –world credit datasets were used to evaluate the predictive accuracy of the proposed model and compare it with single classifies, homogenous and heterogeneous ensembles. German, Australian datasets are the most widely used datasets by researchers in the credit scoring literature. The datasets can be are available publicly online from UCI machine learning depository [49]. Summary of the datasets are illustrated in Table I. TABLE I. DESCRIPTION OF THE DATASETS USED IN THE STUDY Data set German Australian # Loans # Good Loans 1000 690 700 307 # Bad Loans 300 383 # Attributes 20 14 B. Data Normalization Each attribute in the dataset contains values that differ in range. In order to avoid any bias and build up the model with data within the same interval, data should be transformed from different scale values to common scale ones. To achieve this, dataset attributes are normalized to values in the range between 0 and 1. These transformations are done by taking the maximum value within each attribute and divide all the values in the attributes with its maximum value. C. Performance measurements The evaluation measurements of this study were embraced from the traditional performance measures in the area of credit scoring. These measurements are average accuracy, Type I and Type II errors [28]. A combination of these measurements is used, rather than a single measure to measure the predictive performance of the proposed credit scoring model. TABLE II. CONFUSION MATRIX FOR CREDIT SCORING [28] Predicted class (%) Actual class (%) Good loans Bad loans Good loans TP FN (Type II error) Bad loans FP (Type I error) TN From the confusion matrix table the following calculations are defined: Average Accuracy (ACC) = TP+TN/TP+FN+TN+FP (2) Type I error = FP/ TN+FP (3) Type II error = FN/ TP+FN (4) A TP stand for good applicant correctly classified as good, TN stands for bad applicant correctly classified as bad, FN (Type II) stands for good applicant incorrectly classified as bad customer and FP (Type I) stands for Bad customer incorrectly classified as Good customer (high risk). V. EXPERIMENTAL WORK The full experiments in this study were executed using Matlab 2013b version, on a PC with 3.4 GHz, Intel CORE i7 and 8 GB RAM, using Microsoft Windows 7 operating system. The dataset were divided in two sets: Training set to train the developed model, and Testing set (unseen data) to validate the developed model. The distribution of the data was 80% and 20% for training, testing sets respectively. To minimize the effect of the variability of the training set the experiments are repeated 100 times based on different training and testing sets at each run, but all the models are trained and tested on the same partition of the dataset. In order to investigate whether the proposed model is efficient in terms of its predictive performance, different tests will be conducted independently as follows: Testing each single classifier. Testing homogenous ensemble for each classifier using bagging. Testing the proposed heterogeneous ensemble model. Finally, compare the results achieved based on average accuracy, Type I and Type II error and draw out conclusions. VI. RESULTS AND DISCUSSION In this section all the results reported were based on the testing set (unseen data), German dataset includes 140 good loans and 60 bad loans, while Australian dataset include 61 good loans and 77 bad loans. To define the best number of neurons in the hidden layer for the ANN model, we determine it by trial and error test in order to choose the best number of neurons in the hidden layer; we tried a variety of neurons ranged from 5 to 30 neurons. The ANN stopped training after reaching the predetermined number of epochs. The ultimate topology was selected based on the lowest Mean Square Error, and the highest predictive accuracy, essentially for bad loans. The activation function used was ‘tan-sigmoid’. However, the ANN topology was of 20-20-1 for German dataset and 14-281 for Australian dataset. However for the SVM models, kernel function selection plays a significant role in the building of SVM [32]. However, the SVM models was developed using linear kernel function. Tables III and IV show the results of the individual classifiers for both German and Australian dataset. The best single model for German and Australian datasets was LR and ANN respectively. LR scored 75.50% and ANN scored 86.2%. ANN achieved the lowest standard deviation of 2.46% for German dataset which reflect the model stability over several model runs. LR this time got the lowest standard deviation for Australian dataset by 2.8% showing it is more stable model despite ANN accuracy. SVM achieved the lowest accuracy on both datasets, and highest standard deviations. In terms of Type I and II error in German dataset, SVM achieved the lowest Type I error by 28.3% and for the type II error LR and ANN scored the lowest by 11.4%. The surprising thing in the single models is that SVM was perfect in classifying bad loans correctly unlike LR and ANN which were imperfect. But in classifying good loans it was inferior to LR and ANN. For the Australian dataset, ANN showed good performance in classifying bad loan. For the type II error all the classifiers achieved the same error rate. This indicates their good ability in classify good loans correctly in the Australian set. Tables V and VI show the results of the homogenous classifier ensembles independently using bagging method with majority voting combination technique. Again LR ensembles achieved the highest accuracy in the German dataset by 76% also it got the highest for the Australian dataset by 86.59%. This indicates how superior the LR in classifying binary class problems, unlike what is said about it in the literature. The superiority of LR ensembles can be seen also in classifying bad loan at it achieved the lowest type I error for German dataset, followed by ANN and SVM. Unlike its good ability in classifying bad loans as single classifier, SVM achieved the lowest type II error. In Australian dataset SVM achieved the lowest type I and, LR and ANN achieved lowest type II error. In general, it can be concluded that using classifier ensembles can enhance the predictive accuracy of the model; consequently it is superior to single classifiers in most of the cases. Finally, Table VII and VIII shows the result of the proposed model which is based on combining all the prediction that is out of the homogenous classifiers in order to exploit the most from the opinions of the different classifiers about the data classes. Final results are combined using majority voting. The heterogeneous ensembles achieved the highest accuracies in both German and Australian datasets by 80% and 89.13% respectively. Moreover, it can be concluded that heterogeneous classifier ensembles is superior in predicting good loans correctly in both datasets as it achieved the lowest type II error among all the models. Regarding type I error in German dataset, the model came after the single SVM in predicting bad loans correctly. For Australian dataset the model showed good ability to classify bad loans perfectly as SVM bagging. However, this gives an insight that combining different classifiers from different families can enhance the predictive performance in general. TABLE III. RESULTS OF INDIVIDUAL CLASSIFIERS (GERMAN) German dataset LR was achieved the best accuracy of 75.5%. SVM got the lowest type I error. TABLE VI. RESULTS OF HOMOGENOUS CLASSIFIER ENSEMBLES (AUSTRALIAN) Predicted Class Classifier TP FN (Type TN FP ACC % (SD) (Type I) II) Predicted Class Classifier TP FN (Type TN FP (Type I) II) LR_ENS 56 5 64 13 86.59 (2.84) ANN_ENS 56 5 63 14 86.2 (2.58) SVM_ENS 52 9 65 12 84.74 (2.76) ACC % (SD) LR 124 16 27 33 75.5 (2.5) ANN 124 16 26 34 75.0 (2.46) SVM 98 42 43 17 70.5 (2.89) TABLE IV. RESULTS OF INDIVIDUAL CLASSIFIERS (AUSTRALIAN) TABLE VII. RESULTS OF HETEROGONOUS CLASSIFIER ENSEMBLES (GERMAN) Predicted Class Classifier TP TP FN (Type TN FP ACC % (SD) HETRO_ENS (Type I) II) TN LR 56 5 62 15 85.5 (2.8) ANN 56 5 63 14 86.2 (3.25) SVM 56 5 60 17 84.05 (2.87) 130 10 FP ACC % (SD) (Type I) II) Predicted Class Classifier FN (Type 30 30 80.0 (2.3) TABLE VIII. RESULTS OF HETEROGENEOUS CLASSIFIER ENSEMBLES (AUSTRALIAN) Predicted Class TABLE V. RESULTS OF HOMOGENOUS CLASSIFIER ENSEMBLES (GERMAN) Predicted Class Classifier TP FN (Type TN FP ACC % (SD) (Type I) II) LR_ENS 124 16 28 32 76.0 (2.23) ANN_ENS 124 16 26 34 75.0 (2.58) SVM_ENS 125 15 23 37 74.0 (2.25) I. CONCLUSIONS AND FUTURE DIRECTIONS The aim of this study is to build a model based on building classifiers of heterogeneous ensembles in order to achieve higher predictive performance. The proposed model was developed systematically by following the main phases of building a credit scoring model. It started with collecting datasets (German and Australian), after collecting the data we applied splitting method by dividing the data in to training (to learn the model) and testing (to validate the model) sets, then classifiers were generated to train the model. Classifiers were trained individually, and then trained independently using homogenous ensembles.Finally the outcomes of the each homogenous classifier were combined in heterogeneous ensemble to achieve final results. The models performances were evaluated via average accuracy, type I and type II error. For the single models, in Classifier HETRO_ENS TP 58 FN (Type II) TN 3 65 FP ACC % (SD) (Type I) 12 89.13 (2.38) In the Australian dataset ANN achieved the highest accuracy by 86.2% and scored the lowest type I error. Regarding the homogenous ensembles it is clear that in most of the cases it is better than single models. LR ensembles were the superior by achieving the highest accuracy (76%, 86.59%), lowest type I (53.3%) and SVM (15.5%) in German and Australian dataset respectively, which is more costly to banks. Heterogeneous ensembles achieved the highest accuracies and lowest type I among all the testes models. This indicates that combining different families of classifiers together into one single model exploits all the opinions generated by the classifiers to get better and enhanced results. In German dataset the model achieved 80% and in Australian dataset 89.13 %. It is widely accepted that improvements in prediction accuracy will translate into significant future savings [24]. Future research directions could be extended by: (1) using more datasets with different sizes and attributes for extra validation of the model (2) using different ensemble methods such as boosting and random forest to learn more on diversified data subsets (3) using different algorithms methods such as decision tress and Bayesian classifiers. REFERENCES [1] A. Estrella, "Credit ratings and complementary sources of credit quality information," 2000. [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] M. Alaraj, M. Abbod and Z. Hunaiti, "Evaluating Consumer Loans Using Neural Networks Ensembles," International Conference on Machine Learning, Electrical and Mechanical Engineering, 2014. S. Lessmanna, H. Seowb, B. Baesenscd and L. C. Thomasd, "Benchmarking state-of-the-art classification algorithms for credit scoring: A ten-year update," in Credit Research Centre, Conference Archive, 2013, Y. S. Kim and S. Y. Sohn, "Managing loan customers using misclassification patterns of credit scoring model," Expert Syst. Appl., vol. 26, pp. 567-573, 2004. M. Chen and S. Huang, "Credit scoring and rejected instances reassigning through evolutionary computation techniques," Expert Syst. Appl., vol. 24, pp. 433-441, 2003.. T. H. T. Dinh and S. Kleimeier, "A credit scoring model for Vietnam's retail banking market," International Review of Financial Analysis, vol. 16, pp. 471-495, 2007. A. Lahsasna, R. N. Ainon and Y. W. Teh, "Credit Scoring Models Using Soft Computing Methods: A Survey." Int.Arab J.Inf.Technol., vol. 7, pp. 115-123, 2010. L. C. Thomas, "A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers," Int. J. Forecast., vol. 16, pp. 149-172, 2000. V. S. Desai, J. N. Crook and G. A. Overstreet, "A comparison of neural networks and linear scoring models in the credit union environment," Eur. J. Oper. Res., vol. 95, pp. 24-37, 1996. T. Lee and I. Chen, "A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines," Expert Syst. Appl., vol. 28, pp. 743-752, 2005. B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens and J. Vanthienen, "Benchmarking state-of-the-art classification algorithms for credit scoring," J. Oper. Res. Soc., vol. 54, pp. 627-635, 2003. D. West, "Neural network credit scoring models," Comput. Oper. Res., vol. 27, pp. 1131-1152, 2000. A. Khashman, "Neural networks for credit risk evaluation: Investigation of different neural models and learning schemes," Expert Syst. Appl., vol. 37, pp. 6233-6239, 2010. A. F. Atiya, "Bankruptcy prediction for credit risk using neural networks: A survey and new results," Neural Networks, IEEE Transactions on, vol. 12, pp. 929-935, 2001. R. Malhotra and D. Malhotra, "Evaluating consumer loans using neural networks," Omega, vol. 31, pp. 83-96, 2003. S. Akkoç, "An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish credit card data," Eur. J. Oper. Res., vol. 222, pp. 168-178, 2012. C. Huang, M. Chen and C. Wang, "Credit scoring with a data mining approach based on support vector machines," Expert Syst. Appl., vol. 33, pp. 847-856, 2007. T. Bellotti and J. Crook, "Support vector machines for credit scoring and discovery of significant features," Expert Syst. Appl., vol. 36, pp. 33023308, 2009. Y. Wang, S. Wang and K. K. Lai, "A new fuzzy support vector machine to evaluate credit risk," Fuzzy Systems, IEEE Transactions on, vol. 13, pp. 820-831, 2005. I. T. Van Gestel, B. Baesens, I. J. Garcia and P. Van Dijcke, "A support vector machine approach to credit scoring," in Forum Financier-Revue Bancaire Et Financiaire Bank En Financiewezen-, 2003, pp. 73-82. Z. Huang, H. Chen, C. Hsu, W. Chen and S. Wu, "Credit rating analysis with support vector machines and neural networks: a market comparative study," Decis. Support Syst., vol. 37, pp. 543-558, 2004. C. Tsai and M. Chen, "Credit rating by hybrid machine learning techniques," Applied Soft Computing, vol. 10, pp. 374-380, 2010. C. Tsai, "Combining cluster analysis with classifier ensembles to predict financial distress," Information Fusion, vol. 16, pp. 46-58, 2014. D. J. Hand and W. E. Henley, "Statistical classification methods in consumer credit scoring: a review," Journal of the Royal Statistical Society: Series A (Statistics in Society), vol. 160, pp. 523-541, 1997. [25] H. A. Abdou and J. Pointon, "Credit scoring and decision making in Egyptian public sector banks," International Journal of Managerial Finance, vol. 5, pp. 391-406, 2009. [26] G. Wang, J. Hao, J. Ma and H. Jiang, "A comparative assessment of ensemble learning for credit scoring," Expert Syst. Appl., vol. 38, pp. 223-230, 2011. [27] D. West, S. Dellana and J. Qian, "Neural network ensemble strategies for financial decision applications," Comput. Oper. Res., vol. 32, pp. 2543-2559, 2005. [28] C. Tsai and J. Wu, "Using neural network ensembles for bankruptcy prediction and credit scoring," Expert Syst. Appl., vol. 34, pp. 26392649, 2008. [29] L. Nanni and A. Lumini, "An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring," Expert Syst. Appl., vol. 36, pp. 3028-3033, 2009. [30] L. Yu, S. Wang and K. K. Lai, "Credit risk assessment with a multistage neural network ensemble learning approach," Expert Syst. Appl., vol. 34, pp. 1434-1444, 2, 2008. [31] D. Zhang, X. Zhou, S. C. Leung and J. Zheng, "Vertical bagging decision trees model for credit scoring," Expert Syst. Appl., vol. 37, pp. 7838-7843, 2010. [32] L. Zhou, K. K. Lai and L. Yu, "Least squares support vector machines ensemble models for credit scoring," Expert Syst. Appl., vol. 37, pp. 127-133, 2010. [33] N. Hsieh and L. Hung, "A data driven ensemble classifier for credit scoring analysis," Expert Syst. Appl., vol. 37, pp. 534-545, 2010. [34] B. Twala, "Multiple classifier application to credit risk assessment," Expert Syst. Appl., vol. 37, pp. 3326-3336, 2010. [35] L. Yu, W. Yue, S. Wang and K. K. Lai, "Support vector machine based multiagent ensemble learning for credit risk evaluation," Expert Syst. Appl., vol. 37, pp. 1351-1360, 2010. [36] S. Finlay, "Multiple classifier architectures and their application to credit risk assessment," Eur. J. Oper. Res., vol. 210, pp. 368-378, 2011. [37] G. Wang, J. Ma, L. Huang and K. Xu, "Two credit scoring models based on dual strategy ensemble trees," Knowledge-Based Syst., vol. 26, pp. 61-68, 2012. [38] A. Marqués, V. García and J. S. Sánchez, "Exploring the behaviour of base classifiers in credit scoring ensembles," Expert Syst. Appl., vol. 39, pp. 10244-10250, 2012. [39] A. I. Marqués, V. García and J. S. Sánchez, "Two-level classifier ensembles for credit risk assessment," Expert Syst. Appl., vol. 39, pp. 10916-10922, 9/15, 2012. [40] T. Lee and I. Chen, "A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines," Expert Syst. Appl., vol. 28, pp. 743-752, 2005. [41] C. Ong, J. Huang and G. Tzeng, "Building credit scoring models using genetic programming," Expert Syst. Appl., vol. 29, pp. 41-47, 2005. [42] H. L. Jensen, "Using neural networks for credit scoring," Managerial Finance, vol. 18, pp. 15-26, 1992. [43] S. Haykin, "Adaptive filters," Signal Processing Magazine, vol. 6, 1999. [44] A. Vellido, P. J. Lisboa and J. Vaughan, "Neural networks in business: a survey of applications (1992–1998)," Expert Syst. Appl., vol. 17, pp. 5170, 1999. [45] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learning, vol. 20, pp. 273-297, 1995. [46] S. Teng, H. Du, N. Wu, W. Zhang and J. Su, "A cooperative network intrusion detection based on fuzzy SVMs," Journal of Networks, vol. 5, pp. 475-483, 2010. [47] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons, 2004. [48] Y. Freund, R. Schapire and N. Abe, "A short introduction to boosting," Journal-Japanese Society for Artificial Intelligence, vol. 14, pp. 1612, 1999. [49] A. Asuncion and D. Newman, UCI Machine Learning Repository, 2007.