A New Method for Early Warning System of Commercial Banks: Hybrid of Trait Recognition and SVM Yinhua Li School of Management, Graduate University of Chinese Academy of Sciences Beijing 100190, China liyinhua07@mails.gucas.ac.cn, (O) +86-10-8268-0697, (F) +86-10-8268-0698 Jianping Li Institute of Policy & Management, Chinese Academy of Sciences Beijing 100080, China ljp@casipm.ac.cn, (cel) +86-1366-138-4584 Yong Shi Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences Beijing 100190, China and College of Information Science and Technology, University of Nebraska at Omaha Omaha, NE 68118, USA yshi@gucas.ac.cn, (O) +86-10-8268-0697, (F) +86-10-8268-0698 1 A New Method for Early Warning System of Commercial Banks: Hybrid of Trait Recognition and SVM Yinhua Li1,2, Jianping Li3 and Yong Shi1,4* 1 Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences Beijing 100190, China 2 School of Management, Graduate University of Chinese Academy of Sciences Beijing 100190, China 3 Institute of Policy & Management, Chinese Academy of Sciences Beijing 100080, China 4 College of Information Science and Technology, University of Nebraska at Omaha Omaha, NE 68118, USA Abstract Commercial bank system is a core sector, which is crucial for the stability of financial system. As a phenomenon, bank failure occurred frequently in most countries in the world (e.x. America, Russia and Turkey). In last twenty years, the average number of failed bank almost reaches 175 per year in America. Based on existed models, the paper aims to introduce a new method for prediction of bank instability based on the historical financial data of 316 American commercial banks. The method purposed in this paper is to combine information encoding technique, which is used in trait recognition with SVM model. The conclusion suggested that our method outperformed logistic model and trait recognition model in prediction accuracy in both training sample and testing sample. Key words Early Warning; Commercial Bank; Failure; Hybrid; Trait Recognition; SVM 1. Introduction Generally, it will bring chaos to economy order if commercial bank system collapses. Therefore it becomes vital to forecast bank failure for both scholars and supervisors. To forecast bank failure precisely, it is necessary to estimate early warning system of bank failure. Based on the existing researches (Karels 1987, Kolari 1996), they demonstrate that historical data from accounting sheets have the prediction power for future operating condition of banks. It is far reaching that establishing an effective early warning model for commercial banks. Firstly, it is propitious to allocate resources for central banks. If problem banks can be detected in time, it can help reducing the cost of regulation and supervision. Secondly, it can make full use of historical data. The attributes variables mainly come from the accounting sheets of the banks. Based on the historical data, bank failure can be forecasted. Thirdly, it can provide evaluation standard to supervisory board in order to judge if banks are given the proper rating. As the development of computer technology, intelligent model are widely applied in the area of bank failure prediction. The key problem will be dealt with in this paper are examining the * Corresponding author 2 effect of the trait recognition method in different data sets. According to the test result, a new method is proposed to predict the commercial bank failure. The reminder of the paper is organized as follows. The next section reviews the literatures on how to establish the early warning system for bank failure prediction. The third section, we analysis and summarize the deficiencies of the trait recognition method through describing and testing the general steps of the traditional method. The forth section, based on the third section, we proposed the new method hybrid of trait recognition and support vector machine method. And the robustness of our method will be tested in different data sample. The last section, we get the conclusions. 2. Literatures review on early warning system EWS in bank failure prediction During the last twenty years, more than 10 countries have experienced banking and currency crisis , with contractions in GDP of between 5% and 12%(Gutierrez et al ,2010). Intelligent models are broadly used for prediction of bank failure. A knowledge tree is given to describe main methods (Fig 1). The most commonly methods include artificial neural network (Gelick and Karatepe,2007), decision tree (Marais,1984), support vector machine (Vapnik,1995), trait recognition (Kolarl et al.,1996; Lanine and Vander,2005), rough set technique (Pawlak,1982) and K-nearest neighbor. The trait recognition technique is a non-parametric method. Kolarl et al(1996) first introduced this technique into the research of bank failure early warning. He chose 1950 American commercial bank as sample during 1984 and 1985. The ration between failed and nonfailed bank is one to six. The main conclusions of Kolarl et al(1996) are summarized in table 1. The curacy in both original split sample and holdout split sample is more than 95 percent. The relative work (Gleb and Vennet 2006) investigate the 582 failed bank date from 1988 to 2004 in Russia. The accuracy of their modified trait recognition model is 85.1 percent in holdout split sample. The back propagation neural network (BPNN) Artificial neural network (ANN) Gelik and Karatepe (2007) MLP Intelligence modeling techniques CL SOM Support vector machine (SVM) Vapnik (1995) The principle component neural network (PCNN) LVQ Rough set technique (RST) Pawlak (1982) Rate of nonperforming loans relative to total loan Capital relative to assets Profit relative to assets Equity relative to assets Idea: maximize the margin between the two data sets Trait recognition (TR) Decision tree (DT) Marais (1984) Kolari et al.(1996) Logit and TR for large US bank Lanine and Vander Vennet (2005) logit and TR for Russia A nonparametric approach Do not impose any distribution assumption Information about interrelation of variables Classification Prediction K-nearest neighbor (K-NN) Fig1 Knowledge tree of main methods for bank failure prediction 3 Table1 The main conclusions of existing work of trait recognition Model Sample James, Caputo and Wagner, 1996 1984 1985 Failed bank:126 Failed bank:123 Nonfailed bank: 878 Nonfailed bank:862 Gleb and Vennet, 2006 Failed bank:582 from 1988-2004 Nonfailed bank:2910 Data source accuracy FDIC(federal deposit insurance of corporation) 1984 1985 Original split sample:96.6% Original split sample:98.6% Holdout split sample:96% Holdout split sample:95.6% CBR(central bank of Russia) Original sample:91.6% Hold out sample:85.1% 3. Background of Trait Recognition (TR) The trait recognition technique is a nonparametric classification model to distinguish the failed bank from nonfailed bank. The advantages of the TR technique including that it impose no priori assumptions on the variables. TR emphasizes the interactions among the independent variables. In TR procedure the following steps to system design are common (James Kolari, Mighele Caputo and Drew Wagner, 1996): Step1. Quantifiably measure the character or traits of the bank and encoding the information. The first step of TR procedure is to construct a binary string according to financial ratios of each bank. For example, consider financial ratio A and B reflected of bank financial condition. Based on sample data, the distribution of these ratios can be plotted. Then for each ratio, two breakpoints are established to segment the distribution of observation into three segments: 1.High only non-failed banks (H, coded “00”); 2.Mixture of failed and non-failed banks (M, coded “01”); 3. Only failed banks (L, coded “11”). Generally, the observation M is described by the binary code M1M 2 ...M l , l is the length of the string. However, for some financial ratios, e.g. ROE, either higher or lower value of ROE represents the failed banks. Therefore, the distribution of the banks can’t be segmented into three parts by two breakpoints. So we adapt a method to solve the problem (Gleb and Vennet 2006). When the cutoff points are chosen, we allow 10% failed banks existing among the non-failed banks in the high section. In a similar way, 10% non-failed banks are allowed to exist among the failed banks in the low section. Step2. Establish the matrix of traits To extract the distinctive feature that represent the common pattern of different groups of observation, the string of binary code should be transferred into a matrix of trait. Each trait is comprised of an array with six integers: T=p, q, r, P, Q, R. The letter p, q, r represents the positions in the binary string from left to right. P, Q, R give the values of the binary code at the position p, q and r. Take the binary string 0001 , for example, the matrix of traits for this bank is as follow: 4 Table 2 the matrix of traits for a bank p q r PQR p q r PQR 1 2 3 4 1 1 1 1 2 3 4 2 3 4 1 2 3 4 2 3 4 000 000 000 111 000 000 011 2 2 3 1 1 1 2 3 4 4 2 2 3 3 3 4 4 3 4 4 4 000 011 001 000 001 001 001 Step3. Feature selection The traits matrix will become huge when the number of the financial variables increases. And no doubt many traits are not able to be used as the features that distinguish between the failed bank and non-failed bank. So it is necessary to select the feature from the large number of traits. A feature is the trait present relatively frequently in the non-failed (failed) banks but relatively infrequently in the failed (non-failed) banks. For example, the traits can be treated as a good feature found in 80 percent of the non-failed banks and only 10 percent of the failed bank, and vice versa in “bad” feature. Table 3 The judgment standard of feature Good feature Bad feature Failed Non-failed 10% 80% 80% 10% Step4. Voting All sample banks are placed in a voting matrix with axes representing the number of good and bad votes. A cutoff line then is chosen by visually inspecting the distribution of failed and non-failed banks in the cells of the voting matrix. If a bank has more good features than bad features, it will be classified as non-failed bank, rather than vise versa. If the number between the good and bad features is not balance, the total times of features can be replaced by average appearance times. Although the trait recognition technique used for predicting bank failure has been improved by Lanine (2006), it also exists some deficiencies. E.x., the robustness of the trait recognition model need further testing in different period data sample. Besides, the process of selecting good or bad feature is arbitrary. It will lead to missing the important information during the selection process. 4. Hybrid of trait recognition and SVM To overcome the deficiencies in existing trait recognition model, we combine information encoding technique, which used in trait recognition with SVM model reasonably. The specific process is described as followed: Select proper training data sample. Then the step1 and step2 of trait recognition are continued to be used. After encoding as traits, substitute them into liner 5 SVM model, see equation (1) . min l 1 l l yi y j ( xi x j ) i j j 2 i 1 j 1 j l s.t. y i 1 i i 0 0 i C , i 1,..., l (1) According to the result of ten-fold cross validation, a high accuracy model is selected. In order to demonstrate the robustness of the model, we substitute another test data set into the model. The new proposed method has three advantages. Firstly, it will not miss any useful information. Secondly, it will give a clear result whether a bank belongs to failed group or non-failed. And thirdly, it will consider correlation among two or three attributes through encoding process. 5. Data description The data is collected from Federal Deposit Insurance Corporation (FDIC), including both failed and non-failed commercial bank. The number of the failed bank is 158. And we select the same number of non-failed bank corresponding to the failed bank. Meanwhile, we collect another 98 banks are used as test sample. Half of the test samples are failed after June 2010. And for each bank, eight financial ratio are chosen including Profit margin, asset, Capital growth, Loan funding, Provision rate, Short term gap, Capital ratio. These financial ratios cover six aspects of financial condition, including profit, size, growth, loan risk, interest rate risk and capital. Table 4 financial ratio for each bank Category Financial ratio Remarks Profit Size Growth Profit margin asset Capital growth Loan risk Loan exposure Loan funding Provision rate Interest rate risk Short-term gap Capital Capital ratio Net interest income/total assets Total assets (Total equity(t)-Total equity(t-1)/ Total equity net loans and leases/Total assets Net loans and leases/Total deposits Provision for possible loan losses/Total assets (Assets adjustable 1 day to 1 year liabilities adjustable1 day to 1 year)/ Total assets Total equity/Total assets For each bank in training sample, there are 816 traits reflecting the information among any two or three financial ratios. Follow the principle of feature selection, there 281 features, in which there are 156 bad features and 125 good features among all the 113424 traits. And the preliminary result is summarized in table 5. 6 Table 5 Preliminary result Method Accuracy rate (percent) Logistic Modified TR Hybrid of TR and SVM Training sample 87.34 90 95.8 Testing sample 85.8 71.4 88.2 6. Conclusions Trait recognition is an important method to predict the commercial bank failure. However, the robustness of the method performs not very well comparing to another two methods. The accuracy of the hybrid of TR and SVM outperform the TR neither in training sample and testing sample. It can make full use of information represented by financial data. The new method proposed in the paper still has some deficiencies. The result of the method depends on the training sample heavily. The selection of cutoff point is arbitrary. Acknowledgement This research has been partially supported by a grant from National Natural Science Foundation of China (#70621001, #70531040, #70501030, #10601064, #70472074),National Natural Science Foundation of Beijing #9073020, 973 Project #2004CB720103, Ministry of Science and Technology, China and BHP Billiton Co., Australia. Reference Boyacioglu, M.A., Yakup K. and Omer K.B. Predicting bank financial failures using neural networks, support vector machines and multivariate statistical methods: A comparative analysis in the sample of savings deposit insurance fund (SDIF) transferred banks in Turkey. Expert Systems with Applications, 2009, 36(2), 3355-3366 Celik AE, Kararepe Y. Evaluating and forecasting banking crises through neural network models: an application for Turkish banking sector. Export systems with Applications 2007:33; 809-15 Honohan, Patrick. Banking System Failures in Developing and Transition Countries: Diagnosis and Prediction. BIS WP NO39, 1997 Kaminsky, Graciela and Garmen Reinhart. The Twin Crises: The Cause of Banking and Balance-of-Payments Problems, America Economic Review 89(3), Jun 1999, p. 473-500 Karels GV, Prakash AJ. Multivariate normality and forecasting of business bankruptcy. Journal of Business Finance and Accounting 1987; 14(4) Kolari J., Michele C. and Drew W. Trait recognition: an alternative approach to early warning systems in commercial banking. Journal of Business Finance and Accounting 1996; 23(9) Kumar, P.R. and V.Ravi. Bankruptcy prediction in banks and firms via statistical and intelligent 7 techniques- a review. European Journal of Operational Research, 2007, 180,1-28 Marais ML, Patel J. Wolfson M. The experimental design of classification models: an application of recursive partitioning and bootstrapping to commercial bank loan classification. Journal of accounting research 1984; 22; 87-113 Pawlak Z. Rough sets. International Journal of Computer and Information Science 1982; (11); 341-56 Rumelhart, D.E.,Hinton,D.E.,&Williams,R.J. Learning representation by backpropagating errors. Nature 1986; 323(9) Vapnik VN. The nature of statistical learning theory. Berlin: Springer; 1995 Zhang, G.P. Neural networks in business forecasting. Hershey, London: Idea Group Publishing.2004 8