European Journal of Operational Research 212 (2011) 123–130

Stochastics and Statistics

Modelling the profitability of credit cards by Markov decision processes

Meko M.C. So ⇑, Lyn C. Thomas
School of Management, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom

Article history: Received 20 May 2008; Accepted 14 January 2011; Available online 21 January 2011

Keywords: OR in banking; Markov decision process; Credit card; Behavioural score; Profitability; Probability of default

Abstract

This paper derives a Markov decision process model for the profitability of credit cards, which allows lenders to find an optimal dynamic credit limit policy. The states of the system are based on the borrower’s behavioural score and the decisions are what credit limit to give the borrower each period. In determining which Markov chain best describes the borrower’s performance, second order as well as first order Markov chains are considered and estimation procedures developed that deal with the low default levels that may exist in the data. A case study is given in which the optimal credit limit is derived and the results compared with the actual outcomes.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Since the advent of credit cards in the 1960s, lenders have used credit scoring, both application and behavioural scoring, to monitor and control default risk. However in the last decade the lenders’ objectives have changed from minimising default rates to maximising profit. Lenders have recognized that operating decisions are crucial in determining how much profit is achieved from a card and this paper focuses on the most important operating policy: The management of the credit limit.
Soman and Cheema (2002) conducted a study on the use of credit limit policies in encouraging spending and found that the availability of additional credit does promote card usage in some consumers. Currently lenders set credit limits by subjectively determining an appropriate value for each cell in a risk/return matrix. They decide on a credit limit for each combination of risk band and average balance, where the average balance is considered to be a surrogate for the return to the lender from that customer. This approach is static in that it does not consider how the customer’s default risk and the customer’s profitability will change over time. Nor is there any optimization involved in deciding what credit limits to choose.

We propose using a Markov decision process (MDP) to improve the credit limit decision. A MDP model determines the optimal sequence of credit limit decisions by considering the evolution of a customer’s behaviour over time. It calculates the profitability of a credit card customer under the optimal dynamic credit limit policy.

⇑ Corresponding author. Tel.: +44 0 7796035843.
E-mail addresses: M.So@soton.ac.uk (M.M.C. So), L.Thomas@soton.ac.uk (L.C. Thomas).
0377-2217/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.ejor.2011.01.023

Lenders keep a wealth of historical credit card data, including customers’ behavioural scores each month. Behavioural scores are a way of assessing customers’ default risk in the next year. Building the Markov decision process model on behavioural scores has the advantage that most lenders have been keeping this data on customers for a number of years. With the advent of the Basel Accord in 2008, lenders are required to keep such data for five years and are encouraged to keep it through a whole economic cycle.

MDPs have been used in a number of different contexts (Heyman and Sobel, 1982; Ross, 1983; White, 1985, 1988, 1993; Kijma, 1997).
The first application of MDPs in consumer credit was by Bierman and Hausman (1970) who looked at the repayment of a loan where no further borrowing was allowed. The model assumed the repayment of the customer followed a prior probability distribution. Using a Bayesian approach, the model revises the probability of repayment in the light of the collection history. Modifications of the basic model were made both in the accounting rules (Dirickx and Wakeman, 1976) and in the form of the Markov chain (Frydman et al., 1985). A MDP model to optimise consumer lifetime value was developed by Trench et al. (2003). Here the actions in the MDP model were the consumer’s credit card limit or the interest rate charged. So the actions in that paper are similar to the ones in this paper. However their state space did not involve behavioural scores nor were they concerned with the problems of estimating the transition probabilities if there are low default rates. Instead they used a six dimensional state space each dimension having only two or three categories describing the recency and frequency of purchases and payments. The authors developed mechanisms for reducing the size of the transition matrix through merging states. Ching et al. (2004) used MDPs to manage the customer lifetime value generated from telecommunication customers, but again the state space used in that study involved marketing factors not risk factors and the decision was whether to implement promotions. This paper is the first to use behavioural score bands as the basis for MDP models. The advantage of basing the model on behavioural scores is considerable. Almost all lenders calculate such scores every month for every borrower as a basis for their Basel Accord probability of default calculations and as a way of segmenting the population by risk.
When modelling real problems using Markov decision processes, the curse of dimensionality (Puterman, 1994) can mean the state space is very large and that one would need a large amount of data to obtain robust estimates of the transition probabilities. Using behavioural scores helps to overcome the first difficulty because the score is itself a “sufficient statistic” of the risk of the account and already contains information from a number of different characteristics. Also by aggregating states one can obtain a simple but meaningful state space. In our case we make part of each state an interval of behavioural scores and similarly combine possible credit limits into bands, which make up the other part of the state. The actions are then which of these credit limit bands should be applied to the borrower in the next month.

Acquiring enough data to calculate robust estimators of the transition probabilities is not a problem in the consumer credit context because of the size of the data sets available to lenders. The only problem is that with some portfolios of loans, the number of movements directly into default from some states is so low (quite possibly zero) that the resultant zero transition probability estimate may affect the structure of the Markov chain. This problem of estimating default probabilities in low default portfolios was also highlighted in the Basel Accord. We therefore use an approach suggested in that context (Pluto and Tasche, 2006) and extended by the UK regulators (Benjamin et al., 2006) which ensures the resulting Markov chain model is robust and conservative. The conservativeness is reasonable as one would prefer the model to underestimate rather than overestimate the profitability of a credit card account.
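As a rough illustration of this conservative idea, the sketch below compares the maximum likelihood estimate with an upper bound on the default probability when few or no defaults are observed. It assumes the normal approximation to the binomial on which the adjustment of Section 3 is based; the function name `conservative_pd` and the confidence level `alpha = 0.05` are illustrative assumptions, not values taken from the case study.

```python
from math import sqrt
from statistics import NormalDist


def conservative_pd(defaults, n, alpha=0.05):
    """Largest p solving n*p - z*sqrt(n*p*(1-p)) = defaults.

    z is the standard normal quantile at 1 - alpha/2 (alpha is an assumed,
    illustrative confidence level). Squaring the equation gives a quadratic
    in p; the larger root is the conservative (upper) estimate.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    a = n + z * z                      # coefficient of p^2 (after dividing by n)
    b = 2 * defaults + z * z           # minus the coefficient of p
    disc = z ** 4 + 4 * z * z * defaults * (1 - defaults / n)
    return (b + sqrt(disc)) / (2 * a)


# With no observed defaults the maximum likelihood estimate is exactly zero,
# while the conservative estimate stays strictly positive.
mle = 0 / 800
bound_small_sample = conservative_pd(0, 800)
bound_large_sample = conservative_pd(0, 8000)
```

The bound shrinks as the number of observed accounts grows, so states with plenty of data are penalised less than states with only a few hundred observations.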
The main contribution of this paper is to show how one can use Markov decision process models based on states consisting of behavioural score bands – scores which most lenders calculate on a monthly basis – to determine optimal credit limit policies. Such policies maximise the expected profitability of each borrower and allow for policy rules such as never decreasing the credit limit which lenders may impose. The paper deals with some practical issues of modelling these transitions such as avoiding zero probability transition estimates which could affect the structure of the Markov chain. The approach is applied to a case study of a Hong Kong credit card portfolio of over 1.4 million accounts. The results of the modelling are compared with what actually happened in the portfolio.

The rest of the paper is organized as follows: Section 2 describes the MDP model formulation. Section 3 discusses the estimation of the transition probabilities including the probabilities of defaulting immediately. Section 4 presents the practical issues in applying the MDP model to the real credit card data while the results of the case study are described in Section 5. The final section draws some conclusions on the model and the resultant case study.

2. The MDP model

Consider a discrete state, discrete time discounted Markov decision process with decision epochs T (indexed by t = 1, 2, . . . , T) based on a state space S. Each state in the state space consists of two parts: which behavioural score band the borrower is in and what the borrower’s current credit limit band is. The state space thus consists of the current credit limit band, represented by L (indexed by l = 0, 1, . . . , L), and the current behavioural score band I (indexed by i = 0, 1, . . . , I). The latter is extended to allow absorbing states corresponding to various types of default or closure of the account.

In our model the actions are limited to keeping the credit limit at its current level or raising it to a higher limit band. This policy of not decreasing credit limits is used by many lenders but the methodology we describe will not change if this restriction is dropped. With this limitation the action set is defined as $A_l = \{l' : l \le l'\}$.

Two further elements need to be defined to complete the Markov decision process model. Let $p(i' \mid l, i)$ be the probability that if l is the customer’s current credit limit band and the customer is in behavioural score band i, then the next period the customer will be in behavioural score band $i'$. Secondly let r(l, i) be the profit obtained in the current period from a customer with credit limit l and in behavioural score band i. The objective is to maximise the discounted profit obtained from the customer over the next t periods where the discount factor $\lambda$ describes the time value of money. This leads to the following optimality equation for $V_t(l, i)$, the maximum expected profit over the next t periods that can be obtained from an account which is currently in behavioural score band i, and with a credit limit of l:

$$V_t(l, i) = \max_{l' \in A_l} \Big\{ r(l, i) + \sum_{i'} p(i' \mid l, i)\, \lambda\, V_{t-1}(l', i') \Big\}. \qquad (1)$$

The right-hand side of (1) corresponds to the profit over the next t periods if we change the credit limit to $l'$ from l at the end of the current period for an account with behavioural score state i. We assume that a change in the credit limit takes effect in the next time period since such a decision is usually included in the next monthly balance statement sent to the customer. Removing this delay makes no difference to the methodology though of course the optimality equation will be slightly different. The profit to the lender from the credit card during the current period is r(l, i). The $p(i' \mid l, i)$ is the probability that the behavioural score changes to band $i'$. In that case, the profit on the remaining t − 1 periods is $V_{t-1}(l', i')$. The discount factor $\lambda$ is introduced because the profits in the remaining t − 1 periods actually occur one period after those used in calculating $V_{t-1}(l', i')$ since that assumes the t − 1 periods start now. The optimality principle says that the decision $l'$ which maximises the right-hand side of (1) is the one to use when there are t more periods to go if one wants to maximise the sum of the future profits, when credit limits can only remain the same or be increased.

3. Estimating the PDs of low default portfolios

Maximum likelihood estimators are used to estimate the transition probabilities of a Markov chain. In the Markov chain described in Section 2, let $n_t(l, i)$ be the number of accounts in state (l, i) at time t and let $n_t^j(l, i)$ of them move to behavioural score state j at time t + 1. Assuming the Markov chain is stationary means the maximum likelihood estimate $\tilde{p}(j \mid l, i)$ for the probability $p(j \mid l, i)$ is

$$\tilde{p}(j \mid l, i) = \frac{\sum_{t=1}^{T-1} n_t^j(l, i)}{\sum_{t=1}^{T-1} n_t(l, i)}.$$

In reality, moving directly to the default state is a rare event, particularly for high value (low risk) behavioural scores. There may be no examples in the data of transitions from certain states (l, i) to the default state D. Thus it is possible that the estimate of $p(D \mid l, i)$ may be very small or even equal to zero. Putting such estimates into the MDP model leads to apparent “structural zeros” which change the connectedness of the dynamics in the state space. If the probability of defaulting from a given state is zero this can lead to unusual optimal policies because the system wants to move to those apparently “safe” states. For example, suppose there are only three behavioural states, Excellent, Good and Bad, where Bad is the default state, and two credit limit states, 10,000 and 50,000 (see Fig.
1). Provided the discount value $\lambda$ is close to 1 and t is very large, the optimal policy will have a credit limit of 10,000 in states 1 and 2 even though the profit per period r(l, i) is much higher when the credit limit is 50,000 than when it is 10,000 (10 as opposed to 1, say). This is because there is apparently no chance of default if the credit limit is 10,000 while there is a very small chance of default when the credit limit is 50,000. This anomaly arises because the maximum likelihood estimates are $p(D \mid 10{,}000, i) = 0$ and $p(D \mid 50{,}000, i) > 0$, even though these estimates are based on only 800 cases when the credit limit is 10,000, but 8000 cases when the credit limit is 50,000.

Fig. 1. An unexpected optimal policy.

One way to overcome this problem is to take a conservative estimate of the default probabilities, rather than the maximum likelihood estimate. This problem has been extensively discussed in the context of the Basel Accord where again bank regulators and lenders have been considering the robustness of estimates of default probabilities in low default portfolios. We follow the approach introduced by Pluto and Tasche (2006) and extended by Benjamin et al. (2006). Firstly we assume the transitions to default are monotonically decreasing as the behavioural score increases and so if the score bands are labelled with I being the highest quality, then

$$p(D \mid l, I) \le p(D \mid l, I-1) \le \cdots \le p(D \mid l, 2) \le p(D \mid l, 1).$$

So a conservative assumption for state i would be that for some J, $J \le i$,

$$p(D \mid l, i) = p(D \mid l, i-1) = \cdots = p(D \mid l, J+1) = p(D \mid l, J),$$

where J is the most risky of the states with a low number of cases going directly into default. This means that to estimate $p(D \mid l, i)$, we take a sample of $N(l, i) = \sum_{t=1}^{T-1} \sum_{k=J}^{i} n_t(l, k)$ borrower/months of customers who are in state i or worse (down to state J) and $D(l, i) = \sum_{t=1}^{T-1} \sum_{k=J}^{i} n_t^D(l, k)$ of these default in the next month.

The second conservative assumption in this approach is not to use the MLE estimate of the default probability, but rather take the lower confidence limit of the default probability. If there are D(l, i) accounts defaulting in the next period from the N(l, i) accounts under consideration, it is assumed this is given by a Binomial distribution Bin(N(l, i), p). One chooses p to be the highest probability of default, so that the corresponding lower $\alpha$-confidence limit is exactly D(l, i), i.e. getting D(l, i) or fewer defaults has no more than a $1 - \alpha/2$ probability of occurring. Since the mean and variance of the Binomial distribution are N(l, i)p and N(l, i)p(1 − p), one chooses p to be the value where

$$N(l, i)\,p - \Phi^{-1}(1 - \alpha/2)\sqrt{N(l, i)\,p\,(1 - p)} = D(l, i),$$

where $\Phi^{-1}(1 - \alpha/2)$ is the inverse cumulative standard normal distribution, i.e. the number of standard deviations below the mean in a standard normal distribution so that the chance of getting a lower value is $(1 - \alpha/2)$.

One chooses the estimate $\hat{p}_D(l, i)$ in this way for those states (l, i) where the number of actual defaults $\sum_{t=1}^{T} n_t^D(l, i)$ is at or below some agreed value – which might be zero. One would like to use MLE to obtain the estimates of the other transition probabilities $\hat{p}_j(l, i)$ from state (l, i). However this would result in the sum of the transition probabilities being greater than 1 and so instead one defines the transition probability estimates as $a(l, i)\,\hat{p}_j(l, i)$ where

$$a(l, i) = \frac{1 - \hat{p}_D(l, i)}{\sum_{j \ne D} \hat{p}_j(l, i)}.$$

For states (l, i) where the number of defaults $\sum_{t=1}^{T} n_t^D(l, i)$ exceeds the low default bound, MLE is used to estimate all the transition probabilities.

4. Apply MDP model to credit card data

4.1. Sampling and data preparation

The MDP model was applied to credit card data from a major Hong Kong bank. The dataset consisted of the credit card histories and characteristics of over 1,400,000 credit accounts for each of 60 months. The fields used in this study were account balance, account repayment, credit limit, account written-off record and behavioural score. In each monthly dataset, we extracted a random sample of 50,000 accounts that included new, existing, inactive, default and closed accounts over the sampling period. In the following discussion, these extracted accounts are indexed by time t. We then looked at how these accounts performed in the next month including the change in behavioural score. There were 3,000,000 cases (50,000 × 60 months) extracted from the dataset. We deleted some of the ambiguous, missing and special accounts that we are not interested in covering in this study, which accounted for less than 0.2%. We eventually used 2,994,602 transitions for the analysis.

One possible concern with the extracted sample is whether the existing credit limit policy was affecting the data. However, looking at the sample, only 8% of the 3 million transitions had a credit limit adjustment in the previous three months and most of these were to Inactive accounts. Thus we felt the existing policy would not have a major impact on the data.

4.2. Special accounts

Each month an account is given a behavioural score or put in a special state, such as closed, inactive, 3 + cycle or defaulted. A closed account is one where the credit card was terminated with zero account balance. A credit card account which has never been activated, was newly opened in the last two months (and so does not have enough data to merit a behavioural score) or has not been used in the last twelve months is called Inactive. A 3 + cycle account is one in which the credit card account has arrears of three months or more, but the lender has not yet written off the account. The most important special account states in the data are the defaulted (or written off) accounts.
There are four possible reasons to write off an account: bankruptcy, charge-off, revoked and 3 + cycle delinquent. Bankruptcy is when a borrower is declared bankrupt by a court; charge-off is when the lender does not believe the debt can now be recovered from the borrower; and revoked is when the credit card is stopped because of illegal behaviour by the borrower, such as being over the credit limit persistently. Lenders pass a written off account to the debt collection department to follow up, and in time the account may repay all, part or none of the outstanding debt. Even when the account makes full repayment, the collection process could last several years. It is important to estimate the average future repayment amount of different default accounts because it changes the values of the profit function in (1). There is little research (Matuszyk et al., 2010) on estimating the loss given default of revolving credit products. So we use a simple approach to compare the debt repayment ratios. Define

$$R = \frac{B_{t+1} - B_{t+24}}{B_{t+1}},$$

where $B_{t+1}$ is the current balance of an account at default in month t. The repayment ratio R thus is defined as the proportion of the debt repaid to the lender after two years. For confidentiality reasons, we could not show the exact repayment ratio, but the results showed one of the forms of default had a high repayment ratio which was significantly different from the others. We call this default account state Bad2 and group the remaining three forms of write-off together into one default state, called Bad1. So in our cost function the loss generated by accounts in Bad1 is higher than that of accounts in Bad2.

4.3. Coarse-classifying

Since a behavioural score typically has several hundred values, it is sensible to split the score into a number of bands to reduce the size of the state space. We aim to find suitable splits so that the Markovian assumption holds as nearly as possible.
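Coarse-classifying amounts to mapping each raw score to the index of the interval that contains it. The sketch below shows one way this could be done; the cut-off values are purely hypothetical, since the actual band boundaries are confidential, and `score_band` is an illustrative name rather than anything used in the study.

```python
from bisect import bisect_right

# Hypothetical lower boundaries of bands 2, 3, ... (illustration only).
HYPOTHETICAL_CUTS = [500, 540, 580, 620, 650, 680, 710, 740, 770]


def score_band(score, cuts=HYPOTHETICAL_CUTS):
    """Return the 1-based band index; band 1 holds the lowest scores.

    With k cut-offs there are k + 1 bands. A score equal to a cut-off
    is placed in the higher of the two adjacent bands.
    """
    return bisect_right(cuts, score) + 1
```

Merging two adjacent bands then corresponds to deleting one cut-off from the list, which is how a fine classification can be coarsened step by step.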
To check whether the Markov chain satisfies this assumption, we test for each state the hypothesis that the probability of moving from $s_t = (l_t, i_t)$ to $i_{t+1}$ is independent of the state at t − 1, i.e. $s_{t-1} = (l_{t-1}, i_{t-1})$. Define $n_t(l_{t-1}, i_{t-1}; l_t, i_t; i_{t+1})$ to be the number of times that a credit account was in state $(l_{t-1}, i_{t-1})$ at time t − 1 followed by moving to $(l_t, i_t)$ at time t and $i_{t+1}$ at time t + 1. Similarly define $n_t(l_{t-1}, i_{t-1}; l_t, i_t)$ to be the number of times that a customer’s behavioural score was in state $(l_{t-1}, i_{t-1})$ at time t − 1 and then moves to behavioural score state $(l_t, i_t)$ at time t. We assume the chain is stationary, thus the estimator for $p(i_{t+1} \mid l_{t-1}, i_{t-1}, l_t, i_t)$ is:

$$\hat{p}(i_{t+1} \mid l_{t-1}, i_{t-1}, l_t, i_t) = \frac{\sum_{t=0}^{T-2} n_t(l_{t-1}, i_{t-1}; l_t, i_t; i_{t+1})}{\sum_{t=0}^{T-2} n_t(l_{t-1}, i_{t-1}; l_t, i_t)}.$$

The Markovity of the chain corresponds to the hypothesis that

$$p(i_{t+1} \mid 1, 1, l_t, i_t) = p(i_{t+1} \mid 2, 1, l_t, i_t) = \cdots = p(i_{t+1} \mid L, 1, l_t, i_t) = p(i_{t+1} \mid 1, 2, l_t, i_t) = \cdots = p(i_{t+1} \mid L, I, l_t, i_t)$$

for $l_t$, $i_t$, $i_{t+1}$. To check on the Markovity of state $(l_t, i_t)$, we use the chi-square test (Anderson and Goodman, 1957). Let

$$\chi^2_{(l_t, i_t)} = \sum_{(l_{t-1}, i_{t-1}) \in S} \; \sum_{i_{t+1} \in I} \frac{n(l_{t-1}, i_{t-1}; l_t, i_t)\,\big[\hat{p}(i_{t+1} \mid l_{t-1}, i_{t-1}, l_t, i_t) - \hat{p}(i_{t+1} \mid l_t, i_t)\big]^2}{\hat{p}(i_{t+1} \mid l_t, i_t)}, \qquad (2)$$

where

$$\hat{p}(i_{t+1} \mid l_t, i_t) = \frac{\sum_{t=1}^{T-1} n_t(l_t, i_t; i_{t+1})}{\sum_{t=0}^{T-1} n_t(l_t, i_t)}$$

and

$$n(l_{t-1}, i_{t-1}; l_t, i_t) = \sum_{t=1}^{T-1} n_t(l_{t-1}, i_{t-1}; l_t, i_t).$$

Anderson and Goodman (1957) showed that (2) has a chi-square distribution with $(I - 1)(L - 1)^2$ degrees of freedom.

Our methodology for choosing these splits is similar to that used in coarse classifying characteristics in credit scorecards (Chapter 1, Thomas, 2009); that is splitting a continuous variable into bands, each of which has different bad rates. Here we are splitting the behavioural score into bands, which have different subsequent dynamics (Thomas et al., 2001). A traditional approach is to start with a fine classification i.e.
with more bands than one really wants and then check if one can combine adjacent bands. Alternatively, one can use the classification tree approach of finding the best split into two classes and then splitting one of these into two more until it is not worth splitting further. That is the approach we used in the algorithm. We repeated the process in both behavioural score and credit limit to come up with 15 score bands, of which 10 were behavioural score bands and 5 were special states; we also ended up with 11 credit limit bands as listed in Table 1. For confidentiality reasons, we do not disclose the precise behavioural score bands. We will use Score1 to Score10 to represent the credit score where Score1 are those with the lowest behavioural score and Score10 are those with the highest. Similarly, we use Limit1 to Limit10 to represent the credit limits with Limit1 being the lowest credit limit band. Table 1 gives the distribution of the borrowers into the different behavioural score bands and the various credit limit ranges in a typical month.

Table 1
Distribution of behavioural score and credit limit bands.

Index   Definition            Per. (%)

Behavioural score band
0       Closed                 0.54
1       Bad1                   0.09
2       Bad2                   0.05
3       3 + cycle delinquent   0.02
4       Inactive               6.16
5       Score1                 0.11
6       Score2                 1.68
7       Score3                11.29
8       Score4                19.30
9       Score5                10.65
10      Score6                 5.32
11      Score7                 6.07
12      Score8                11.20
13      Score9                10.14
14      Score10               17.39

Credit limit band
0       Closed                 –
1       Limit1                 7.21
2       Limit2                10.58
3       Limit3                11.67
4       Limit4                10.55
5       Limit5                 9.35
6       Limit6                10.46
7       Limit7                10.02
8       Limit8                10.21
9       Limit9                 9.36
10      Limit10               10.60

4.4. Choice of order

A MDP is mth-order if the transition probabilities depend on which state the system is currently in and what state it was in for the previous m − 1 periods.
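The Markovity statistic (2) of Section 4.3, on which the comparison of different orders relies, can be sketched for a single current state as follows. This is a simplified illustration, not the code used in the study: the function name `markov_chi_square` and the input format (one list of visited states per account) are assumptions, and the pooled probabilities are estimated from the same triples as the conditional ones.

```python
from collections import defaultdict


def markov_chi_square(paths, current):
    """Chi-square statistic testing whether the next state depends on the
    previous state, given that the account is now in `current`."""
    # triples[prev][nxt] = number of observed prev -> current -> nxt moves
    triples = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for prev, cur, nxt in zip(path, path[1:], path[2:]):
            if cur == current:
                triples[prev][nxt] += 1

    # Pooled counts of moves current -> nxt, ignoring the previous state.
    pooled = defaultdict(int)
    for row in triples.values():
        for nxt, count in row.items():
            pooled[nxt] += count
    total = sum(pooled.values())

    stat = 0.0
    for prev, row in triples.items():
        n_prev = sum(row.values())
        for nxt, pooled_count in pooled.items():
            p_pooled = pooled_count / total          # first-order estimate
            p_cond = row.get(nxt, 0) / n_prev        # second-order estimate
            stat += n_prev * (p_cond - p_pooled) ** 2 / p_pooled
    return stat
```

If the conditional proportions are the same for every previous state the statistic is zero; large values indicate that the history carries information beyond the current state. Here a state can be any hashable label, for example a (limit band, score band) pair.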
For a first order Markov chain the transition probability depends only on the current state whereas for a mth-order Markov chain the transition at time t depends on the states $(i_t, i_{t-1}, \ldots, i_{t+1-m})$ that it occupied for the last m time periods. So, the number of states increases exponentially in m as there are $|S|^m$ states in a mth-order MDP. To test whether a chain satisfies the mth-order Markovity assumption, one can again apply the chi-square test (Anderson and Goodman, 1957). Test results indicated the Markov chain was not first order. In reality, almost all applications fail to satisfy the first-order Markovity assumption, because with so much data, one can usually improve the fit beyond what are narrow significance limits. What is more important is whether there is a significant improvement in the fit when one uses second or third order Markov chains. So, we tested whether the process is second-order Markov, i.e. we redefined the state so that it carried the history of t − 2 and t − 1. Although there was an improvement in the chi-square value, the hypothesis that the chain was a second-order MDP was also not justified. Using an even higher order Markov chain increases the size of the state space exponentially and so will affect the robustness of the model. The question is whether the improvement in fit justifies the increase in size of the model and the consequent increase in computational time to solve the problem. In higher order Markov chains, the estimates of the transition probabilities are based on less data in each cell and so are less robust. What is important is that the model is robust over future time periods, not just that it is a good fit for past data. For these reasons, and because the improvement in fit when going from first to second order was not much greater (in terms of log likelihood) than going from second to third order, it was decided to use the simple first order model. 4.5.
Transition matrices

Tables 2a and 2b show the transition matrices in the dataset. Each entry represents the transition percentage from the state of the account at time t to the state of the account at time t + 1. “–” represents there was no observation in the sampling period; “0” that the transition probability was less than 0.0005. The number of accounts in different states at time t is given in the last column of the table. As the transition matrix is enormous, we only present those of Limit1 and Limit10 for illustration. Also, we show the results by splitting the states into two groups: ordinary states are those with Score2 or above; Closed, Bad1, Bad2, 3 + cycle, Inactive and Score1 are classified as special states.

Table 2a shows the transitions of moving to ordinary states. The matrix is dominated by the diagonal entries as one expects, and the volatility of score transitions decreases as the credit limit increases. 88.6% of accounts with Score10 and Limit10 remain in the same state after one month whereas only 75.2% of accounts in Score10 but with Limit1 stay in the Score10 band. This is repeated in most high behavioural score groups whereas for low score groups, the account movements are closer for different credit limits.

Table 2a
Transition matrix – moving to behavioural score states.
From state   Score2  Score3  Score4  Score5  Score6  Score7  Score8  Score9  Score10    Total

Limit 1
3 + cycle      18.3    15.4       –       –       –       –       –       –       –       104
Inactive        0.1     0.5     3.5     2.4       1     0.6     0.9     0.3       3    44,616
Score1         48.4     3.1       –       –       –       –       –       –       –       577
Score2         66.1    25.3     0.3       –       –       –       –       –       –      8301
Score3          7.6    76.8    13.2     0.4     0.2       0       0       0       –    27,953
Score4          0.3     7.7    77.1     8.9     1.4     1.5     1.3     0.1     0.1    42,024
Score5          0.1     2.4    13.4    60.2     6.5     5.7     5.7     2.3     2.2    19,650
Score6          0.1     1.8     7.3    15.5    48.7     6.8    12.7     2.2     3.3      9159
Score7            0     1.3     4.7     8.3       6      57      10     6.9     3.3    11,846
Score8            0     0.6     3.1     2.9     6.9     9.4    57.3     9.3       9    18,163
Score9            –     0.4     3.9     1.8     1.3     6.2    14.4      58      13    12,255
Score10           0     0.3     1.6     0.7     0.9     1.5     9.6     9.9    75.2    21,328

Limit 10
3 + cycle      13.3    13.3       –       –       –       –       –       –       –        15
Inactive          –     0.2     2.1     2.5     1.1     0.7     1.5     0.8     2.2    25,885
Score1         24.2     3.1     1.2       –       –       –       –       –       –       161
Score2         58.8    27.7     1.6     0.2       –       –       –       –       –      1799
Score3          2.4    77.4    18.2     0.5     0.1       0       0       0       0    28,656
Score4          0.1     8.1    78.5     8.8     1.4     0.9     1.3     0.2     0.1    59,188
Score5            0     1.5    11.7    59.5     7.7     5.6     7.4     4.4     1.6    26,563
Score6            0     0.7     4.7    10.4    58.9     8.2     9.1     3.3     3.3    14,879
Score7            0     0.7     3.4       7     6.9    55.2    10.4    10.2     4.9    17,050
Score8            0     0.5     2.4     3.7     4.4     6.3    62.2     9.7    10.1    32,561
Score9            0     0.3     1.1     1.5     0.9     4.5     7.3      73    10.9    37,615
Score10           –     0.1     0.9     0.3     0.4     0.5     4.4     4.7    88.6    72,939

Table 2b
Transition matrix – moving to special states.
From state   Closed    Bad1    Bad2  3 + cycle  Inactive  Score1     Total

Limit 1
3 + cycle     17.31   33.65   11.54       2.88         –    0.96       104
Inactive       0.79       0       0          –     86.96       –    44,616
Score1         1.73   22.53    7.11       5.72         –   11.44       577
Score2         1.07    1.07    1.06       0.48         –    4.36      8301
Score3         0.97    0.97    0.18       0.03      0.12    0.34    27,953
Score4         1.08    1.08    0.04          –      0.39       –    42,024
Score5          1.1     1.1    0.04          –      0.34       –    19,650
Score6         1.19    1.19    0.05          –      0.45       –      9159
Score7         1.18    1.18    0.03          –      1.17       –    11,846
Score8         0.91    0.91    0.01          –       0.5       –    18,163
Score9         0.72    0.72    0.01          –      0.23       –      1225
Score10        0.33    0.33       –          –      0.06       –    21,328

Limit 10
3 + cycle     46.67      20    6.67          –         –       –        15
Inactive       0.89       –       –          –     88.02       –    25,885
Score1         3.73    52.8    4.35         62         –    9.94       161
Score2         1.56    2.56    1.72       1.22         –    4.72      1799
Score3         0.42    0.19    0.39          0       0.1    0.23    28,656
Score4         0.36    0.03    0.11          –      0.14       0    59,188
Score5         0.38       0    0.03          –      0.28       –    26,563
Score6         0.58       –    0.01          –      0.74       –    14,879
Score7         0.57       –    0.01          –      0.66       –    17,050
Score8         0.29       –    0.01          –      0.34       0    32,561
Score9          0.3       –       –          –      0.22       –    37,615
Score10        0.15       –       0          –      0.03       –    72,939

Table 3
Transition matrix of the adjusted PDs on LDPs – moving to special states.

From state   Closed    Bad1    Bad2  3 + cycle  Inactive  Score1     Total

Limit 1
Score4         1.08    0.03    0.06          –      0.39       –    42,024
Score5          1.1    0.02    0.06          –      0.34       –    19,650
Score6         1.19    0.02    0.06          –      0.45       –      9159
Score7         1.18    0.02    0.05          –      1.17       –    11,846
Score8         0.91    0.02    0.05          –       0.5       –    18,163
Score9         0.72    0.02    0.04          –      0.23       –      1225
Score10        0.33    0.01    0.04          –      0.06       –    21,328

Limit 10
Score4         0.36    0.02    0.04          –      0.14       0    59,188
Score5         0.38    0.01    0.03          –      0.28       –    26,563
Score6         0.58    0.01    0.03          –      0.74       –    14,879
Score7         0.57    0.01    0.02          –      0.66       –    17,050
Score8         0.29    0.01    0.02          –      0.34       0    32,561
Score9          0.3    0.01    0.02          –      0.22       –    37,615
Score10        0.15    0.01    0.01          –      0.03       –    72,939

Table 2b shows the transitions of moving to special accounts. For Limit10, there is no example of an account with Score6 or above moving to the Bad1 state. However 52.8% of those in Score1, with credit Limit10, move to Bad1 status the next month, which is a much higher percentage than those with credit limits in the Limit1 band.
This may be due to the higher balance that such borrowers are carrying. On the other hand, as Table 2a shows, 48.4% of those in Score1 and Limit1 improve their scores to Score2 in the next month.

4.6. Transition of LDP

To prevent introducing invalid structural zeros into the transition matrix, we adjust the transition matrix of this low default portfolio by the method presented in Section 3. It is important to determine when such an adjustment needs to be made. Benjamin et al. (2006) tested several approaches to determining when this adjustment is useful. They concluded that there is no ideal method but suggested the simple approach of applying the adjustment when a band has fewer than 20 cases of moving directly to default in the whole sample. We found there is a threshold between Score3 and Score4 in the dataset where, in most of the credit limit bands, the number of movements to default in Score4 or higher score bands is below 20. We thus defined the accounts in Score4 to Score10 as making up the low default portfolio, and the corresponding adjusted transition probabilities are presented in Table 3.¹ In this case, the adjustment makes only small changes to the transition probabilities. However, for other data sets, the differences could be more significant.

¹ Since there are minor changes on the transition probability of moving to ordinal accounts, we do not present the results here. However, the complete data is available upon request.

4.7. Profit function

Table 4 shows the profit function used in the MDP model, which was estimated by using the monthly profit field. For each state (l, i), the average monthly profit of the borrowers was taken over the cases when the borrower is in behavioural state i with credit limit l being applied. In general the profit decreases with behavioural score, because a borrower with a high behavioural score is more likely to make full repayments. Thus the lenders can only gain the merchant fee from them (the fee on each credit card purchase which is paid by the retailer to the credit card company). On the other hand, the low behavioural score accounts are more likely to accumulate debt in their accounts. Thus these borrowers generate both interest and interchange fees for the lenders.

Table 4
Profit per state (in HK$).

| Value | Limit1 | Limit2 | Limit3 | Limit4 | Limit5 | Limit6 | Limit7 | Limit8 | Limit9 | Limit10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Closed | 8 | 25 | 49 | 95 | 12 | 41 | 108 | 33 | 69 | 302 |
| Bad1 | 4574 | 8022 | 11085 | 16358 | 17845 | 25578 | 34028 | 41246 | 47751 | 88029 |
| Bad2 | 193 | 62 | 286 | 320 | 164 | 355 | 924 | 440 | 1070 | 1387 |
| 3 + cycle | 621 | 647 | 1068 | 1040 | 891 | 825 | 1071 | 1890 | 642 | 248 |
| Inactive | 8 | 6 | 6 | 6 | 2 | 1 | 1 | 4 | 6 | 6 |
| Score1 | 706 | 1131 | 1571 | 2067 | 2047 | 2576 | 3316 | 3845 | 5461 | 10209 |
| Score2 | 202 | 255 | 367 | 489 | 555 | 697 | 894 | 1210 | 1351 | 2052 |
| Score3 | 151 | 189 | 282 | 395 | 220 | 462 | 599 | 704 | 915 | 1674 |
| Score4 | 33 | 42 | 80 | 132 | 77 | 170 | 235 | 296 | 429 | 1126 |
| Score5 | 8 | 6 | 18 | 29 | 27 | 42 | 60 | 78 | 123 | 311 |
| Score6 | 8 | 2 | 8 | 14 | 16 | 23 | 32 | 39 | 66 | 194 |
| Score7 | 0 | 2 | 3 | 5 | 14 | 16 | 24 | 36 | 61 | 142 |
| Score8 | 6 | 6 | 3 | 0 | 9 | 10 | 17 | 24 | 44 | 102 |
| Score9 | 7 | 8 | 5 | 3 | 4 | 6 | 11 | 21 | 35 | 85 |
| Score10 | 9 | 7 | 6 | 4 | 4 | 5 | 8 | 17 | 26 | 70 |

A second observation from Table 4 is that profit usually increases with credit limit for score band 2 upwards. This is expected, as the amount purchased goes up as the credit limit goes up and hence so does the merchant fee.

4.8. Optimal policy

We implemented the value iteration algorithm (Puterman, 1994) to generate the optimal policies, and we used k = 0.995 as our monthly discount factor (a rough estimate corresponding to a 6% yearly inflation rate, which was the average rate in Hong Kong over this period). Table 5 shows the optimal values when using the simple transition matrix, with the corresponding credit limits in brackets. Similarly, Table 6 shows the results of using the transition matrices whose values are adjusted for it being a low default portfolio.
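The value iteration step described above can be illustrated with a minimal sketch on a toy problem. Only the monthly discount factor 0.995 is taken from the text (0.995^12 is roughly 0.94, consistent with about 6% per year). The two limit bands, the monthly profits, the default probabilities, the zero-reward absorbing default state and the restriction that a limit can only be kept or raised are all hypothetical simplifications chosen for illustration, not the model estimated in the paper.

```python
# Toy finite MDP solved by value iteration.
# ASSUMPTIONS: states, profits and default probabilities are hypothetical;
# only the discount factor 0.995 comes from the text.

DISCOUNT = 0.995  # monthly discount factor

STATES = ["L1", "L2", "default"]

# Hypothetical monthly profit earned under each chosen limit band.
MONTHLY_PROFIT = {"L1": 10.0, "L2": 15.0, "default": 0.0}

# Hypothetical monthly probability of moving to default under each limit.
DEFAULT_PROB = {"L1": 0.01, "L2": 0.02}


def actions(state):
    """Limits that may be chosen from a state (keep or raise; assumed rule)."""
    return {"L1": ["L1", "L2"], "L2": ["L2"], "default": ["default"]}[state]


def transition(chosen):
    """Next-state distribution once a limit has been chosen."""
    if chosen == "default":
        return {"default": 1.0}  # default is absorbing
    p = DEFAULT_PROB[chosen]
    return {chosen: 1.0 - p, "default": p}


def value_iteration(tol=1e-8, max_iter=100_000):
    """Iterate the Bellman optimality update until values change by < tol."""
    values = {s: 0.0 for s in STATES}
    policy = {}
    for _ in range(max_iter):
        new_values = {}
        for s in STATES:
            best_action, best_q = None, float("-inf")
            for a in actions(s):
                expected_future = sum(
                    prob * values[nxt] for nxt, prob in transition(a).items()
                )
                q = MONTHLY_PROFIT[a] + DISCOUNT * expected_future
                if q > best_q:
                    best_action, best_q = a, q
            new_values[s] = best_q
            policy[s] = best_action
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            break
    return values, policy


values, policy = value_iteration()
```

In this toy setting the higher limit earns more per month but doubles the default probability, so the converged policy keeps the lower limit. This is the same kind of trade-off between profit and default risk that drives the policies in Tables 5 and 6.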
So for example, in Table 5, the cell in row 4 and column 5 corresponds to accounts with Score2 and Limit4. The optimal policy is to increase their credit limit to Limit5, and using the optimal policy subsequently leads to a total discounted reward of HK$32,781 per borrower. For 3 + cycle accounts, the optimal policy is to keep the credit limit unchanged. This follows because many of the accounts move to default at t + 1. Since the expected loss generated from these accounts increases with credit limit, in order to minimise the potential loss, the model chooses to keep their existing credit limit unchanged. The model suggests increasing the credit limit of some Inactive accounts to give encouragement to these borrowers to start using the card. Since there is no history of repayment, the optimal policy does not move these inactive accounts to the highest credit limit band.

Table 5
Optimal policy (using the transition matrix with no adjustment on PDs).

| Policy | Limit1 | Limit2 | Limit3 | Limit4 | Limit5 | Limit6 | Limit7 | Limit8 | Limit9 | Limit10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 + cycle | 10,874 (1) | 9522 (2) | 5745 (3) | 8932 (4) | 5497 (5) | 10,924 (6) | 12,820 (7) | 14,385 (8) | 3856 (9) | 8693 (10) |
| Inactive | 35,889 (2) | 36,042 (2) | 35,810 (3) | 35,723 (4) | 35,280 (6) | 35,477 (6) | 35,015 (7) | 34,887 (8) | 34,000 (10) | 33,977 (10) |
| Score1 | 18,726 (1) | 17,209 (2) | 13,135 (3) | 6775 (4) | 11,809 (5) | 2383 (6) | 578 (7) | 8533 (8) | 16155 (9) | 54220 (10) |
| Score2 | 34,117 (2) | 34,303 (2) | 34,296 (3) | 32,781 (5) | 32,606 (5) | 32,117 (7) | 32,606 (7) | 32,742 (8) | 30,080 (9) | 24,065 (10) |
| Score3 | 41,219 (10) | 41,834 (10) | 42,101 (10) | 42,348 (10) | 42,334 (10) | 42,469 (10) | 42,651 (10) | 42,889 (10) | 43,040 (10) | 43,956 (10) |
| Score4 | 41,999 (10) | 42,216 (10) | 42,314 (10) | 42,426 (10) | 41,914 (10) | 42,275 (10) | 42,369 (10) | 42,309 (10) | 42,578 (10) | 43,461 (10) |
| Score5 | 38,371 (10) | 38,540 (10) | 38,626 (10) | 38,614 (10) | 38,355 (10) | 38,474 (10) | 38,546 (10) | 38,482 (10) | 38,530 (10) | 38,744 (10) |
| Score6 | 37,191 (10) | 37,386 (10) | 37,340 (10) | 37,340 (10) | 37,232 (10) | 37,219 (10) | 37,273 (10) | 37,169 (10) | 37,225 (10) | 37,343 (10) |
| Score7 | 36,618 (2) | 36,925 (2) | 36,878 (3) | 36,810 (10) | 36,738 (10) | 36,773 (10) | 36,732 (10) | 36,668 (10) | 36,731 (10) | 36,744 (10) |
| Score8 | 36,242 (2) | 36,493 (2) | 36,353 (10) | 36,363 (10) | 36,403 (10) | 36,374 (10) | 36,370 (10) | 36,381 (10) | 36,373 (10) | 36,418 (10) |
| Score9 | 36,057 (2) | 36,131 (2) | 36,027 (3) | 35,966 (4) | 35,804 (10) | 35,858 (10) | 35,831 (10) | 35,754 (10) | 35,799 (10) | 35,827 (10) |
| Score10 | 35,751 (2) | 35,789 (2) | 35,614 (3) | 35,498 (4) | 35,447 (10) | 35,452 (10) | 35,428 (10) | 35,410 (10) | 35,403 (10) | 35,449 (10) |

Table 6
Optimal policy (using the transition matrix with adjusted PDs on LDPs).

| Policy | Limit1 | Limit2 | Limit3 | Limit4 | Limit5 | Limit6 | Limit7 | Limit8 | Limit9 | Limit10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 + cycle | 10,047 (1) | 8731 (2) | 5062 (3) | 8138 (4) | 4698 (5) | 9976 (6) | 11,497 (7) | 12,893 (8) | 2937 (9) | 9310 (10) |
| Inactive | 33,154 (2) | 33,296 (2) | 33,017 (3) | 32,896 (4) | 32,379 (6) | 32,567 (6) | 32,110 (7) | 32,008 (8) | 31,192 (10) | 31,170 (10) |
| Score1 | 17,411 (1) | 15,891 (2) | 11,957 (3) | 5753 (4) | 10,550 (5) | 1325 (6) | 1615 (7) | 9507 (8) | 17071 (9) | 54888 (10) |
| Score2 | 31,961 (2) | 32,139 (2) | 32,134 (3) | 30,681 (5) | 30,519 (5) | 30,032 (7) | 30,512 (7) | 30,607 (8) | 28,026 (9) | 22,077 (10) |
| Score3 | 38,614 (10) | 39,202 (10) | 39,462 (10) | 39,704 (10) | 39,666 (10) | 39,820 (10) | 40,000 (10) | 40,230 (10) | 40,384 (10) | 41,290 (10) |
| Score4 | 39,038 (10) | 39,249 (10) | 39,347 (10) | 39,459 (10) | 38,932 (10) | 39,304 (10) | 39,396 (10) | 39,326 (10) | 39,595 (10) | 40,480 (10) |
| Score5 | 35,227 (10) | 35,394 (10) | 35,473 (10) | 35,457 (10) | 35,201 (10) | 35,316 (10) | 35,373 (10) | 35,301 (10) | 35,354 (10) | 35,543 (10) |
| Score6 | 34,151 (2) | 34,336 (2) | 34,159 (10) | 34,149 (10) | 34,045 (10) | 34,047 (10) | 34,079 (10) | 33,974 (10) | 34,022 (10) | 34,116 (10) |
| Score7 | 33,719 (2) | 34,004 (2) | 33,877 (3) | 33,728 (4) | 33,562 (10) | 33,587 (10) | 33,543 (10) | 33,470 (10) | 33,538 (10) | 33,529 (10) |
| Score8 | 33,357 (2) | 33,598 (2) | 33,292 (3) | 33,205 (4) | 33,207 (10) | 33,175 (10) | 33,166 (10) | 33,178 (10) | 33,175 (10) | 33,195 (10) |
| Score9 | 33,197 (2) | 33,270 (2) | 33,059 (3) | 32,928 (4) | 32,671 (6) | 32,724 (6) | 32,623 (10) | 32,551 (10) | 32,587 (10) | 32,604 (10) |
| Score10 | 32,909 (2) | 32,954 (2) | 32,666 (3) | 32,488 (4) | 32,276 (6) | 32,276 (6) | 32,229 (10) | 32,209 (10) | 32,202 (10) | 32,239 (10) |

Notice that some accounts in Score1 generate losses under certain credit limits since they are likely to default in the next month, and so the best policy is to keep their credit limit unchanged. The optimal policy suggests increasing the credit limit of accounts in Score3 to Score5 to the highest credit limit band, since these accounts have little risk and are very profitable. One surprise is that the optimal policies do not increase the credit limit of those with low limits and high behavioural scores, i.e. those in Table 6 with behavioural Score7 or above and credit limit 2, 3 or 4. Instead, the optimal policy is to keep their credit limit unchanged. This is because they are less profitable than those with lower behavioural scores. Also, high credit limits have a relatively more severe impact on their chance of moving to risky or even default states than on an account with a lower behavioural score.

4.9. Comparison of Models with actual results

Ten out-of-period sample sets (June 2006 to May 2007), each consisting of 1000 credit card accounts, are used to compare the performance of the model and the actual policy. We compare the following scenarios:

1. Real profit, real transition matrix and real policy.
2. Model profit, real transition matrix and real policy.
3. Model profit, model transition matrix and real policy.
4. Model profit, model transition matrix and model policy.

Table 7
Comparing the real policy and model policy.

| Scenario | % increase compared with "Real profit, real transition and real policy" |
|---|---|
| 2. Model profit, real transition and real policy | 15.5 |
| 3. Model profit, model transition and real policy | 27.3 |
| 4. Model profit, model transition and model policy | 34.9 |
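The comparison across the scenarios above reduces to averaging profit over the sample sets and expressing each scenario as a percentage increase over scenario 1. The following is a minimal sketch of that arithmetic. The profit figures are hypothetical placeholders (the actual values are not published), and the function and key names are illustrative only.

```python
from statistics import mean

# HYPOTHETICAL profits per sample set for each scenario (three sets shown
# for brevity). These numbers are placeholders, not results from the paper.
profits_by_scenario = {
    "scenario_1": [100.0, 120.0, 80.0],   # real profit, real transition, real policy
    "scenario_2": [110.0, 130.0, 90.0],   # model profit, real transition, real policy
    "scenario_3": [120.0, 140.0, 100.0],  # model profit, model transition, real policy
    "scenario_4": [130.0, 150.0, 110.0],  # model profit, model transition, model policy
}


def pct_increase_over_baseline(profits, baseline="scenario_1"):
    """Average each scenario over the sample sets, then compare with the baseline."""
    base = mean(profits[baseline])
    return {
        name: 100.0 * (mean(values) - base) / base
        for name, values in profits.items()
        if name != baseline
    }


increase = pct_increase_over_baseline(profits_by_scenario)
```

Each entry of `increase` plays the role of one row of a table of percentage increases relative to the real-profit, real-transition, real-policy baseline.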
The profit of each data set with respect to each scenario was computed. We then took the average profit of these data sets. The results are shown in Table 7.² We see that the most significant change occurs when the real rewards per period are replaced by the model's reward per period. This occurs because the variation in real rewards between customers in the same behavioural score state with the same credit limit is quite significant. The change between the model policy and the real policy when applied to the model suggests that the improvement in profitability using the model policy is around 7.5%.

² For reasons of confidentiality, we do not present the actual values. Also, due to the length of the paper, we do not present the results by account types. The results are available upon request.

5. Conclusion and future research

This paper develops a MDP model to generate a dynamic credit limit policy. It explains how one can model the profitability of a borrower including their default risk by employing a risk parameter that is used by almost every lender, namely the behavioural score. The paper has also incorporated a mechanism to deal with the risk that the MLE parameter estimation could give an incorrect chain structure if in the sample there are no transitions to default from some of the states. The model helps determine what the properties of the optimal credit limit policy should be, and the effectiveness of this type of approach has been shown by applying it to a real credit card dataset. We have used behavioural score bands as our state space, but it may be possible to augment the state definition by using other characteristics of the borrower, such as whether they are revolvers or transactors. This may be particularly relevant to improving the estimates of the rewards per month in each state, where there is the greatest difference between the model calculations and the real situation.
We could also extend the model by including economic variables as drivers of the transition probabilities. However, this simple model does show how one can use Markov chain modelling to derive improved credit limit policies.

References

Anderson, T.W., Goodman, L.A., 1957. Statistical inference about Markov chains. Annals of Mathematical Statistics 28, 89–109.
Benjamin, N., Cathcart, A., Ryan, K., 2006. Low default portfolios: A proposal for conservative estimation of default probabilities. Working paper, Financial Services Authority.
Bierman, H., Hausman, W.H., 1970. The credit granting decision. Management Science 16, 519–532.
Ching, W.K., Ng, M.K., Wong, K.K., Altman, E., 2004. Customer lifetime value: Stochastic optimization approach. Journal of the Operational Research Society 55, 860–868.
Dirickx, Y.M.I., Wakeman, L., 1976. An extension of the Bierman–Hausman model for credit granting. Management Science 22, 1229–1237.
Frydman, H., Kallberg, J.G., Kao, D.L., 1985. Testing the adequacy of Markov chains and mover-stayer models as representations of credit behaviour. Operations Research 33, 1203–1214.
Heyman, D.P., Sobel, M.I., 1982. Stochastic Models in Operations Research. Stochastic Processes and Operating Characteristics, vol. 1. McGraw-Hill, New York.
Kijma, M., 1997. Markov Processes for Stochastic Modelling. Chapman & Hall, London.
Matuszyk, A., Mues, C., Thomas, L.C., 2010. Modelling LGD for unsecured personal loans: Decision tree approach. Journal of the Operational Research Society 61, 393–398.
Pluto, K., Tasche, D., 2006. The Basel II risk parameters. Estimating probabilities of default for low default portfolios. Springer. pp. 79–103.
Puterman, M., 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York.
Ross, S.M., 1983. Introduction to Stochastic Dynamic Programming. Academic Press, New York.
Soman, D., Cheema, A., 2002. The effect of credit on spending decisions: The role of the credit limit and credibility. Marketing Science 21, 32–53.
Thomas, L.C., 2009. Consumer Credit Models: Pricing, Profit and Portfolios. Oxford University Press, Oxford (Chapter 1).
Thomas, L.C., Ho, J., Scherer, W.T., 2001. Time will tell: Behavioural scoring and the dynamics of consumer credit assessment. IMA Journal of Mathematics 12, 89–103.
Trench, M.S., Pederson, S.P., Lau, E.T., Ma, L., Wang, H., Nair, S.K., 2003. Managing credit lines and prices for Bank One credit cards. Interfaces 33, 4–21.
White, D.J., 1985. Real applications of Markov decision processes. Interfaces 15, 73–78.
White, D.J., 1988. Further real applications of Markov decision processes. Interfaces 18, 55–61.
White, D.J., 1993. A survey of applications of Markov decision process. Journal of the Operational Research Society 44, 1073–1096.