Modelling the profitability of credit cards by Markov decision processes ,

European Journal of Operational Research 212 (2011) 123–130
Contents lists available at ScienceDirect
European Journal of Operational Research
journal homepage: www.elsevier.com/locate/ejor
Stochastics and Statistics
Modelling the profitability of credit cards by Markov decision processes
Meko M.C. So ⇑, Lyn C. Thomas
School of Management, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
a r t i c l e
i n f o
Article history:
Received 20 May 2008
Accepted 14 January 2011
Available online 21 January 2011
Keywords:
OR in banking
Markov decision process
Credit card
Behavioural score
Profitability
Probability of default
a b s t r a c t
This paper derives a Markov decision process model for the profitability of credit cards, which allows
lenders to find an optimal dynamic credit limit policy. The states of the system are based on the borrower’s behavioural score and the decisions are what credit limit to give the borrower each period. In
determining which Markov chain best describes the borrower’s performance, second order as well as first
order Markov chains are considered and estimation procedures developed that deal with the low default
levels that may exist in the data. A case study is given in which the optimal credit limit is derived and the
results compared with the actual outcomes.
Ó 2011 Elsevier B.V. All rights reserved.
1. Introduction
Since the advent of credit cards in the 1960s, lenders have used
credit scoring, both application and behavioural scoring, to monitor and control default risk. However in the last decade the lenders’
objectives have changed from minimising default rates to maximising profit. Lenders have recognized that operating decisions
are crucial in determining how much profit is achieved from a card
and this paper focuses on the most important operating policy: The
management of the credit limit. Soman and Cheema (2002) conducted a study on the use of credit limit policies in encouraging
spending and found that the availability of additional credit does
promote card usage in some consumers.
Currently lenders set credit limits by subjectively determining
an appropriate value for each cell in a risk/return matrix. They decide on a credit limit for each combination of risk band and average
balance, where the average balance is considered to be a surrogate
for the return to the lender from that customer. This approach is
static in that it does no consider how the customer’s default risk
and the customer’s profitability will change over time. Nor is there
any optimization involved in deciding what credit limits to choose.
We propose using a Markov decision processes (MDP) to improve
the credit limit decision. A MDP model determines the optimal sequence of credit limit decisions by considering the evolution of a
customer’s behaviour over time. It calculates the profitability of a
credit card customer under the optimal dynamic credit limit policy.
⇑ Corresponding author. Tel.: +44 0 7796035843.
E-mail addresses: M.So@soton.ac.uk (M.M.C. So), L.Thomas@soton.ac.uk (L.C.
Thomas).
0377-2217/$ - see front matter Ó 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.ejor.2011.01.023
Lenders keep a wealth of historical credit card data, including
customers’ behavioural scores each month. Behavioural scores
are a way of assessing customers’ default risk in the next year.
Building the Markov decision process model on behavioural scores
has the advantage that most lenders have been keeping this data
on customers for a number of years. With the advent of the Basel
Accord in 2008, lenders are required to keep such data for five
years and are encouraged to keep it through a whole economic
cycle.
MDPs have been used in a number of different contexts (Heyman and Sobel, 1982; Ross, 1983; White, 1985, 1988, 1993; Kijma,
1997). The first application of MDPs in consumer credit was by
Bierman and Hausman (1970) who looked at the repayment of a
loan where no further borrowing was allowed. The model assumed
the repayment of the customer followed a prior probability distribution. Using a Bayesian approach, the model revises the probability of repayment in the light of the collection history. Modifications
of the basic model were made both in the accounting rules (Dirickx
and Wakeman, 1976) and in the form of the Markov chain (Frydman et al., 1985). A MDP model to optimise consumer lifetime value was developed by Trench et al. (2003). Here the actions in the
MDP model were the consumer’s credit card limit or the interest
rate charged. So the actions in that paper are similar to the ones
in this paper. However their state space did not involve behavioural scores nor were they concerned with the problems of estimating the transition probabilities if there are low default rates.
Instead they used a six dimensional state space each dimension
having only two or three categories describing the recency and
frequency of purchases and payments. The authors developed
mechanisms for reducing the size of the transition matrix through
124
M.M.C. So, L.C. Thomas / European Journal of Operational Research 212 (2011) 123–130
merging states. Ching et al. (2004) used MDPs to manage the customer lifetime value generated from telecommunication customers, but again the state space used in that study involved
marketing factors not risk factors and the decision was whether
to implement promotions.
This paper is the first to use behavioural score bands as the basis
for MDP models. The advantage of basing the model on behavioural scores is considerable. Almost all lenders calculate such
scores every month for every borrower as a basis for their Basel
Accord probability of default calculations and as a way of segmenting the population by risk.
When modelling real problems using Markov decision processes, the curse of dimensionality (Puterman, 1994) can mean
the state space is very large and that one would need a large
amount of data to obtain robust estimates of the transition probabilities. Using behavioural scores helps to overcome this first difficulty because it itself is a ‘‘sufficient statistic’’ of the risk of the
account and already contains information from a number of different characteristics. Also by aggregating states one can obtain a simple but meaningful state space. In our case we make part of each
state an interval of behavioural scores and similarly combine possible credit limits into bands, which make up the other part of the
state. The actions are then which of these credit limit bands should
be applied to the borrower in the next month.
Acquiring enough data to calculate robust estimators of the
transition probabilities is not a problem in the consumer credit
context because of the size of the data sets available to lenders.
The only problem is that with some portfolios of loans, the number
of movements directly into default from some states is so low
(quite possibly zero) that the resultant zero transition probability
estimate may affect the structure of the Markov chain. This problem of estimating default probabilities in low default portfolios
was also highlighted in the Basel Accord. We therefore use an approach suggested in that context (Pluto and Tasche, 2006) and extended by the UK regulators (Benjamin et al., 2006) which ensures
the resulting Markov chain model is robust and conservative. The
conservativeness is reasonable as one would prefer the model to
underestimate rather than over estimate the profitability of a credit card account.
The main contribution of this paper is to show how one can use
Markov decision process models based on states consisting of
behavioural score bands – scores which most lenders calculate
on a monthly basis – to determine optimal credit limit policies.
Such policies maximise the expected profitability of each borrower
and allow for policy rules such as never decreasing the credit limit
which lenders may impose. The paper deals with some practical issues of modelling these transitions such as avoiding zero probability transition estimates which could affect the structure of the
Markov chain. The approach is applied to a case study of a Hong
Kong credit card portfolio of over 1.4 million accounts. The results
of the modelling are compared with what actually happened in the
portfolio.
The rest of the paper is organized as follows: Section 2 describes
the MDP model formulation. Section 3 discusses the estimation of
the transition probabilities including the probabilities of defaulting
immediately. Section 4 presents the practical issues in applying the
MDP model to the real credit card data while the results of the case
study are described in Section 5. The final section draws some conclusion on the model and the resultant case study.
parts-which behavioural score band the borrower is in and what
is the borrower’s current credit limit band. The state space thus
consists of the current credit limit band represent by L (indexed
by l = 0, 1, . . . , L) and the current behavioural score band I (indexed
by i = 0, 1, . . . , I). The latter is extended to allow absorbing states
corresponding to various types of default or closure of the account.
In our model the actions are limited to keeping the credit limit at
its current level or raising it to a higher limit band. This policy of
not decreasing credit limits is used by many lenders but the methodology we describe will not change if this restriction is dropped.
With this limitation the action set is defined as Al = {l0 : l 6 l0 }.
Two further elements need to be defined to complete the Markov decision process model. Let p(i0 jl, i) be the probability that if l is
the current customer’s credit limit band and the customer is in
behavioural score band i, then the next period the customer will
be in behavioural score band i0 . Secondly let r(l, i) be the profit obtained in the current period from a customer with credit limit l and
in behavioural score band i.
The objective is to maximise the discounted profit obtained
from the customer over the next t periods where the discount factor k describes the time value of money. This leads to the following
optimality equation for Vt(l, i), the maximum expected profit over
the next t periods that can be obtained from an account which is
currently in behaviour score band i, and with a credit limit of l:
(
V t ðl; iÞ ¼ max
rðl; iÞ þ
0
l 2Al
Consider a discrete state, discrete time discounted Markov decision process with decision epochs T (indexed by t = 1, 2, . . . , T) based
on a state space S. Each state in the state space consists of two
)
0
0
0
pði j l; iÞkV t1 ðl ; i Þ :
ð1Þ
i0
The right-hand-side of (1) corresponds to the profit over the next t
periods if we change the credit limit to l0 from l at the end of the current period for an account with behavioural score state i. We assume that a change in the credit limit takes effect in the next
time period since such a decision is usually included in the next
monthly balance statement sent to the customer. Removing this delay makes no difference to the methodology though of course the
optimality equation will be slightly different. The profit to the lender from the credit card during the current period is r(l, i). The
p(i0 jl, i) is the probability that the behavioural score changes to band
i0 . In that case, the profit on the remaining t 1 period is Vt1(l0 , i0 ).
The discount factor k is introduced because the profits in the
remaining t 1 periods actually occur one period after those used
in calculating Vt1(l0 , i0 ) since that assumes the t 1 periods start
now.
The optimality principle says that the decision l0 , which maximises the right hand side of (1) is the one to use when there are
t more periods to go if one wants to maximise the sum of the future
profits, when credit limits can only remain the same or be
increased.
3. Estimating the PDs of low default portfolios
Maximum likelihood estimators are used to estimate the transition probabilities of a Markov chain. In the Markov chain described
in Section 2, let nt(l, i) be the number of accounts in state (l, i) at
time t and let njt ðl; iÞ of them move to behavioural score state j at
time t + 1. Assuming the Markov chain is stationary means the
~ðj j l; iÞ for the probability p(jjl, i) is
maximum likelihood estimate p
PT1
j
t¼1 nt ðl; iÞ
PT1
t¼1 nt ðl; iÞ
2. The MDP model
X
:
In reality, moving directly to the default state is a rare event, particularly for high value (low risk) behavioural scores. There may be no
examples in the data of transaction from certain states (l, i) to the
default state D. Thus it is possible that p(Djl, i) may be very small
or even equal to zero. Putting such estimates into the MDP model
M.M.C. So, L.C. Thomas / European Journal of Operational Research 212 (2011) 123–130
leads to apparent ‘‘structural zeros’’ which change the connectedness of the dynamics in the state space. If the probability of defaulting from a given state is zero this can lead to unusual optimal
policies because the system wants to move to those apparently
‘‘safe’’ states. For example, suppose there are only three behavioural
states: Excellent, Good and Bad where Bad is the default state. If
there are two credit limit states, 10,000 and 50,000. (see Fig. 1)
Provided the discount value k is close to 1 and t is very large, the
optimal policy will have a credit limit of 10,000 in states 1 and 2
even though the profit per period r(l, i) is much higher when the
credit limit is 50,000 than when it is l = 10,000 (10 as opposed to
1 say). This is because there is apparently no chance of default if
the credit limit is 10,000 while there is a very small chance of default when the credit limit is l = 50,000. This anomaly arises because the maximum likelihood estimates are p(Dj10,000, i) = 0
and p(Dj50,000, i) > 0, even though these estimates are based on
only 800 cases when the credit limit is 10,000, but 8000 cases
when the credit limit is 50,000.
One way to overcome this problem is to take a conservative
estimate of the default probabilities, rather than the maximum
likelihood estimate. This problem has been extensively discussed
in the context of the Basel Accord where again bank regulators
and lenders have been considering the robustness of estimates of
default probabilities in low default portfolios. We follow the approach introduced by Pluto and Tasche (2006) and extended by
Benjamin et al. (2006). Firstly we assume the transitions to default
are monotonically decreasing as the behavioural score increases
and so if the score bands are labelled with I being the highest quality, then
pðDjl; IÞ 6 pðDjl; I 1Þ 6 6 pðDjl; 2Þ 6 pðDjl; 1Þ:
So a conservative assumption for state i would be that for some J,
J6i
125
The second conservative assumption in this approach is not to
use the MLE estimate of the default probability, but rather take
the lower confidence limit of the default probability. Ifs there are
D(l, i) accounts defaulting in the next period from the N(l, i)
accounts under consideration, it is assumed this is given by a Binomial distribution Bin(N(l, i), p). One chooses p to be the highest
probability of default, so that the corresponding lower a-confidence limit is exactly D(l, i), i.e. getting D(l, i) or fewer defaults
has no more than a 1 a/2 probability of occurring. Since the
mean and variance of the Binomial distribution is N(l, i)p and
N(l, i)p(1 p), one chooses p to be the value where
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Nðl; iÞp U1 ð1 a=2Þ Nðl; iÞpð1 pÞ ¼ Dðl; iÞ;
where U1(1 a/2) is the inverse cumulative standard normal distribution, i.e. the number of standard deviations below the mean in
a standard normal distribution so that the chance of getting a lower
value is (1 a/2).
^D ðl; iÞ in this way for these states (l, i)
One chooses the estimate p
PT D
where the number of actual defaults
t¼1 nt ðl; iÞ is at or below
same agreed value – which might be zero. One would like to use
MLE to obtain the estimates of the other transition probabilities
^j ðl; iÞ from state (l, i). However this would result in the sum of
p
the transition probabilities being greater than 1 and so instead
one defines the transition probability estimates as aðl; iÞ b
p ðj!l; iÞ
where
^D ðl; iÞ
1p
:
bj
j–D p ðl; iÞ
aðl; iÞ ¼ P
P
For states (l, i) where the number of defaults Tt¼1 nDt ðl; iÞ exceed the
low default bound, MLE is used to estimate all the transition
probabilities.
pðDjl; iÞ ¼ pðDjl; i 1Þ ¼ ¼ pðDjl; J þ 1Þ ¼ pðDjl; JÞ;
4. Apply MDP model to credit card data
where J is the most risky of the states with low number of cases
going directly into default. This means that to estimate p(Djl, i),
P Pi
we take a sample of Nðl; iÞ ¼ T1
t¼1
k¼J nt ðl; kÞ borrower/months of
customers who are in state i or worse (down to state J) and
PT1 Pi
Dðl; iÞ ¼ t¼1 k¼J nDt ðl; kÞ of these default in the next month.
4.1. Sampling and data preparation
The MDP model was applied to credit card data from a major
Hong Kong bank. The dataset consisted of the credit card histories
and characteristics of over 1,400,000 credit accounts for each of
Fig. 1. An unexpected optimal policy.
126
M.M.C. So, L.C. Thomas / European Journal of Operational Research 212 (2011) 123–130
60 months. The fields used in this study were account balance, account repayment, credit limit, account written-off record and
behavioural score.
In each monthly dataset, we extracted a random sample of
50,000 accounts that included new, existing, inactive, default and
closed accounts over the sampling period. In the following discussion, these extracted accounts are indexed by time t. We then
looked at how these accounts performed in the next month including the change in behavioural score. There were 3,000,000 cases
(50,000 60 months) extracted from the dataset. We deleted some
of the ambiguous, missing and special accounts that we are not
interested in covering in this study, which accounted for less than
0.2%. We eventually used 2,994,602 transitions for the analysis.
One possible concern with the extracted sample is whether the
existing credit limit policy was affecting the data. However, looking at the sample, only 8% of the 3 million transitions had credit
limit adjustment in the previous three months and most of these
were to Inactive accounts. Thus we felt the existing policy would
not have a major impact on the data.
4.2. Special accounts
Each month an account is given a behavioural score or put in a
special state, such as closed, inactive, 3 + cycle or defaulted. A
closed account is one where the credit card was terminated with
zero account balance. A credit card account which has never been
activated, was newly opened in the last two months (and so does
not have enough data to merit a behavioural score) or has not been
used in the last twelve months is called Inactive. A 3 + cycle account is one which the credit card account has arrears of three
months or more, but the lender has not yet written off the account.
The most important special account states in the data are the
defaulted (or written off) accounts. There are four possible reasons
to write-off an account, including bankruptcy, charge-off, revoked
and 3 + cycle delinquent. Bankruptcy is when a borrower is declared bankrupt by a court; charge-off is when the lender does
not believe the debt can now be recovered from the borrower;
and revoked is when the credit card is stopped because of illegal
behaviour by the borrower, such as being over the credit limit persistently. Lenders pass a written off account to the debt collection
department to follow-up, and in time the account may repay all,
part or none of the outstanding debt. Even when the account
makes full repayment, the duration of the collection process could
be of several years. It is important to estimate the average future
repayment amount of different default accounts because it changes
the values of the profit function in (1).
There is little research (Matuszyk et al., 2010) on estimating the
loss given default of revolving credit products. So we use a simple
approach to compare the debt repayment ratios. Defined
tþ24
R Btþ1BB
where Bt+1 is the current balance of an account at detþ1
fault in month t. The repayment ratio R thus is defined as the proportion of the debt repaid to the lender after two years. For
confidentiality reasons, we could not show the exact repayment ratio, but the results showed one of the forms of default had a high
repayment ratio which was significantly different from the others.
We call this default account state, Bad2 and group the rest of the
three forms of written off together into one default state, called
Bad1. So in our cost function the loss generated by accounts in
Bad1 is higher than those of Bad2.
4.3. Coarse-classifying
Since a behavioural score typically has several hundred values,
it is sensible to split the score into a number of bands to reduce the
size of the state space. We aim to find suitable splits so that the
Markovian assumption holds as nearly as possible. To check
whether the Markov chain satisfies this assumption, we are interested in whether for each state the hypothesis that the probability
of moving from st = (lt, it) to it+1 is independent of the state at t 1,
i.e. st1 = (lt1, it1).
Define nt(lt1, it1; lt, it; it+1) to be the number of times that a
credit account was in state (lt1, it1) at time t 1 followed by moving to (lt, it) at time t and it+1 at time t + 1. Similarly define
nt(lt1, it1 ; lt, it) to be the number of times that a customer’s behavioural score was in state (lt1, it1) at time t 1 and then moves to
behaviour score state (lt, it) at time t. We assume the chain is stationary, thus the estimator for p(it+1jlt1, it1, lt, it) is:
PT2
^ðitþ1 jlt1 ; it1 ; lt ; it Þ ¼
p
t¼0 nt ðlt1 ; it1 ; lt ; it ; itþ1 Þ
:
PT2
t¼0 nt ðlt1 ; it1 ; lt ; it Þ
The Markovity of the chain corresponds to the hypothesis that
pðitþ1 j1; 1; lt ; it Þ ¼ pðitþ1 j2; 1; lt ; it Þ ¼ ¼ pðitþ1 jL; 1; lt ; it Þ
¼ pðitþ1 j1; 2; lt ; it Þ ¼ ¼ pðitþ1 jL; I; lt ; it Þ
for lt, it, it+1. To check on the Markovity of state (lt, it), we use the chisquare test (Anderson and Goodman, 1957). Let
X
v2ðlt ;it Þ ¼
X
ðlt1 ;it1 Þ2S itþ1 2I
^ðitþ1 jlt1 ; it1 ; lt ; it Þ p
^ðitþ1 jlt ; it Þ2
n ðlt1 ; it1 ; lt ; it Þ½p
;
^ðitþ1 jlt ; it Þ
p
ð2Þ
where
PT1
^ðitþ1 jlt ; it Þ ¼
p
t¼1 nt ðlt ; it ; itþ1 Þ
PT1
t¼0 nt ðlt ; it Þ
and
n ðlt1 ; it1 ; lt ; it Þ ¼
T 1
X
nt ðlt1 ; it1 ; lt ; it Þ:
t¼1
Anderson and Goodman (1957) showed that (2) has a chi-square
distribution with (I 1)(L 1)2 degree of freedom.
Our methodology for choosing these splits is similar to that
used in coarse classifying characteristics in credit scorecards
(Chapter 1, Thomas, 2009); that is splitting a continuous variable
into bands, each of which has different bad rates. Here we are splitting the behavioural score into bands, which have different subsequent dynamics (Thomas et al., 2001). A traditional approach is to
start with a fine classification i.e. with more bands then one really
wants and then check if one can combine adjacent bands. Alternatively, one can use the classification tree approach of finding the
best split into two classes and then splitting one of these into
two more until it is not worth splitting further. That is the approach we used in the algorithm. We repeated the process in both
behavioural score and credit limit to come up with 15 score bands,
of which 10 were behavioural score band and 5 were special states;
we also ended up with 11 credit limit bands as listed in Table 1. For
confidentiality reasons, we do not disclose the precise behavioural
score bands. We will use Score1 to Score10 to represent the credit
score where Score1 are those with lowest behavioural score and
Score10 are those with highest. Similarly, we use Limit1 to Limit10
to represent the credit limits with Limit1 being the lowest credit
limit band. Table 1 gives the distribution of the borrowers into
the different behavioural score bands and the various credit limit
ranges in a typical month.
4.4. Choice of order
A MDP is mth-order if the transition probabilities depend on
which state the system is currently in and what state it was in
127
M.M.C. So, L.C. Thomas / European Journal of Operational Research 212 (2011) 123–130
Table 1
Distribution of behavioural score and credit limit bands.
Index
Definition
Per. (%)
Behavioural score band
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Closed
Bad1
Bad2
3 + cycle delinquent
Inactive
Score1
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
0.54
0.09
0.05
0.02
6.16
0.11
1.68
11.29
19.30
10.65
5.32
6.07
11.20
10.14
17.39
Credit limit band
0
1
2
3
4
5
6
7
8
9
10
Closed
Limit1
Limit2
Limit3
Limit4
Limit5
Limit6
Limit7
Limit8
Limit9
Limit10
–
7.21
10.58
11.67
10.55
9.35
10.46
10.02
10.21
9.36
10.60
for the previous m 1 periods. For a first order Markov chain the
transition probability depends only on the current state where
for a mth-order Markov chain the transition at time t depends on
the states (it, it1, . . . , it+1m) that it occupied for the last m time
periods. So, the number of states increases exponentially in m as
there are jSjm states in a mth-order MDP. To test whether a chain
satisfies the mth-order Markovity assumption, one can again apply
the chi square test (Anderson and Goodman, 1957). Test results
indicated the Markov chain was not first order. In reality, almost
all applications fail to satisfy the first-order Markovity assumption,
because with so much data, one can usually improve the fit beyond
what are narrow significance limits. What is more important is
whether there is a significant improvement in the fit, when one
uses second or third order Markov chains. So, we tested whether
the process is second-order Markov i.e. we redefined the state so
that it carried the history of t 2 and t 1. Although there was
an improvement on the chi-square value, the hypothesis that the
chain was a second-order MDP was also not justified. Using an
even higher order Markov chain increases the size of the state
space exponentially and so will affect the robustness of the model.
The question is whether the improvement in fit justifies the increase in size of the model and the consequent increase in computational time to solve the problem. In higher order Markov chains,
the estimates of the transition probabilities are based on less data
in each cell and so are less robust. What is important is that the
model is robust over future time periods not just that it is a good
fit for past data. For these reasons and because the improvement
in fit when going from first to second order was not much greater
(in terms of log likelihood) then going from second to third order it
was decided to use the simple first order model.
4.5. Transition matrices
Tables 2a and 2b show the transition matrices in the dataset.
Each entry represents the transition percentage from the state of
the account at time t to the state of the account at time t + 1. ‘‘–’’
represents there was no observation in the sampling period; ‘‘0’’
that the transition probability was less than 0.0005. The number
of accounts in different states at time t are given in the last column
of the table. As the transition matrix is enormous, we only presented those of Limit1 and Limit10 for illustration. Also, we show
the results by splitting the states into two groups- ordinary states
are those with Score2 or above; Closed, Bad1, Bad2, 3 + cycle, Inactive and Score1 are classified as special states.
Table 2a shows the transition of moving to ordinary states. The
matrix is dominated by the diagonal entries as one expects, and the
volatility of score transitions decreases as the credit limit increase.
88.6% of accounts with Score10 and Limit10, remain in the same
state after one month whereas only 75.2% of accounts in Score10
but with Limit1 stay in the Score 10 band. This is repeated in most
high behavioural score groups whereas for low score groups, the
account movements are closer for different credit limits.
Table 2a
Transition matrix – moving to behavioural score states.
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
Total
Limit 1
3 + cycle
Inactive
Score1
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
18.3
0.1
48.4
66.1
7.6
0.3
0.1
0.1
0
0
–
0
15.4
0.5
3.1
25.3
76.8
7.7
2.4
1.8
1.3
0.6
0.4
0.3
–
3.5
–
0.3
13.2
77.1
13.4
7.3
4.7
3.1
3.9
1.6
–
2.4
–
–
0.4
8.9
60.2
15.5
8.3
2.9
1.8
0.7
–
1
–
–
0.2
1.4
6.5
48.7
6
6.9
1.3
0.9
–
0.6
–
–
0
1.5
5.7
6.8
57
9.4
6.2
1.5
–
0.9
–
–
0
1.3
5.7
12.7
10
57.3
14.4
9.6
–
0.3
–
–
0
0.1
2.3
2.2
6.9
9.3
58
9.9
–
3
–
–
–
0.1
2.2
3.3
3.3
9
13
75.2
104
44,616
577
8301
27,953
42,024
19,650
9159
11,846
18,163
12,255
21,328
Limit 10
3 + cycle
Inactive
Score1
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
13.3
–
24.2
58.8
2.4
0.1
0
0
0
0
0
–
13.3
0.2
3.1
27.7
77.4
8.1
1.5
0.7
0.7
0.5
0.3
0.1
–
2.1
1.2
1.6
18.2
78.5
11.7
4.7
3.4
2.4
1.1
0.9
–
2.5
–
0.2
0.5
8.8
59.5
10.4
7
3.7
1.5
0.3
–
1.1
–
–
0.1
1.4
7.7
58.9
6.9
4.4
0.9
0.4
–
0.7
–
–
0
0.9
5.6
8.2
55.2
6.3
4.5
0.5
–
1.5
–
–
0
1.3
7.4
9.1
10.4
62.2
7.3
4.4
–
0.8
–
–
0
0.2
4.4
3.3
10.2
9.7
73
4.7
–
2.2
–
–
0
0.1
1.6
3.3
4.9
10.1
10.9
88.6
15
25,885
161
1799
28,656
59,188
26,563
14,879
17,050
32,561
37,615
72,939
128
M.M.C. So, L.C. Thomas / European Journal of Operational Research 212 (2011) 123–130
Table 2b
Transition matrix – moving to special states.
Closed
Bad1
Bad2
3 + cycle
Inactive
Score1
Total
Limit 1
3 + cycle
Inactive
Score1
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
17.31
0.79
1.73
1.07
0.97
1.08
1.1
1.19
1.18
0.91
0.72
0.33
33.65
0
22.53
1.07
0.97
1.08
1.1
1.19
1.18
0.91
0.72
0.33
11.54
0
7.11
1.06
0.18
0.04
0.04
0.05
0.03
0.01
0.01
–
2.88
–
5.72
0.48
0.03
–
–
–
–
–
–
–
–
86.96
–
–
0.12
0.39
0.34
0.45
1.17
0.5
0.23
0.06
0.96
–
11.44
4.36
0.34
–
–
–
–
–
–
–
104
44,616
577
8301
27,953
42,024
19,650
9159
11,846
18,163
1225
21,328
Limit 10
3 + cycle
Inactive
Score1
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
46.67
0.89
3.73
1.56
0.42
0.36
0.38
0.58
0.57
0.29
0.3
0.15
20
–
52.8
2.56
0.19
0.03
0
–
–
–
–
–
6.67
–
4.35
1.72
0.39
0.11
0.03
0.01
0.01
0.01
–
0
–
–
62
1.22
0
–
–
–
–
–
–
–
–
88.02
–
–
0.1
0.14
0.28
0.74
0.66
0.34
0.22
0.03
–
–
9.94
4.72
0.23
0
–
–
–
0
–
–
15
25,885
161
1799
28,656
59,188
26,563
14,879
17,050
32,561
37,615
72,939
Table 3
Transition matrix of the adjusted PDs on LDPs – moving to special states.
Closed
Bad1
Bad2
3 + cycle
Inactive
Score1
Total
Limit 1
Score4
Score5
Score6
Score7
Score8
Score9
Score10
1.08
1.1
1.19
1.18
0.91
0.72
0.33
0.03
0.02
0.02
0.02
0.02
0.02
0.01
0.06
0.06
0.06
0.05
0.05
0.04
0.04
–
–
–
–
–
–
–
0.39
0.34
0.45
1.17
0.5
0.23
0.06
–
–
–
–
–
–
–
42,024
19,650
9159
11,846
18,163
1225
21,328
Limit 10
Score4
Score5
Score6
Score7
Score8
Score9
Score10
0.36
0.38
0.58
0.57
0.29
0.3
0.15
0.02
0.01
0.01
0.01
0.01
0.01
0.01
0.04
0.03
0.03
0.02
0.02
0.02
0.01
–
–
–
–
–
–
–
0.14
0.28
0.74
0.66
0.34
0.22
0.03
0
–
–
–
0
–
–
59,188
26,563
14,879
17,050
32,561
37,615
72,939
Table 2b shows the transitions of moving to special accounts.
For Limit 10, there is no example of an account with Score 6 or
above moving to the Bad 1 state. However 52.8% of those in Score1,
with credit Limit10, move to Bad1 status the next month, which is
a much higher percentage then those with credit limits in the Limit1 band. This may be due to the higher balance that such borrowers are carrying. On the other hand, as Table 2a shows 48.4% of
these in Score1 and Limit1 improve their scores to Score2 in the
next month.
the number of movements to defaults in Score4 or higher score
bands is below 20. We thus defined the accounts in Score4 to
Score10 as making up the low default portfolio and the corresponding adjusted transition probabilities are presented in Table
3.1 In this case, there are only small changes in the transition probabilities to this adjustment. However, for other data sets, the differences could be more significant.
4.6. Transition of LDP
Table 4 shows the profit function used in the MDP model which
was estimated by using the monthly profit field. For each state (l, i),
the average monthly profit of the borrowers was taken over the
cases when the borrower is in behavioural state i with credit limit
l being applied. In general the profit is decreasing with behavioural
score, because a borrower with high behavioural score is more
likely to make full repayments. Thus the lenders can only gain
the merchandise fee from them (the fee on each credit card pur-
To prevent introducing invalid structural zeros into the transition matrix, we adjust the transition matrix of this low default
portfolio by the method presented in Section 3. It is important to
determine when such an adjustment needs to be made. Benjamin
et al. (2006) tested several approaches to determining when this
adjustment is useful. They concluded that there is no ideal method
but suggested the simple approach of applying the adjustment
when a band has below 20 cases of moving directly to default in
the whole sample. We found there is a threshold between Score3
and Score4 in the dataset where, in most of the credit limit bands,
4.7. Profit function
1
Since there are minor changes on the transition probability of moving to ordinal
accounts, we do not present the results here. However, the complete data is available
upon request.
129
M.M.C. So, L.C. Thomas / European Journal of Operational Research 212 (2011) 123–130
Table 4
Profit per state (in HK$).
Value
Limit1
Limit2
Limit3
Limit4
Limit5
Limit6
Limit7
Limit8
Limit9
Limit10
Closed
Bad1
Bad2
3 + cycle
Inactive
Score1
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
8
4574
193
621
8
706
202
151
33
8
8
0
6
7
9
25
8022
62
647
6
1131
255
189
42
6
2
2
6
8
7
49
11085
286
1068
6
1571
367
282
80
18
8
3
3
5
6
95
16358
320
1040
6
2067
489
395
132
29
14
5
0
3
4
12
17845
164
891
2
2047
555
220
77
27
16
14
9
4
4
41
25578
355
825
1
2576
697
462
170
42
23
16
10
6
5
108
34028
924
1071
1
3316
894
599
235
60
32
24
17
11
8
33
41246
440
1890
4
3845
1210
704
296
78
39
36
24
21
17
69
47751
1070
642
6
5461
1351
915
429
123
66
61
44
35
26
302
88029
1387
248
6
10209
2052
1674
1126
311
194
142
102
85
70
chase which is paid by the retailer to the credit card company). On
the other hand, the low behavioural score accounts are more likely
to accumulate debt in their accounts. Thus these borrowers generated both interest and interchange fees for the lenders. A second
observation from Table 4 is that usually profit increases with credit
limit for scoreband 2 upwards. This is expected as the amount purchased goes up as the credit limit goes up and hence so does the
merchant fee.
4.8. Optimal policy
We implemented the value iteration algorithm (Puterman,
1994) to generate the optimal policies, and we used k = 0.995 (a
rough estimate of 6% yearly inflation rate- which was the average
rate in Hong Kong over this period) as our monthly discount value.
Table 5 shows the optimal values when using the simple transition matrix with the corresponding credit limits in brackets. Similarly Table 6 shows the results of using the transition matrices
whose values are adjusted for it being a low default portfolio. So
for example, in Table 5, the cell in row 4 and column 5 corresponds
to accounts with Score2 and Limit4. The optimal policy is to increase their credit limit to Limit5 and using the optimal policy subsequently leads to a total discounted reward of HK$32,781 per
borrower.
For 3 + cycle accounts, the optimal policy is to keep the credit
limit unchanged. This follows because many of the accounts move
to default at t + 1. Since the expected loss generated from these accounts increases with credit limit, in order to minimise the potential loss, the model chooses to keep their existing credit limit
unchanged. The model suggests increasing the credit limit of some
Table 5
Optimal policy (using the transition matrix with no adjustment on PDs).
Policy
Limit1
3 + cycle
Inactive
Score1
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
10,874
35,889
18,726
34,117
41,219
41,999
38,371
37,191
36,618
36,242
36,057
35,751
(1)
(2)
(1)
(2)
(10)
(10)
(10)
(10)
(2)
(2)
(2)
(2)
Limit2
Limit3
Limit4
Limit5
Limit6
Limit7
Limit8
Limit9
Limit10
9522 (2)
36,042 (2)
17,209 (2)
34,303 (2)
41,834 (10)
42,216 (10)
38,540 (10)
37,386 (10)
36,925 (2)
36,493 (2)
36,131 (2)
35,789 (2)
5745 (3)
35,810 (3)
13,135 (3)
34,296 (3)
42,101 (10)
42,314 (10)
38,626 (10)
37,340 (10)
36,878 (3)
36,353 (10)
36,027 (3)
35,614 (3)
8932 (4)
35,723 (4)
6775 (4)
32,781 (5)
42,348 (10)
42,426 (10)
38,614 (10)
37,340 (10)
36,810 (10)
36,363 (10)
35,966 (4)
35,498 (4)
5497 (5)
35,280 (6)
11,809 (5)
32,606 (5)
42,334 (10)
41,914 (10)
38,355 (10)
37,232 (10)
36,738 (10)
36,403 (10)
35,804 (10)
35,447 (10)
10,924 (6)
35,477 (6)
2383 (6)
32,117 (7)
42,469 (10)
42,275 (10)
38,474 (10)
37,219 (10)
36,773 (10)
36,374 (10)
35,858 (10)
35,452 (10)
12,820 (7)
35,015 (7)
578 (7)
32,606 (7)
42,651 (10)
42,369 (10)
38,546 (10)
37,273 (10)
36,732 (10)
36,370 (10)
35,831 (10)
35,428 (10)
14,385 (8)
34,887 (8)
8533 (8)
32,742 (8)
42,889 (10)
42,309 (10)
38,482 (10)
37,169 (10)
36,668 (10)
36,381 (10)
35,754 (10)
35,410 (10)
3856 (9)
34,000 (10)
16155 (9)
30,080 (9)
43,040 (10)
42,578 (10)
38,530 (10)
37,225 (10)
36,731 (10)
36,373 (10)
35,799 (10)
35,403 (10)
8693 (10)
33,977 (10)
54220 (10)
24,065 (10)
43,956 (10)
43,461 (10)
38,744 (10)
37,343 (10)
36,744 (10)
36,418 (10)
35,827 (10)
35,449 (10)
Table 6
Optimal policy (using the transition matrix with adjusted PDs on LDPs).
Policy
Limit1
3 + cycle
Inactive
Score1
Score2
Score3
Score4
Score5
Score6
Score7
Score8
Score9
Score10
10,047
33,154
17,411
31,961
38,614
39,038
35,227
34,151
33,719
33,357
33,197
32,909
(1)
(2)
(1)
(2)
(10)
(10)
(10)
(2)
(2)
(2)
(2)
(2)
Limit2
Limit3
Limit4
Limit5
Limit6
Limit7
Limit8
Limit9
Limit10
8731 (2)
33,296 (2)
15,891 (2)
32,139 (2)
39,202 (10)
39,249 (10)
35,394 (10)
34,336 (2)
34,004 (2)
33,598 (2)
33,270 (2)
32,954 (2)
5062 (3)
33,017 (3)
11,957 (3)
32,134 (3)
39,462 (10)
39,347 (10)
35,473 (10)
34,159 (10)
33,877 (3)
33,292 (3)
33,059 (3)
32,666 (3)
8138 (4)
32,896 (4)
5753 (4)
30,681 (5)
39,704 (10)
39,459 (10)
35,457 (10)
34,149 (10)
33,728 (4)
33,205 (4)
32,928 (4)
32,488 (4)
4698 (5)
32,379 (6)
10,550 (5)
30,519 (5)
39,666 (10)
38,932 (10)
35,201 (10)
34,045 (10)
33,562 (10)
33,207 (10)
32,671 (6)
32,276 (6)
9976 (6)
32,567 (6)
1325 (6)
30,032 (7)
39,820 (10)
39,304 (10)
35,316 (10)
34,047 (10)
33,587 (10)
33,175 (10)
32,724 (6)
32,276 (6)
11,497 (7)
32,110 (7)
1615 (7)
30,512 (7)
40,000 (10)
39,396 (10)
35,373 (10)
34,079 (10)
33,543 (10)
33,166 (10)
32,623 (10)
32,229 (10)
12,893 (8)
32,008 (8)
9507 (8)
30,607 (8)
40,230 (10)
39,326 (10)
35,301 (10)
33,974 (10)
33,470 (10)
33,178 (10)
32,551 (10)
32,209 (10)
2937 (9)
31,192 (10)
17071 (9)
28,026 (9)
40,384 (10)
39,595 (10)
35,354 (10)
34,022 (10)
33,538 (10)
33,175 (10)
32,587 (10)
32,202 (10)
9310 (10)
31,170 (10)
54888 (10)
22,077 (10)
41,290 (10)
40,480 (10)
35,543 (10)
34,116 (10)
33,529 (10)
33,195 (10)
32,604 (10)
32,239 (10)
130
M.M.C. So, L.C. Thomas / European Journal of Operational Research 212 (2011) 123–130
Table 7
Comparing the real policy and model policy.
% Increase compare with
the ‘‘Real profit, real
transition and real policy’’
2. Model profit, real transition and real policy
3. Model profit, model transition and real policy
4. Model profit, model transition and model
policy
15.5
27.3
34.9
Inactive accounts to give encouragement to these borrowers to
start using the card. Since there is no history of repayment the
optimal policy does not move these inactive accounts to the highest credit limit band. Notice some accounts in Score1 generate
losses under certain credit limits since they are likely to default
in the next month, and so the best policy is to keep their credit limit unchanged.
The optimal policy suggests increasing the credit limit of accounts in Score3 to Score5 to the highest credit limit band, since
these accounts have little risk and are very profitable.
One surprise is that the optimal policies do not increase the
credit limit of these with low limits and high behavioural scores,
i.e. those in Table 6 with behavioural Score7 or above, and credit
limit 2, 3 or 4. Instead, the optimal policy is to keep their credit
limit unchanged. This is because they are less profitable then those
with lower behavioural score. Also, high credit limits have a relatively more severe impact on their chance of moving to risky or
even default states, than an account with lower behavioural score.
4.9. Comparison of Models with actual results
Ten out-of-period sample sets (June 2006 to May 2007), which
each consist of 1000 credit card accounts, are used to compare the
performance of the model and the actual policy. We compare the
following scenarios:
1.
2.
3.
4.
Real profit, real transition matrix and real policy.
Model profit, real transition matrix and real policy.
Model profit, model transition matrix and real policy.
Model profit, model transition matrix and model policy.
The profit of each data set with respect to each scenario was
computed. We then took the average profit of these data sets.
The results are shown in Table 7.2 We see that the most significant
change is when real rewards per periods are replaced by the model’s
reward per period. This occurs because the variation in real rewards
between customers in the same behavioural score state with the
same credit limit is quite significant. The change between the model
policy and the real policy when applied to the model suggests that
the improvement in profitability using the model policy is around
7.5%.
5. Conclusion and future research
This paper develops a MDP model to generate a dynamic credit
limit policy. It explains how one can model the profitability of a
2
Due to confidential reason, we do not present the actual values. Also, due to the
length of the paper, we do not present the results by account types. The results are
available upon request.
borrower including their default risk by employing a risk parameter that is used by almost every lender, namely the behavioural
score. The paper has also incorporated a mechanism to deal with
the risk that the MLE parameter estimation could give an incorrect
chain structure if in the sample there are no transitions to default
from some of the states. The model helps determine what should
be the properties of the optimal credit limit policy, and the effectiveness of using this type of approach has been shown by applying
it to a real credit card dataset.
We have used behavioural score bands as our state space, but it
may be possible to augment the state definition by using other
characteristics of the borrower such as whether they are revolvers
or transactors. This may be particularly relevant to improving the
estimates of the rewards per month in each state, where there is
the greatest difference between the model calculations and the real
situation. We also could extent the model by including economic
variables as drivers of the transition probabilities. However this
simple model does show how one can use Markov chain modelling
to derive improved credit limit policies.
References
Anderson, T.W., Goodman, L.A., 1957. Statistical inference about Markov chains.
Annual Mathematics Statistics 28, 89–109.
Benjamin, N., Cathcart, A., Ryan, K., 2006. Low default portfolios: A proposal for
conservative estimation of default probabilities. Working paper, Financial
Services Authority.
Bierman, H., Hausman, W.H., 1970. The credit granting decision. Management
Science 16, 519–532.
Ching, W.K., Ng, M.K., Wong, K.K., Altman, E., 2004. Customer lifetime value:
Stochastic optimization approach. Journal of Operational Research Society 55,
860–868.
Dirickx, Y.M.I., Wakeman, L., 1976. An extension of the Bierman–Hausman model
for credit granting. Management Science 22, 1229–1237.
Frydman, H., Kallberg, J.G., Kao, D.L., 1985. Testing the adequacy of Markov chains
and mover-stayer models as representations of credit behaviour. Operations
Research 33, 1203–1214.
Heyman, D.P., Sobel, M.I., 1982. Stochastic Models in Operations Research.
Stochastic Processes and Operating Characteristics, vol. 1. McGraw-Hill, New
York.
Kijma, M., 1997. Markov Processes for Stochastic Modelling. Chapman & Hall,
London.
Matuszyk, A., Mues, C., Thomas, L.C., 2010. Modelling LGD for unsecured personal
loans: Decision tree approach. Journal of the Operational Research Society 61,
393–398.
Pluto, K., Tasche, D., 2006. The Basel II risk parameters. Estimating probabilities of
default for low default portfolios. Springer. pp. 79–103.
Puterman, M., 1994. Markov Decision Processes: Discrete Stochastic Dynamic
Programming. John Wiley, New York.
Ross, S.M., 1983. Introduction to Stochastic Dynamic Programming. Academic Press,
New York.
Soman, D., Cheema, A., 2002. The effect of credit on spending decisions: The role of
the credit limit and credibility. Marketing Science 21, 32–53.
Thomas, L.C., 2009. Consumer Credit Models: Pricing, Profit and Portfolios. Oxford
University Press, Oxford (Chapter 1).
Thomas, L.C., Ho, J., Scherer, W.T., 2001. Time will tell: Behavioural scoring and the
dynamics of consumer credit assessment. IMA Journal of Mathematics 12, 89–
103.
Trench, M.S., Pederson, S.P., Lau, E.T., Ma, L., Wang, H., Nair, S.K., 2003. Managing
credit lines and prices for bank one credit cards. Interfaces 33, 4–21.
White, D.J., 1985. Real applications of Markov decision processes. Interfaces 15, 73–
78.
White, D.J., 1988. Further real applications of Markov decision processes. Interfaces
18, 55–61.
White, D.J., 1993. A survey of applications of Markov decision process. Journal of
Operational Research Society 44, 1073–1096.