ICS 278: Data Mining
Lecture 18: Credit Scoring
Padhraic Smyth
Department of Information and Computer Science
University of California, Irvine

Data Mining Lectures Lecture 18: Credit Scoring Padhraic Smyth, UC Irvine

Presentations for Next Week
• Names for each day will be emailed out by tomorrow
• Instructions:
  – Email me your presentations by 12 noon the day of your presentation (no later please)
  – I will load them on my laptop (so no need to bring a machine)
  – Each presentation will be 6 minutes long + 2 minutes questions
    • So probably about 4 to 8 (max) slides per presentation

References on Credit Scoring
• D. J. Hand and W. E. Henley, Statistical Classification Methods in Consumer Credit Scoring: a Review, Journal of the Royal Statistical Society: Series A, Volume 160, Issue 3, November 1997
  – Available online at class Web page under lecture notes
• Also:
  – L. C. Thomas, D. B. Edelman, J. N. Crook, Credit Scoring and its Applications, SIAM, 2002
  – E. Mays (editor), Credit Risk Modeling, American Management Association, 1998
Outline
• Credit Scoring
  – Problem definition, standard notation
• Data Sources
• Models
  – Logistic regression, trees, linear regression, etc
• Model building issues
  – Problem of reject inference
• Practical issues
  – Cutoff selection, updating models

The Problem of Credit Scoring
• Applicants apply for a bank loan
  – Population 1 is rejected
  – Population 2 is accepted
    • Population 2a repays their loan -> labeled “good”
    • Population 2b goes into some form of default -> labeled “bad”
• Model building
  – Build a model that can discriminate population 2a from population 2b
  – Usually treated as a classification problem
  – Typically want to estimate p(good | features) and rank individuals this way
• Widely used by banks and credit card companies
  – Similar problems occur in direct marketing and other “scoring” applications

Many Different Applications for Customer Scoring
• Other financial applications:
  – Delinquent loans: who is most likely to pay up
    • Uses historical data on who paid in the past
    • Often used to create “portfolios” of delinquent debt
  – Customer revenue
    • How much will each customer generate in revenue over the next K years
• Predicting marketing response
  – Cost of a mailer to a customer is on the order of $1
  – Targeted marketing
    • Rank customers in terms of “likelihood to respond”
• “Churn” prediction
  – Predicting which customers are most likely to switch to another brand
  – E.g., wireless phone service
  – Scores used to rank customers and then target the most likely with incentives
• Many more…

Some Background
• History
  – General ideas started in the 1950’s
    • e.g., Bill Fair and Earl Isaac -> FairIsaac -> FICO scores
  – Initially a bit controversial
    • Worries about it being unfair to some segments of society
      – US Equal Credit Opportunity Acts, 1975/76
    • Skepticism that “machine generated rules” from data could outperform human generated guidelines
  – First adopted in credit-card approvals (1960’s)
  – Later broadly adopted in home-loans, etc
  – Now widely accepted and used by almost all banks, credit-granting agencies, etc

Data Sources
• Data from the loan application
  – Age, address, income, profession, SS#, number of credit cards, savings, etc
  – Easy to obtain
• Internal performance data
  – How the individual has performed on other loans with the same bank
  – May only be available for a subset of customers
• External performance data:
  – Credit reports
    • How the individual has performed historically on all loans and credit cards
    • Relatively expensive to obtain (e.g., $1 per individual)
  – Court judgements
  – Real estate records
• Macro-level external data
  – Demographic characteristics for applicant’s zip code or census tract

Loan Application Data
• Issues
  – Data entry errors (e.g., birthday = date of loan application)
  – Deliberate falsifications (e.g., over-reporting of income)
  – Legal issues
    • US Equal Credit Opportunity Acts, 1975/76
    • Illegal to use race, color, religion, national origin, sex, marital status, or age in the decision to grant credit
    • But what if other variables are highly predictive of some of these variables?
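
The data entry errors listed under Loan Application Data (e.g., birthday = date of loan application) can be screened for before model building. The following is only a minimal Python sketch: the function name `flag_suspect_application` and the age limits of 18 and 100 are hypothetical choices for illustration, not rules taken from the lecture.

```python
from datetime import date


def flag_suspect_application(dob, applied_on, min_age=18, max_age=100):
    """Return a list of reasons why an application record looks suspect.

    min_age and max_age are illustrative plausibility limits (assumptions).
    """
    reasons = []
    # The example from the slide: birthday entered as the application date.
    if dob == applied_on:
        reasons.append("date of birth equals application date")
    # Age in whole years on the application date.
    had_birthday = (applied_on.month, applied_on.day) >= (dob.month, dob.day)
    age = applied_on.year - dob.year - (0 if had_birthday else 1)
    if age < min_age or age > max_age:
        reasons.append(f"implausible age: {age}")
    return reasons


if __name__ == "__main__":
    print(flag_suspect_application(date(2003, 5, 1), date(2003, 5, 1)))
    print(flag_suspect_application(date(1970, 1, 15), date(2003, 5, 1)))
```

Here age is used only as a data-quality check on the record, not as an input to the credit decision.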
Variable Name   Description                      Codings
dob             Year of birth                    If unknown the year will be 99
nkid            Number of children               number
dep             Number of other dependents       number
phon            Is there a home phone            1 = yes, 0 = no
sinc            Spouse's income
aes             Applicant's employment status    V = Government, W = housewife, M = military, P = private sector, B = public sector, R = retired, E = self employed, T = student, U = unemployed, N = others, Z = no response
dainc           Applicant's income
res             Residential status               O = Owner, F = tenant furnished, U = tenant unfurnished, P = with parents, N = other, Z = no response
dhval           Value of home                    0 = no response or not owner, 000001 = zero value, blank = no response
dmort           Mortgage balance outstanding     0 = no response or not owner, 000001 = zero balance, blank = no response
doutm           Outgoings on mortgage or rent
doutl           Outgoings on loans
douthp          Outgoings on hire purchase
doutcc          Outgoings on credit cards
Bad             Good/bad indicator               1 = Bad, 0 = Good

Credit Report Data
• Available from 3 major bureaus in the US:
  – Experian, TransUnion, and Equifax
• Data in the form of a list of transactions/events
  – Typically needs to be converted into feature-value form
    • E.g., “number of credit cards opened in past 12 months”
  – Can result in a huge number of features
• Cost varies as a function of type and time-window of data requested
  – Interesting problem: “cost-optimal” downloading of selected credit report features adapted to each individual as a function of cheaper features

Defining Good and Bad
• Good versus Bad
  – Not necessarily clear how to define 2 classes
  – E.g.,
    • Bad = ever 3 or more payments in arrears?
    • Bad = 2 or more payments in arrears more than once?
  – A “spectrum” of behavior
    • Never any problems in payments
    • Occasional problems
    • Persistent problems
  – Typical to discard the intermediate cases and also those with insufficient experience to reliably classify them
    • Not ideal theoretically, but convenient

Selecting a Data Set for Model Building
• Sample selection
  – Typical sample sizes ~ 10k to 100k per class
  – Should be representative of customers who will apply in the future
  – Need to be able to get the relevant variables for this set of customers
    • Internal performance data
    • External performance data
    • Etc
• External data sources (e.g., credit reports) can result in a very large number of possible variables
  – E.g., in the 1000’s
  – E.g., “number of accounts opened in past 12/24/36/… months”
  – Typically some form of variable selection is done before building a model
    • Often based on univariate criteria such as information gain

Models used in Credit Scoring
• Regression:
  – Ignore the fact that we are estimating a probability
  – Typically linear regression is used
• Classification (more common approach)
  – Logistic regression (most widely used)
  – Decision trees (becoming more popular)
  – Neural networks (experimented with, but not used in practice so much)
  – Nearest neighbors
  – Model combining: some work in this area
  – SVMs: too new, relatively unproven
• General comments
  – Many trade-secrets, companies like FairIsaac do not publish details
  – Generally the industry is conservative: prefer well-established methods
  – Classification accuracy is only one part of the overall solution…
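
The univariate variable selection mentioned under Selecting a Data Set for Model Building (ranking candidate variables by information gain) can be sketched as follows. This is a minimal Python illustration, assuming categorical variables and non-empty data; the function names and the toy records are hypothetical, with `res` and `phon` borrowed from the variable table purely as labels.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus their entropy conditioned on a variable."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = ["good", "good", "bad", "bad"]
    candidates = {
        "res": ["O", "O", "F", "F"],   # separates good from bad perfectly
        "phon": [1, 0, 1, 0],          # carries no information about the label
    }
    ranked = sorted(candidates,
                    key=lambda name: information_gain(candidates[name], labels),
                    reverse=True)
    print(ranked)
```

Continuous variables such as income would need to be binned before this criterion applies, and a univariate ranking ignores redundancy between variables.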
Logistic Regression Models

$$\operatorname{logit}(p) = \log(\text{odds}) = \log\frac{p}{1-p} = g^{-1}(p) = w_0 + w_1 x_1 + \dots + w_p x_p$$

[Figure: p (0.0 to 1.0) plotted against w0 + w1x1, with training data]

Note that near w0 + w1x1 = 0 (i.e., p near 0.5), p is almost linear in w0 + w1x1, so linear and logistic regression will be similar in this region

Modeling Example (from Hand and Henley paper)

Model                                     Bad Risk Rate (%)
k nearest neighbor with special metric    43.09
k nearest neighbor (standard)             43.25
logistic regression                       43.30
linear regression                         43.36
decision tree                             43.77

Evaluation Methods
• Decile/Centile reporting:
  – Rank customers by predicted scores
  – Report “lift” rate in each decile (and cumulatively) compared to accepting everyone
• Receiver Operating Characteristics
  – Vary classification threshold
  – Plot proportion of good risks accepted vs. bad risks accepted
• Bad risk rate = proportion of bad risks among those accepted
  – Let p = proportion of good risks
  – Let a = proportion accepted
  – e.g., can show that, with a > p, the bad risk rate among those accepted is lower bounded by 1 – p/a (and, with a > 1 – p, upper bounded by (1 – p)/a)
  – e.g., p = 0.45, a = 0.70 => bad risk rate must be between about 0.36 and 0.79

Economics of Credit Scoring
• Classification accuracy is not the appropriate metric
• Benefit = Increase in revenue from using model - cost of developing and installing model
  – Model development: anywhere from $5k to $100k depending on the complexity of the modeling project
  – Model installation: can be expensive (software, testing, legal requirements)
  – Revenue increase based on estimated performance plus assumptions about cost of bad risks versus good risks
  – Model maintenance and updating should also probably be included
• Small improvements in accuracy (e.g., 1 to 5%) could lead to significant gains if the model is used on large numbers of customers

Problem of Reject Inference
• Typically the population available for training consists only of past applicants who were accepted
  – Application data is available for “rejects”, but no performance data
• Question:
  – Is there a way to use the data from rejected applicants?
• Answer: no widely accepted approach. Methods include
  – Define all rejects as “bad” (not reliable!)
  – Build a statistical model (treat labels as missing, but biased)
    • Can be quite complex, see Section 5 in Hand and Henley paper
  – Grant credit to some fraction of rejects and track their performance so that the “full population” is sampled
    • Rarely used for loans, but ideally is the best method

Other Issues
• Threshold selection
  – Above what threshold should loans be granted?
  – Depends on goals of the project
    • E.g., focusing on a small set of high-scoring customers versus “widening the net” to include a larger number (but still minimizing risk)
• Time-dependent classification
  – What really matters is what the customer will do at time t+T
  – Can we model the “state” of a customer over time (rather than statically)?
    • Still somewhat of a research topic
• Overrides
  – Loans are still manually “signed-off”. The bank may sometimes override the system’s recommendation

The model works… now what?
• Implementation
  – Depends on whether the model is replacing an existing automated model
  – … or is the first time modeling is being applied to the problem
  – Many software issues in terms of databases, security, etc
• Monitoring and tracking
  – Important to see how the scorecard works in practice
  – Generating monthly/quarterly reports on scorecard performance
    • (naturally there will be some delay in this)
  – Looking in detail at performance on segments, by attribute, etc
• Time for a new model?
  – E.g., population has changed significantly
  – E.g., new (cheap and useful) data available
  – E.g., new modeling technology available
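
The threshold selection discussed under Other Issues can be made concrete once costs are attached to outcomes. The sketch below is one possible approach, not the lecture's prescribed method: it assumes labelled historical scores and two hypothetical parameters, `gain_per_good` (revenue from a repaid loan) and `loss_per_bad` (loss from a default), and picks the cutoff that maximises total profit on that data.

```python
def best_cutoff(scores, is_good, gain_per_good, loss_per_bad):
    """Pick the score cutoff that maximises total profit on labelled data.

    Applicants with score >= cutoff are accepted.
    Returns (cutoff, profit); cutoff is None if accepting nobody is best.
    """
    ranked = sorted(zip(scores, is_good), key=lambda pair: pair[0], reverse=True)
    best = (None, 0.0)  # baseline: accept nobody, profit 0
    profit = 0.0
    for i, (score, good) in enumerate(ranked):
        profit += gain_per_good if good else -loss_per_bad
        # Only consider a cutoff once all applicants tied at this score are included.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and profit > best[1]:
            best = (score, profit)
    return best


if __name__ == "__main__":
    # Toy scores and outcomes, invented for illustration only.
    scores = [0.9, 0.8, 0.6, 0.4, 0.2]
    is_good = [True, True, False, True, False]
    print(best_cutoff(scores, is_good, gain_per_good=1.0, loss_per_bad=5.0))
    print(best_cutoff(scores, is_good, gain_per_good=1.0, loss_per_bad=0.5))
```

With an expensive default the sketch keeps only the top-scoring applicants; when the loss per bad risk is small relative to the gain per good risk, the chosen cutoff drops, which corresponds to “widening the net”.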