Loan Default Prediction: EDA Case Study

CASE STUDY
PROBLEM STATEMENT
This case study aims to identify patterns that indicate whether a client has difficulty paying their
instalments. Such patterns can inform actions such as denying the loan, reducing the loan amount,
or lending to risky applicants at a higher interest rate. This helps ensure that consumers
capable of repaying the loan are not rejected.
Identifying such applicants using exploratory data analysis (EDA) is the aim of this case study.
AVAILABLE SOURCES
➢ application_data.csv: information about the client at the time of application.
➢ previous_application.csv: data showing whether each previous application was Approved, Cancelled, Refused or an Unused offer.
➢ columns_description.csv: data dictionary describing the meaning of the variables.
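The slides below are split by the TARGET flag of application_data.csv, where TARGET 1 marks clients with payment difficulty. A minimal pandas sketch of that split, assuming the flag column is named TARGET and using a hypothetical client id column SK_ID_CURR and a few made-up rows in place of the real file:

```python
import pandas as pd


def split_by_target(app: pd.DataFrame):
    """Return (target0, target1): rows without and with payment difficulty."""
    target0 = app[app["TARGET"] == 0]
    target1 = app[app["TARGET"] == 1]
    return target0, target1


# Hypothetical rows; with the real data this would be
# app = pd.read_csv("application_data.csv")
app = pd.DataFrame({
    "SK_ID_CURR": [100001, 100002, 100003, 100004],
    "TARGET": [0, 1, 0, 0],
})
target0, target1 = split_by_target(app)
print(len(target0), len(target1))  # 3 1
print(app["TARGET"].mean())        # share of TARGET 1 rows: 0.25
```

Each later slide then works on either target0 or target1, so the same plot can be compared across the two groups.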
TARGET 0
DISTRIBUTION INCOME RANGE / GENDER
INFERENCE
➢ There are no records with income below 25,000.
➢ The largest number of credits falls in the 125,000 - 150,000 income range.
➢ There are fewer records for higher income groups.
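The income ranges on this slide appear to be 25,000 wide. A minimal sketch of how such ranges can be built and counted per gender, assuming the column names AMT_INCOME_TOTAL and CODE_GENDER (not named in this deck), an upper bin edge of 500,000, and hypothetical rows:

```python
import pandas as pd


def income_range_counts(df: pd.DataFrame, step: int = 25_000, top: int = 500_000) -> pd.DataFrame:
    """Count clients per income range and gender.

    Ranges are right-closed bins of width `step` from 0 to `top`;
    incomes above `top` (or exactly 0) fall outside every bin and are dropped.
    """
    edges = list(range(0, top + step, step))
    labels = [f"{lo}-{lo + step}" for lo in edges[:-1]]
    ranges = pd.cut(df["AMT_INCOME_TOTAL"], bins=edges, labels=labels).rename("INCOME_RANGE")
    # observed=True keeps only range/gender pairs that actually occur
    counts = df.groupby([ranges, df["CODE_GENDER"]], observed=True).size()
    return counts.unstack(fill_value=0)


# Hypothetical clients (income, gender)
df = pd.DataFrame({
    "AMT_INCOME_TOTAL": [135000, 135000, 140000, 202500, 90000],
    "CODE_GENDER": ["F", "M", "F", "F", "M"],
})
table = income_range_counts(df)
print(table)
print(table.sum(axis=1).idxmax())  # 125000-150000
```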
TARGET 0
DISTRIBUTION CREDIT RANGE / GENDER
INFERENCE
➢ The credit amount is higher for females in all ranges.
TARGET 0
DISTRIBUTION ORGANIZATION TYPE
INFERENCE
➢ Business Entity Type 3 and Self-employed have higher counts than all other organization types.
➢ Industry type 13, Trade type 5 and Industry type 8 have the lowest counts.
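The most and least frequent organization types can be read directly from a frequency count. A short sketch, assuming a column named ORGANIZATION_TYPE and hypothetical category values:

```python
import pandas as pd


def org_type_extremes(df: pd.DataFrame, n: int = 2):
    """Return the n most and n least frequent organization types."""
    counts = df["ORGANIZATION_TYPE"].value_counts()  # sorted by count, descending
    return counts.head(n), counts.tail(n)


# Hypothetical organization types for a handful of TARGET 0 clients
df = pd.DataFrame({
    "ORGANIZATION_TYPE": ["Business Entity Type 3"] * 4 + ["Self-employed"] * 3
                         + ["School"] * 2 + ["Industry type 13", "Trade type 5"],
})
most, least = org_type_extremes(df)
print(most)
print(least)
```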
TARGET 0
UNIVARIATE ANALYSIS FOR VARIABLES
INCOME AMOUNT
INFERENCE
➢ Total income shows large variation.
➢ Many outliers lie towards the maximum, well above the median value.
➢ More records are concentrated up to the 25th percentile, with outliers on the positive side.
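The "long tail above the median" pattern can also be checked numerically with quartiles, skewness and the ratio of the maximum to the median. A minimal sketch on hypothetical income values (the real column would come from the TARGET 0 subset):

```python
import pandas as pd


def skew_summary(income: pd.Series) -> dict:
    """Quartiles, skewness and max/median ratio of a numeric column."""
    q = income.quantile([0.25, 0.5, 0.75])
    return {
        "q1": q.loc[0.25],
        "median": q.loc[0.5],
        "q3": q.loc[0.75],
        "skew": income.skew(),  # > 0 means a long right tail
        "max_over_median": income.max() / income.median(),
    }


# Hypothetical incomes with one very large value
income = pd.Series([54000, 67500, 90000, 112500, 135000,
                    157500, 180000, 225000, 1350000])
print(skew_summary(income))
```

A positive skew together with a large max/median ratio matches what the box plot shows: most values sit low, and a few extreme incomes stretch the upper whisker.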
TARGET 0
UNIVARIATE ANALYSIS FOR VARIABLES
CREDIT AMOUNT
INFERENCE
➢ Some outliers are observed towards the maximum loan credit amount.
➢ However, most of the distribution lies within the interquartile range (IQR).
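The IQR also gives a common rule of thumb for flagging outliers: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR (Tukey's fences), which is where box-plot whiskers usually end. The deck does not state which rule its plots use; this is a sketch assuming k = 1.5 and hypothetical credit amounts:

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (values < lower) | (values > upper)
    return mask, (lower, upper)


# Hypothetical loan credit amounts
credit = pd.Series([270000, 450000, 500000, 550000,
                    600000, 675000, 810000, 4050000])
mask, (lower, upper) = iqr_outliers(credit)
print(lower, upper)           # 155625.0 1040625.0
print(credit[mask].tolist())  # [4050000]
print(1 - mask.mean())        # share inside the fences: 0.875
```

The same function applies unchanged to the annuity column on the next slide.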
TARGET 0
UNIVARIATE ANALYSIS FOR VARIABLES
ANNUITY
INFERENCE
➢ Several outliers towards the maximum are observed in the loan annuity.
➢ However, they span a small range, not far from the median value.
➢ More values lie up to the 25th percentile, with outliers in the higher range.
TARGET 0
BIVARIATE ANALYSIS
INCOME / ORGANIZATION TYPE
INFERENCE
➢ The outliers in total income are most widely spread for Business Entity Type 3.
➢ This plot appears poorly suited to analysis, given the large number of organization types.
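Because a box plot over every organization type is hard to read, one option is to keep only the most frequent types and compare their income spread in a table. A sketch assuming the columns ORGANIZATION_TYPE and AMT_INCOME_TOTAL, a hypothetical cut-off of top_n = 3, and made-up rows; this is an alternative view, not the method used in the deck:

```python
import pandas as pd


def income_spread_by_org(df: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Median, IQR and max income for the top_n most frequent organization types."""
    top = df["ORGANIZATION_TYPE"].value_counts().head(top_n).index
    subset = df[df["ORGANIZATION_TYPE"].isin(top)]
    income = subset.groupby("ORGANIZATION_TYPE")["AMT_INCOME_TOTAL"]
    spread = pd.DataFrame({
        "median": income.median(),
        "iqr": income.quantile(0.75) - income.quantile(0.25),
        "max": income.max(),
    })
    return spread.sort_values("max", ascending=False)


# Hypothetical clients
df = pd.DataFrame({
    "ORGANIZATION_TYPE": ["Business Entity Type 3"] * 4 + ["Self-employed"] * 3
                         + ["School"] * 2 + ["Trade type 5"],
    "AMT_INCOME_TOTAL": [90000, 135000, 180000, 2250000,
                         112500, 135000, 157500,
                         99000, 117000,
                         81000],
})
print(income_spread_by_org(df))
```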
TARGET 1
DISTRIBUTION CREDIT RANGE / GENDER
INFERENCE
➢ No correlation is observed between the number of grants and the credit amount, for both males and females.
TARGET 1
INCOME TYPE / GENDER
INFERENCE
➢ The highest number of defaulters is in the ‘Working’ category.
➢ There are more female than male defaulters in all 4 active categories.
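Counts of defaulters by income type and gender come from a cross-tabulation of the TARGET 1 rows. A minimal sketch, assuming the columns TARGET, NAME_INCOME_TYPE and CODE_GENDER and hypothetical rows:

```python
import pandas as pd


def defaulters_by_income_type(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate TARGET 1 clients by income type and gender."""
    defaulters = df[df["TARGET"] == 1]
    return pd.crosstab(defaulters["NAME_INCOME_TYPE"], defaulters["CODE_GENDER"])


# Hypothetical clients
df = pd.DataFrame({
    "TARGET": [1, 1, 1, 1, 1, 0],
    "NAME_INCOME_TYPE": ["Working", "Working", "Working",
                         "Pensioner", "Commercial associate", "Working"],
    "CODE_GENDER": ["F", "F", "M", "F", "M", "M"],
})
table = defaulters_by_income_type(df)
print(table)
print(table.sum(axis=1).idxmax())  # Working
```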
TARGET 1
DISTRIBUTION ORGANIZATION TYPE
INFERENCE
➢ Business Entity Type 3 and Self-employed categories have more credits than other organization types.
TARGET 1
UNIVARIATE ANALYSIS FOR VARIABLES
CREDIT AMOUNT
INFERENCE
➢ Some outliers are observed in the loan credit amount, but they are not far from the median value, which is close to 10^6.
TARGET 1
UNIVARIATE ANALYSIS FOR VARIABLES
ANNUITY
INFERENCE
➢ Loan annuity has several outliers towards the maximum, but they are not widely spread: the farthest deviation is observed at around 10^5, which is not far from the median value.
PREVIOUS DATA
DISTRIBUTION TARGET (PAYMENT DIFFICULTY) / LOAN PURPOSE
INFERENCE
➢ As the number of approved loans increases, the number of defaulters also increases.
➢ In every category, defaulters are fewer than other customers.
➢ The fewest defaulters are observed among loans approved for ‘money for a third person’.
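Relating previous loan purposes to payment difficulty requires joining previous_application.csv to the TARGET flag in application_data.csv. A sketch assuming the join key SK_ID_CURR and the columns NAME_CONTRACT_STATUS and NAME_CASH_LOAN_PURPOSE (not named in this deck), with hypothetical rows:

```python
import pandas as pd


def defaults_by_purpose(prev: pd.DataFrame, app: pd.DataFrame) -> pd.DataFrame:
    """Count approved previous loans per purpose, split by the client's current TARGET."""
    approved = prev[prev["NAME_CONTRACT_STATUS"] == "Approved"]
    # Attach each client's TARGET from the current application data
    merged = approved.merge(app[["SK_ID_CURR", "TARGET"]], on="SK_ID_CURR", how="inner")
    table = pd.crosstab(merged["NAME_CASH_LOAN_PURPOSE"], merged["TARGET"])
    table = table.reindex(columns=[0, 1], fill_value=0)  # keep both TARGET columns
    table["default_rate"] = table[1] / (table[0] + table[1])
    return table


# Hypothetical data
app = pd.DataFrame({"SK_ID_CURR": [1, 2, 3, 4], "TARGET": [0, 1, 0, 1]})
prev = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3, 4, 3, 4],
    "NAME_CONTRACT_STATUS": ["Approved", "Approved", "Approved",
                             "Refused", "Approved", "Approved"],
    "NAME_CASH_LOAN_PURPOSE": ["Repairs", "Repairs", "Repairs", "Repairs",
                               "Money for a third person", "Urgent needs"],
})
print(defaults_by_purpose(prev, app))
```

Because one client can have several previous applications, these counts are of previous loans rather than of distinct clients; the default_rate column puts categories of different sizes on a comparable scale.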
TARGET 1
BIVARIATE ANALYSIS
CREDIT AMOUNT / EDUCATION TYPE
INFERENCE
➢ The outliers in the total income of defaulters with different family statuses do not seem to vary much, except for Business Entity Type 3 and Government organisations.
TASKS COMPLETED
• DATA CLEANSING OF ‘APPLICATION_DATA.CSV’
• UNIVARIATE AND BIVARIATE ANALYSIS OF DATA IN ‘APPLICATION_DATA.CSV’
• DATA CLEANSING OF ‘PREVIOUS_APPLICATION.CSV’
• UNIVARIATE AND BIVARIATE ANALYSIS OF DATA IN ‘PREVIOUS_APPLICATION.CSV’
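The deck lists data cleansing without describing the steps. One common first step is dropping columns that are mostly empty; the sketch below does that with an assumed 50% missing-value threshold, which is illustrative and not the documented rule of this case study:

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5):
    """Drop columns whose share of missing values exceeds max_missing."""
    missing_share = df.isna().mean()  # per-column fraction of NaN
    keep = missing_share[missing_share <= max_missing].index
    dropped = missing_share[missing_share > max_missing]
    return df[keep], dropped


# Hypothetical frame: column B is 75% missing
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [None, None, None, 1],
    "C": [1, None, 3, 4],
})
cleaned, dropped = drop_sparse_columns(df)
print(list(cleaned.columns))  # ['A', 'C']
print(dropped)
```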