CASE STUDY

PROBLEM STATEMENT
This case study aims to identify patterns that indicate whether a client has difficulty paying their instalments. Such patterns may be used to take actions such as denying the loan, reducing the loan amount, or lending to risky applicants at a higher interest rate, while ensuring that consumers capable of repaying the loan are not rejected. Identifying such applicants using EDA is the aim of this case study.

AVAILABLE SOURCES
➢ application_data.csv: information about the client at the time of application.
➢ previous_application.csv: data showing whether a previous application was Approved, Cancelled, Refused or an Unused offer.
➢ columns_description.csv: data dictionary describing the meaning of the variables.

TARGET 0 DISTRIBUTION INCOME RANGE / GENDER
INFERENCE
➢ There are no records with income below 25000.
➢ The largest number of credits is in the 125000 - 150000 range.
➢ There are fewer records for the higher income groups.

TARGET 0 DISTRIBUTION CREDIT RANGE / GENDER
INFERENCE
➢ The credit amount is higher for females in all ranges.

TARGET 0 DISTRIBUTION ORGANIZATION TYPE
INFERENCE
➢ Business Entity Type 3 and Self employed have higher counts than all other organization types.
➢ Industry type 13, Trade type 5 and Industry type 8 have the lowest counts.

TARGET 0 UNIVARIATE ANALYSIS FOR VARIABLES INCOME AMOUNT
INFERENCE
➢ A large variation in total income is observed.
➢ Many outliers are present towards the maximum, relative to the median value.
➢ Records are concentrated towards the 25th percentile, with outliers on the positive side.

TARGET 0 UNIVARIATE ANALYSIS FOR VARIABLES CREDIT AMOUNT
INFERENCE
➢ Some outliers are observed towards the maximum credit amount of the loan.
➢ The overall distribution, however, is concentrated within the interquartile range (IQR).

TARGET 0 UNIVARIATE ANALYSIS FOR VARIABLES ANNUITY
INFERENCE
➢ Several outliers towards the maximum are observed in the loan annuity.
➢ They are distributed over a small range, however, not too far from the median value.
➢ More values lie towards the 25th percentile, with outliers in the higher range.

TARGET 0 BIVARIATE ANALYSIS INCOME / ORGANIZATION TYPE
INFERENCE
➢ The outliers in total income are most widely distributed in the case of Business Entity Type 3.
➢ The plot is not well suited to analysis, given the large number of organization types.

TARGET 1 DISTRIBUTION CREDIT RANGE / GENDER
INFERENCE
➢ No correlation is observed between the number of credits granted and the amount of credit, for either males or females.

TARGET 1 INCOME TYPE / GENDER
INFERENCE
➢ The highest number of defaulters is in the ‘Working’ category.
➢ There are more female defaulters than male defaulters in all 4 active categories.

TARGET 1 DISTRIBUTION ORGANIZATION TYPE
INFERENCE
➢ The Business Entity Type 3 and Self employed categories have more credits than the other categories.

TARGET 1 UNIVARIATE ANALYSIS FOR VARIABLES CREDIT AMOUNT
INFERENCE
➢ Some outliers are observed in the credit amount of the loan, but they are not too far from the median value, which is close to 10^6.

TARGET 1 UNIVARIATE ANALYSIS FOR VARIABLES ANNUITY
INFERENCE
➢ The loan annuity has several outliers towards the maximum, but they are not widely spread: the farthest is observed around 10^5, which is not far from the median value.

PREVIOUS DATA DISTRIBUTION TARGET (PAYMENT DIFFICULTY) / LOAN PURPOSE
INFERENCE
➢ As the number of approved loans increases, the number of defaulters also increases.
➢ Defaulters are always fewer than the other customers, in every category.
➢ The fewest defaulters are observed among loans approved for ‘money for a third person’.
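The comparison of defaulters with other customers per loan purpose amounts to a count split by TARGET. The snippet below is a minimal sketch in plain Python, not the code behind the slides: the sample rows, the purpose labels "Purpose A" and "Purpose B", and the function name counts_by_purpose are hypothetical. Only the TARGET flag (1 = payment difficulty) and the ‘money for a third person’ category come from the case study.

```python
from collections import Counter

# Hypothetical sample rows: (loan purpose, TARGET).
# TARGET 1 = client with payment difficulty, TARGET 0 = all other customers.
rows = [
    ("Purpose A", 0), ("Purpose A", 0), ("Purpose A", 0), ("Purpose A", 1),
    ("Purpose B", 0), ("Purpose B", 1),
    ("Money for a third person", 0),
]


def counts_by_purpose(rows):
    """Return {purpose: Counter({0: n_others, 1: n_defaulters})}."""
    table = {}
    for purpose, target in rows:
        table.setdefault(purpose, Counter())[target] += 1
    return table


table = counts_by_purpose(rows)
for purpose, counts in sorted(table.items()):
    # A Counter returns 0 for a missing key, so empty groups print as 0.
    print(f"{purpose}: others={counts[0]}, defaulters={counts[1]}")
```

On real data the same table would normally come from a grouped count over the merged application and previous-application records. The column holding the loan purpose is not named in this deck, so it is left out here.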

TARGET 1 BIVARIATE ANALYSIS CREDIT AMOUNT / EDUCATION TYPE
INFERENCE
➢ The outliers in the total income of defaulters with different family status did not seem to vary much, except in the cases of Business Entity Type 3 and Government organisations.

TASKS COMPLETED
• DATA CLEANSING OF ‘APPLICATION_DATA.CSV’
• UNIVARIATE AND BIVARIATE ANALYSIS OF DATA IN ‘APPLICATION_DATA.CSV’
• DATA CLEANSING OF ‘PREVIOUS_APPLICATION.CSV’
• UNIVARIATE AND BIVARIATE ANALYSIS OF DATA IN ‘PREVIOUS_APPLICATION.CSV’
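
The outlier remarks in the univariate and bivariate slides are read off box plots. A minimal sketch of the rule usually behind such plots follows, assuming the conventional 1.5 × IQR fences. The deck does not state which rule or plotting defaults were used, and the income values and the function name iqr_outliers are hypothetical.

```python
import statistics


def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using 1.5 * IQR fences.

    The 1.5 multiplier is the common box-plot convention, assumed here.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical income amounts with one extreme value on the high side.
incomes = [90000, 112500, 135000, 135000, 157500, 180000, 202500, 1350000]
lower, upper, outliers = iqr_outliers(incomes)
print("fences:", lower, upper)
print("outliers:", outliers)
```

Values beyond the upper fence correspond to the points drawn above the whisker, which is where the slides report most outliers for income, credit amount and annuity.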