Interest Rate Prediction Model

Analyst: Ngan Huynh

Contents
Introduction
Methods
Data collection
Exploratory Analysis
Statistical modeling
Reproducibility
Results
Data Cleansing
Statistical Results
Conclusion
References

Introduction
The Lending Club determines the interest rates of the 2,500 loans studied here based on
the characteristics of the applicants who ask for the loans, including their employment
history, credit history and creditworthiness scores. The interest rate plays a primary role
in money lending: it is the percentage charged on the loan, which the borrower has to pay
back together with the original amount borrowed.
The FICO range is directly associated with the interest rate of a loan. Here we analyze
the relationship between the interest rate and the other variables. For loans in the same
FICO range, we determine which factors are correlated with the interest rate. In this
analysis, we apply exploratory analysis and standard multiple regression techniques. Our
results suggest that loans with lower FICO numbers have higher interest rates.
Methods
Data collection
For our analysis, we use a sample of 2,500 peer-to-peer loans issued through the
Lending Club, downloaded from (Loan Data).
Exploratory Analysis
We explore the data by examining tables and plots. First, we clean the raw data to make
it easier to inspect: we identify missing values and recode variables so that they are
easier to work with. Then we run tests on the variables associated with interest rate to
quantify the associations between the interest rate and the other variables in the data
set. Finally, we determine which factors to include in the regression model relating the
interest rate to the FICO range.
Statistical modeling
The regression model is a standard multiple linear regression relating the interest
rate to the Fico range and other variables, chosen on the basis of our exploratory
analysis. The model is fitted in R.
Reproducibility
All analyses performed in this manuscript are documented in the LoanFinal.pdf file.
Results
The loan data analyzed here include the requested loan amount in dollars (AR), the funded
loan amount in dollars (AF), the monthly income of the applicant, the interest rate as a
percentage (IR), the range of FICO credit scores (Fico), the number of credit lines (CL),
the length of the loan in months (Len), the purpose of the loan (Purp), the ratio of
income used to pay debt (DR) and the US state of residence of the loan applicant (State).
Data Cleansing
We identify the missing values and clean the data as follows. To make the graphs more
readable, we replace the values of Loan.purpose with shorter labels:
• Biz for small_business
• Credit for credit_card
• Debt for debt_consolidation
• Home for home_improvement
• Purch for major_purchase
• Edu for educational
• Wed for wedding
• Move for moving
• Medic for medical
• Vac for vacation
There are seven missing values (NA) in 2 loan applications, so we replace them with 0.
Because there are so few of them, this should not affect our statistical results.
We delete the units from the values: for loan length, 36 months becomes 36, and for
interest rate and debt-to-income ratio, 15% becomes 15. This makes it easier for R to
perform numerical operations.
For the FICO range, we delete the "-" in each value, so that, for example, 650-654
becomes 650654. Treating the result as a number makes the data easier to analyze in R,
and it should not affect our statistical results.
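The cleaning steps above can be sketched in a few lines. The original analysis was done in R; this is a minimal Python illustration, not the author's script. The function names, the field names and the sample record are hypothetical. Only the label mapping and the example values (36 months, 15%, 650-654) come from the text.

```python
# Illustrative sketch of the cleaning steps; not the author's R code.

# Short labels for Loan.purpose, as listed in the text.
PURPOSE_LABELS = {
    "small_business": "Biz",
    "credit_card": "Credit",
    "debt_consolidation": "Debt",
    "home_improvement": "Home",
    "major_purchase": "Purch",
    "educational": "Edu",
    "wedding": "Wed",
    "moving": "Move",
    "medical": "Medic",
    "vacation": "Vac",
}


def strip_units(value):
    """Turn '36 months' into 36.0 and '15%' into 15.0."""
    number = value.replace("months", "").replace("%", "").strip()
    return float(number)


def fico_to_number(fico_range):
    """Delete the '-' so that '650-654' becomes the integer 650654."""
    return int(fico_range.replace("-", ""))


def clean_record(record):
    """Clean one raw record (hypothetical field names); None stands in for NA."""
    cleaned = {}
    for key, value in record.items():
        if value is None:
            cleaned[key] = 0  # missing values are replaced by 0
        elif key == "purpose":
            cleaned[key] = PURPOSE_LABELS.get(value, value)
        elif key == "fico_range":
            cleaned[key] = fico_to_number(value)
        elif key in ("loan_length", "interest_rate", "debt_to_income"):
            cleaned[key] = strip_units(value)
        else:
            cleaned[key] = value
    return cleaned


# Hypothetical record, for illustration only.
raw = {
    "purpose": "debt_consolidation",
    "fico_range": "650-654",
    "loan_length": "36 months",
    "interest_rate": "15%",
    "debt_to_income": None,
}
print(clean_record(raw))
```

Note that the coded FICO value is a label formed by concatenation rather than a score on a natural scale, so any coefficient estimated for it has to be read on that coded scale.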
Statistical Results
Interest rates range from approximately 5.42% to 25%, and 75% of loan applications have
an interest rate above 10%. The distribution of the Fico range has the highest number of
applications at the Fico point of 660 and the smallest number of applications at the
Fico point of 834.
Most loans are for debt_consolidation, which has the highest mean interest rate at
around 14%; credit_card, other and small_business have about the same mean. Loans with a
length of 60 months generally have higher interest rates than loans with a length of 36
months; the average values are approximately 17% and 13%, respectively. There is no
significant difference in mean interest rate across employment lengths.
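A comparison of group means like the one above can be computed as follows. This is a minimal Python sketch (the original analysis was done in R). The function name and the five (loan length, interest rate) pairs are made up for illustration and are not taken from the data set.

```python
from collections import defaultdict


def mean_by_group(rows):
    """Return the mean interest rate for each group key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, rate in rows:
        totals[group] += rate
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical (loan length in months, interest rate in %) pairs.
sample = [(36, 10.0), (36, 12.0), (60, 15.0), (60, 17.0), (36, 11.0)]
print(mean_by_group(sample))
```

The same helper works for any grouping variable, for example loan purpose or employment length, by changing the first element of each pair.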
Overall, state ST has the highest range of interest rates, from 14% to 23%. Four states
(AK, HI, LA, NM) have interest rates ranging from 11% to 20%. Interest rates are mostly
between approximately 7% and 16%, with the majority of applications having an amount
requested and a monthly income below $20,000. There are some outliers with high monthly
income and a low interest rate.
We fit a regression model relating interest rate to Fico range:
IR = b0 + b1(Fico) + f(AR) + g(Len) + h(CL) + e
where b0 is an intercept term and b1 represents the change in interest rate associated with
a one-unit change in the Fico value for loans with the same amount requested, the same
loan length and the same number of credit lines. We consider the terms Fico, AR and CL at
five different levels each, for the range of Fico, the amount requested and the number of
credit lines of the applicants. The error term e represents all sources of unmeasured
variation in interest rate.
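To make the model concrete, the sketch below fits a linear model of this form by ordinary least squares. It assumes that f, g and h are plain linear terms, which the text does not state, and it uses a small made-up data set generated from known coefficients rather than the Lending Club data. The predictors are also rescaled (FICO in hundreds of points, amount in thousands of dollars) to keep the toy problem well conditioned. The original fit was done in R; the function names here are hypothetical, and the solver is a basic normal-equations implementation using only the Python standard library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides at once.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(predictors, response):
    """Least-squares coefficients [b0, b1, ...] for a model with intercept."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up rows: (Fico in hundreds, AR in thousands of dollars, Len, CL).
rows = [
    (6.6, 10, 36, 8), (6.8, 5, 36, 12), (7.0, 20, 60, 10), (7.2, 15, 36, 6),
    (7.4, 8, 60, 14), (7.6, 25, 60, 9), (6.7, 12, 36, 11), (7.9, 30, 60, 7),
]
# Coefficients chosen for the illustration only: b0, Fico, AR, Len, CL.
true_beta = [70.0, -8.0, 0.1, 0.1, -0.2]
rates = [
    true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], row)) for row in rows
]

beta = fit_ols(rows, rates)
print([round(b, 4) for b in beta])
```

Because the toy response is generated without noise, the fit should recover the chosen coefficients up to rounding error. With real data the estimates would also carry standard errors and P-values, which this sketch does not compute.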
The association between interest rate and range of Fico is highly statistically significant
at P=2.2e-16. The estimated value of b0 is 7.4e+01 and the estimated value of b1 is
-8.818e-05, per unit of the numerically coded Fico value. At the same amount requested,
loan length and number of credit lines, the interest rate is higher for lower Fico numbers.
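Because the FICO range is coded as a concatenated number (650-654 becomes 650654), b1 applies per unit on that coded scale. As a rough worked example, the snippet below takes the reported b1 and computes the model-implied difference in interest rate between two applicants who differ only in FICO range. The choice of the two ranges and the function name are illustrative assumptions; the coefficient is the one reported above.

```python
# Reported slope for Fico (per unit on the concatenated scale).
B1 = -8.818e-05


def rate_difference(fico_low, fico_high, slope=B1):
    """Model-implied change in interest rate (percentage points)
    when moving from fico_low to fico_high, other terms held fixed."""
    return slope * (fico_high - fico_low)


# 650-654 is coded as 650654 and 700-704 as 700704.
delta = rate_difference(650654, 700704)
print(round(delta, 2))
```

Under these assumptions the higher range corresponds to an interest rate about 4.4 percentage points lower. Only differences are computed here, since the reported b0 is rounded.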
Next, we consider how interest rates vary among loan applications with the same Fico
number. We take the Fico number 660664, which has the highest frequency. Within this
group, the interest rate is correlated with the length of the loan and the amount
requested. Loan applications with a longer repayment period have a higher interest rate.
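The within-range comparison can be sketched as a filter followed by a correlation. This is an illustrative Python version (the original analysis was done in R). The record layout, the function names and the handful of rows are hypothetical and do not come from the data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlations_within_fico(records, fico_value):
    """Correlate interest rate with loan length and amount requested,
    using only the records whose coded FICO range equals fico_value."""
    subset = [r for r in records if r["fico"] == fico_value]
    rates = [r["rate"] for r in subset]
    return {
        "length": pearson([r["length"] for r in subset], rates),
        "amount": pearson([r["amount"] for r in subset], rates),
    }


# Hypothetical records; 660664 is the coded range 660-664.
records = [
    {"fico": 660664, "length": 36, "amount": 5000, "rate": 13.0},
    {"fico": 660664, "length": 36, "amount": 12000, "rate": 14.5},
    {"fico": 660664, "length": 60, "amount": 15000, "rate": 17.0},
    {"fico": 660664, "length": 60, "amount": 25000, "rate": 18.5},
    {"fico": 700704, "length": 36, "amount": 8000, "rate": 9.0},
]
print(correlations_within_fico(records, 660664))
```

Holding the FICO range fixed by filtering removes its effect from the comparison, so any remaining association is with the other variables.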
Conclusion
Using a linear regression model, our analysis suggests that there is a significant,
negative association between the interest rate and the Fico range. The model includes
other variables: the amount of the loan, the length of the loan and the number of credit
lines that the applicants have. These adjustments improve the model fit, but the highly
significant relationship between interest rate and Fico range remains. Furthermore, for
loans in the same Fico range, we suggest that the loan length and the amount requested
are correlated with the interest rate.
These data on 2,500 loan applications are extracted from a larger collection of loans
made through the Lending Club. Our analysis may be useful for further investigation of
how the interest rate should be adjusted in relation to other factors.
References
Wikipedia, "Debt-to-income ratio" page. URL:
http://en.wikipedia.org/wiki/Debt-to-income_ratio, accessed 2/14/2013.
Institute For Digital Research. URL: http://www.ats.ucla.edu/stat/r/faq/subset_R.htm,
accessed 2/16/2013.
50 states in America. URL: http://www.50states.com/abbreviations.htm#.UQ8hxFp2PKo,
accessed 2/14/2013.
Wikipedia, Credit Score in the United States. URL:
http://en.wikipedia.org/wiki/Credit_score_in_the_United_States, accessed 2/14/2013.
Lending Club, home page. URL: https://www.lendingclub.com/home.action, accessed 2/14/2013.