Interest Rate Prediction Model

Analyst: Ngan Huynh

Introduction
Methods
    Data collection
    Exploratory Analysis
    Statistical modeling
    Reproducibility
Results
    Data Cleansing
    Statistical Results
Conclusion
References
Introduction

The Lending Club determines the interest rate of each loan from the characteristics of the applicant who requests it, including employment history, credit history and creditworthiness scores. The interest rate plays a primary role in money lending: it is the percentage charged on the borrowed amount, which the borrower pays back in addition to the original amount. The FICO score range is directly associated with the interest rate of a loan. Here we analyze a sample of 2,500 loans to examine the relationship between the interest rate and the other variables. For loans in the same Fico range, we also determine which factors are correlated with the interest rate. We apply exploratory analysis and standard multiple regression techniques. Our results suggest that loans with a lower Fico number have a higher interest rate.

Methods

Data collection:

For our analysis, we use a sample of 2,500 peer-to-peer loans issued through the Lending Club, downloaded from (Loan Data).

Exploratory Analysis

We explore the data by examining tables and plots of the given data. First, we clean the raw data to make it easier to examine. This cleaning process identifies missing values and recodes variables so that they are easier to work with. Then we test the variables for association with the interest rate, to quantify the associations between the interest rate and the other variables in the data set. Finally, we determine which factors to include in the regression model relating the interest rate to the Fico range.

Statistical modeling

The regression model is a standard multiple linear regression relating the interest rate to the Fico range and other variables, chosen based on our exploratory analysis. The model is fitted in R.

Reproducibility

All analyses performed in this manuscript are presented in the LoanFinal.pdf file.
Results

The loan data in this analysis include the amount (in dollars) of the requested loan (AR), the amount of the funded loan (AF), the monthly income of the applicant, the interest rate as a percentage (IR), the Fico credit score range (Fico), the number of credit lines (CL), the length of the loan in months (Len), the purpose of the loan (Purp), the ratio of income used to pay debt (DR) and the US state of residence of the loan applicant (State).

Data Cleansing

We identify the missing values and clean the data:

· To make the graphs more readable, we replace the values of Loan.purpose with shorter labels:
  • Biz for small_business
  • Credit for credit_card
  • Debt for debt_consolidation
  • Home for home_improvement
  • Purch for major_purchase
  • Edu for educational
  • Wed for wedding
  • Move for moving
  • Medic for medical
  • Vac for vacation
· There are seven missing values (NA) in 2 loan applications, so we replace them with 0. Because there are so few of them, this should not affect our statistical results.
· We delete the units from values: 36 months becomes 36 for loan length, and 15% becomes 15 for interest rate and debt-to-income ratio. This makes analytical operations in R easier.
· For the FICO range, we delete the "-" mark in the values, so that, for example, 650-654 becomes 650654. Treating the range as a number makes the data easier to analyze in R, and it should not affect our statistical results.

Statistical Results

Interest rates range from approximately 5.42% to 25%, and 75% of loan applications have an interest rate above 10%. The distribution of Fico ranges has the highest number of applications at a Fico point of 660 and the smallest number of applications at a Fico point of 834. Loans are mainly for debt_consolidation, which has the highest mean interest rate at around 14%; credit card, other and small_business have a similar mean.
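The cleaning rules listed under Data Cleansing, together with a group summary of the kind reported in this section, can be sketched as follows. The original analysis was carried out in R; this is only a minimal Python sketch using the standard library. The function names are hypothetical, the record keys reuse this report's abbreviations (Len, IR, Fico, Purp) rather than the real column names, and the sample rows are made up for illustration.

```python
from statistics import mean

# Short labels for Loan.purpose, as listed in this report.
PURPOSE_LABELS = {
    "small_business": "Biz",
    "credit_card": "Credit",
    "debt_consolidation": "Debt",
    "home_improvement": "Home",
    "major_purchase": "Purch",
    "educational": "Edu",
    "wedding": "Wed",
    "moving": "Move",
    "medical": "Medic",
    "vacation": "Vac",
}


def clean_length(value):
    """'36 months' -> 36."""
    return int(value.replace("months", "").strip())


def clean_percent(value):
    """'15%' -> 15.0."""
    return float(value.replace("%", "").strip())


def clean_fico(value):
    """'650-654' -> 650654 (the hyphen is deleted, as in the report)."""
    return int(value.replace("-", ""))


def clean_purpose(value):
    """Shorten a purpose label; purposes not in the list stay unchanged."""
    return PURPOSE_LABELS.get(value, value)


def na_to_zero(value):
    """Replace a missing value (None or the string 'NA') with 0."""
    return 0 if value is None or value == "NA" else value


def mean_rate_by_length(rows):
    """Mean interest rate for each loan length."""
    groups = {}
    for row in rows:
        groups.setdefault(row["Len"], []).append(row["IR"])
    return {length: mean(rates) for length, rates in groups.items()}


# Made-up rows for illustration only; these are not Lending Club records.
raw = [
    {"Len": "36 months", "IR": "10%", "Fico": "700-704", "Purp": "credit_card"},
    {"Len": "36 months", "IR": "14%", "Fico": "660-664", "Purp": "wedding"},
    {"Len": "60 months", "IR": "18%", "Fico": "650-654", "Purp": "other"},
]
cleaned = [
    {
        "Len": clean_length(r["Len"]),
        "IR": clean_percent(r["IR"]),
        "Fico": clean_fico(r["Fico"]),
        "Purp": clean_purpose(r["Purp"]),
    }
    for r in raw
]
print(cleaned[2])
print(mean_rate_by_length(cleaned))
```

If every Fico range spans five points, as in the 650-654 example, the concatenated number equals 1001 times the lower bound plus 4, so it is a linear rescaling of the lower bound.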
Loan applications with a 60-month term generally have a higher interest rate than those with a 36-month term; the average values are approximately 17% and 13%, respectively. There is no significant difference in mean interest rate across employment lengths. Overall, state ST has the highest range of interest rates, from 14% to 23%. Four states (AK, HI, LA and NM) have interest rates ranging from 11% to 20%. Interest rates are mostly between approximately 7% and 16%, with the majority of applications having an amount requested and a monthly income below $20,000. There are some outliers with high monthly income and a low interest rate.

We fit a regression model relating interest rate to Fico range:

IR = b0 + b1(Fico) + f(AR) + g(Len) + h(CL) + e

where b0 is an intercept term and b1 represents the change in interest rate associated with a change in the Fico value, for the same loan amount, loan length and number of credit lines held by the applicant. We consider the terms Fico, AR and CL at five different levels, for the Fico range, the loan amount and the number of credit lines of the applicant. The term e represents all unmeasured variation in interest rate.

The association between interest rate and Fico range is highly statistically significant at P = 2.2e-16. The estimated value of b0 is 7.4e+01 and the estimated value of b1 is -8.818e-05. For the same amount requested, loan length and number of credit lines, the interest rate is higher for a lower Fico number.

Next, we consider how the interest rate changes among loan applications with the same Fico number. We consider Fico number 660664, which has the highest frequency. Here the interest rate is correlated with the loan length and the amount requested. Loan applications with a longer repayment period have a higher interest rate.
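To illustrate how a model of this form is estimated, the sketch below fits a least-squares regression of interest rate on two predictors. It is not the R code behind the reported coefficients. It is a simplified Python sketch that keeps only linear terms for Fico and Len, uses the lower bound of the Fico range instead of the concatenated number to keep the arithmetic well conditioned, and runs on synthetic data generated from known coefficients. The function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(predictors, response):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in predictors]  # leading 1 = intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic data, NOT Lending Club records: each row is
# (lower bound of Fico range, loan length in months), and the rates are
# generated exactly from IR = 80 - 0.1 * Fico + 0.25 * Len.
predictors = [
    (660, 36), (680, 36), (700, 60), (720, 36),
    (740, 60), (760, 60), (670, 60), (750, 36),
]
rates = [23.0, 21.0, 25.0, 17.0, 21.0, 19.0, 28.0, 14.0]

coefficients = fit_ols(predictors, rates)
b0, b1, b2 = coefficients
print("intercept:", round(b0, 3))
print("Fico slope:", round(b1, 3))
print("Len slope:", round(b2, 3))
```

On this synthetic data the fitted Fico slope is negative and the Len slope is positive, which matches the direction of the associations described above. For real data, a numerical library or R's lm would be the more robust choice than hand-written normal equations.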
Conclusion

Our analysis suggests that there is a significant, negative association between the interest rate and the Fico range under a linear regression model. The model includes other variables: the loan amount, the loan length and the number of credit lines held by the applicant. These adjustments improve the model fit, but the relationship between the interest rate and the Fico range remains highly significant. Furthermore, for loans in the same Fico range, we suggest that the loan length and the amount requested are correlated with the interest rate. The data set of 2,500 loan applications is extracted from a larger collection of loans issued through the Lending Club. Our analysis is useful for further investigation of how the interest rate should be adjusted in relation to other factors.

References

Wikipedia, "Debt-to-income ratio" page. URL: http://en.wikipedia.org/wiki/Debt-toincome_ratio, accessed 2/14/2013.
Institute For Digital Research. URL: http://www.ats.ucla.edu/stat/r/faq/subset_R.htm, accessed 2/16/2013.
50 states in America. URL: http://www.50states.com/abbreviations.htm#.UQ8hxFp2PKo, accessed 2/14/2013.
Wikipedia, "Credit Score in the United States" page. URL: http://en.wikipedia.org/wiki/Credit_score_in_the_United_States, accessed 2/14/2013.
Lending Club, home page. URL: https://www.lendingclub.com/home.action, accessed 2/14/2013.