Uploaded by josephabugah2

final update 5

Prescriptive Analysis
Additional derived variables introduced after the past two weeks of analysis
Compare best two models using ROC plot
Final choice of deployment
The model I chose for deployment was the logistic regression model, and the
classification thresholds we decided to focus on were 0.3, 0.5 (moderate) and 0.6,
each evaluated through its confusion matrix.
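The threshold comparison described above can be sketched in base R. This is only an illustration under stated assumptions: the `prob` and `actual` vectors are made-up values, and `confusion_at` is a hypothetical helper, not a function from the original analysis. In practice the probabilities would come from something like `predict(model, newdata = test, type = "response")`.

```r
# Hypothetical predicted default probabilities and true labels (1 = default).
# These values are invented for illustration; they are not from the loan data.
prob   <- c(0.10, 0.25, 0.35, 0.45, 0.55, 0.62, 0.70, 0.90)
actual <- c(0,    0,    1,    0,    1,    0,    1,    1)

# Build a 2x2 confusion matrix (rows = actual, columns = predicted) at a cutoff.
confusion_at <- function(prob, actual, threshold) {
  predicted <- as.integer(prob >= threshold)
  table(
    actual    = factor(actual,    levels = c(0, 1)),
    predicted = factor(predicted, levels = c(0, 1))
  )
}

# Compare the three cutoffs discussed in the text.
for (cutoff in c(0.3, 0.5, 0.6)) {
  cat("Threshold:", cutoff, "\n")
  print(confusion_at(prob, actual, cutoff))
}
```

Lowering the cutoff flags more loans as likely defaults, which trades false negatives for false positives; raising it does the opposite.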
Summary of the LR model for loan screening
glm(formula = default ~ credit.policy + int.rate + fico + log.annual.inc +
    delinq.2yrs + revol.util + purpose + inq.last.6mths, family = "binomial",
    data = train)

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)             0.294343 307.410265   0.001  0.99924
credit.policy         -18.388313 307.402963  -0.060  0.95230
int.rate               58.023942   3.906555  14.853  < 2e-16 ***
fico                    0.021969   0.002511   8.750  < 2e-16 ***
log.annual.inc         -0.461259   0.100242  -4.601  4.2e-06 ***
delinq.2yrs            -0.265352   0.122426  -2.167  0.03020 *
...
purposesmall_business   0.251542   0.244660   1.028  0.30389
inq.last.6mths          0.083533   0.044026   1.897  0.05778 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2958.2 on 2133 degrees of freedom
Residual deviance: 1987.1 on 2120 degrees of freedom
AIC: 2015.1

Number of Fisher Scoring iterations: 17
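The deviance figures in the summary can be turned into an overall goodness-of-fit check with a little arithmetic. The sketch below uses only numbers reported in the output; the variable names are illustrative, and treating the deviance ratio as McFadden's pseudo R-squared assumes a binary (ungrouped) response, which is not stated explicitly in the source.

```r
# Deviance figures as reported in the model summary.
null_dev  <- 2958.2; null_df  <- 2133
resid_dev <- 1987.1; resid_df <- 2120

# Likelihood-ratio statistic: drop in deviance relative to the intercept-only model.
lr_stat <- null_dev - resid_dev   # 971.1
lr_df   <- null_df - resid_df     # 13 estimated slopes
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Pseudo R-squared (McFadden), assuming a binary 0/1 response.
mcfadden_r2 <- 1 - resid_dev / null_dev

# Consistency check: for a binary response, AIC = residual deviance + 2 * parameters.
aic <- resid_dev + 2 * (lr_df + 1)   # 13 slopes + intercept

# Odds ratio for one extra credit inquiry, from the inq.last.6mths coefficient.
odds_ratio_inq <- exp(0.083533)

cat("LR statistic:", lr_stat, "on", lr_df, "df, p-value:", p_value, "\n")
cat("McFadden pseudo R-squared:", round(mcfadden_r2, 3), "\n")
cat("Recomputed AIC:", aic, "\n")
cat("Odds ratio per extra inquiry:", round(odds_ratio_inq, 3), "\n")
```

The recomputed AIC agrees with the reported value of 2015.1, which confirms that 14 parameters were estimated even though not every coefficient row appears in the output above.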
Graphical representation of default probability by the LR model
Recap the different models and describe any new variables or methods tried and lessons learnt
The logistic regression (LR) model stands out for its statistically significant coefficients
on key features: interest rate (int.rate) and FICO score both have p-values below 2e-16, and
revolving utilization (revol.util) is also reported as significant. The model fits the data far
better than an intercept-only model: the residual deviance (1987.1 on 2120 degrees of freedom)
is substantially lower than the null deviance (2958.2 on 2133 degrees of freedom). The random
forest (RF) model, in contrast, ranks variables by MeanDecreaseGini importance rather than by
coefficients, so the two models cannot be compared on these measures alone. A more direct
comparison would use test-set performance metrics such as accuracy, precision, recall, and
F1-score, together with the confusion matrices, which show each model's true positives, true
negatives, false positives, and false negatives. The preference for the LR model over the RF
model rests on these numerical comparisons and on the LR model's strengths in interpretability
and computational efficiency.
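The metrics named above follow directly from the four confusion-matrix counts. A minimal sketch, assuming made-up counts for two unnamed models: `classification_metrics` is a hypothetical helper, and none of the numbers below are results from the loan data.

```r
# Compute standard classification metrics from confusion-matrix counts.
classification_metrics <- function(tp, fp, tn, fn) {
  precision <- tp / (tp + fp)   # share of predicted defaults that were real
  recall    <- tp / (tp + fn)   # share of real defaults that were caught
  c(
    accuracy  = (tp + tn) / (tp + fp + tn + fn),
    precision = precision,
    recall    = recall,
    f1        = 2 * precision * recall / (precision + recall)
  )
}

# Hypothetical test-set counts for two models (illustrative numbers only).
model_a <- classification_metrics(tp = 40, fp = 10, tn = 45, fn = 5)
model_b <- classification_metrics(tp = 35, fp = 5,  tn = 50, fn = 10)

# Side-by-side comparison table.
print(round(rbind(model_a = model_a, model_b = model_b), 3))
```

In this invented example both models have the same accuracy but differ in precision and recall, which is why accuracy alone is a weak basis for choosing a loan-screening model.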
Visuals or descriptions to best compare your two most effective models (the tree model has been included and it has a discrete curve)
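The remark about the tree model's discrete curve can be illustrated with a small base-R ROC computation. This is a sketch under stated assumptions: `roc_points` and `auc` are hypothetical helpers written for this example, and the scores are invented. A small tree assigns one probability per leaf, so it produces only a few distinct cutoffs and therefore only a few points on the ROC curve.

```r
# ROC points: use every distinct score as a cutoff, plus Inf for the (0, 0) corner.
roc_points <- function(score, actual) {
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- sapply(thresholds, function(th) sum(score >= th & actual == 1) / sum(actual == 1))
  fpr <- sapply(thresholds, function(th) sum(score >= th & actual == 0) / sum(actual == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the ROC curve by the trapezoidal rule.
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Invented labels and scores (1 = default); not taken from the loan data.
actual     <- c(0, 0, 1, 0, 1, 0, 1, 1)
lr_score   <- c(0.10, 0.25, 0.35, 0.45, 0.55, 0.62, 0.70, 0.90)  # many distinct values
tree_score <- c(0.20, 0.20, 0.20, 0.20, 0.80, 0.80, 0.80, 0.80)  # one value per leaf

lr_roc   <- roc_points(lr_score, actual)
tree_roc <- roc_points(tree_score, actual)

cat("ROC points, LR-style scores:  ", nrow(lr_roc), "\n")
cat("ROC points, tree-style scores:", nrow(tree_roc), "\n")
cat("AUC, LR-style scores:  ", auc(lr_roc), "\n")
cat("AUC, tree-style scores:", auc(tree_roc), "\n")
```

Plotting `tpr` against `fpr` for each data frame, for example with `plot(lr_roc$fpr, lr_roc$tpr, type = "l")` followed by `lines(tree_roc$fpr, tree_roc$tpr, lty = 2)`, puts the two curves on the same axes for comparison.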
Optimized Threshold
Describing additional insights