Some short answer questions – treat questions 1 and 2 like a “regular” homework. You can just give the answers. For maximum learning, I suggest each of you try this on your own, then get together, discuss what you did, and turn in your group’s responses.
(1) Suppose I fit a neural network with 2 hidden units and three inputs: X1=debt to income ratio,
X2=age, and X3 = years of job training or experience. Y is a binary variable: 1 for default on a car loan and 0 otherwise. My neural net used the standard (hyperbolic tangent) functions in going from the inputs to the hidden units. The hyperbolic tangent of an argument L (where L might be a linear combination of some inputs) is (exp(2L)-1) / (exp(2L)+1), which is the same as (exp(L)-exp(-L)) / (exp(L)+exp(-L)) (though I’m not asking you, you should be able to see why). Here are the bias and weights (for X1, X2, X3, in that order) linking the inputs to the two hidden units:
Unit 1: 50 1.2 -0.5 -0.8
Unit 2: 12 -0.2 2.0 -0.5
Here are the bias and weights for the two hidden units (Units 1 and 2 in order) that are used to link the hidden units to the logit of the response variable (just a linear combination of the hidden units):
-1 -0.6 1.1
What are the logit and probability (of default) for someone with a debt to income ratio X1=2, age X2=30 and having X3=5 years of job training? Assume a logistic function linking the hidden units to the response. Logit = ______ Probability of default = ________
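If you want to check your hand arithmetic, the forward pass for this little network can be sketched in a few lines of Python. The weights are copied from the tables above; the function names (`tanh`, `forward`) are my own labels, not anything from the assignment.

```python
import math

def tanh(L):
    # the problem's definition: (exp(2L) - 1) / (exp(2L) + 1)
    return (math.exp(2*L) - 1) / (math.exp(2*L) + 1)

# bias, then weights for X1, X2, X3, from the tables above
unit1 = (50, 1.2, -0.5, -0.8)
unit2 = (12, -0.2, 2.0, -0.5)
out   = (-1, -0.6, 1.1)   # bias, then weights for hidden units 1 and 2

def forward(x1, x2, x3):
    h1 = tanh(unit1[0] + unit1[1]*x1 + unit1[2]*x2 + unit1[3]*x3)
    h2 = tanh(unit2[0] + unit2[1]*x1 + unit2[2]*x2 + unit2[3]*x3)
    logit = out[0] + out[1]*h1 + out[2]*h2
    prob = 1 / (1 + math.exp(-logit))   # logistic link to the response
    return logit, prob

logit, prob = forward(2, 30, 5)
```

Notice how large the linear combinations going into the hidden units are here; that observation should make the hand calculation easy.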
(2) Suppose X=ln(2). What is the hyperbolic tangent of X? You should be able to do this easily by hand.
What range of values ___ to ____ can the hyperbolic tangent function take on?
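As a pattern for the arithmetic (using X = ln(3), a different value than the question asks about, so the exercise is left to you):

```python
import math

# Worked example with X = ln(3), NOT the homework value X = ln(2):
X = math.log(3)
tanh_X = (math.exp(2*X) - 1) / (math.exp(2*X) + 1)
# exp(2*ln 3) = 9, so tanh(ln 3) = (9 - 1) / (9 + 1) = 0.8
```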
(3) This part involves a nice report. I have generated some data on three features, age, debt to income ratio (DI_ratio), and amount of training or job experience (training), that might predict the probability of people going into arrears (getting behind on payments by a certain critical amount) on their loan. When you run the program for this homework you will see how the data were generated. My goal is to see if I can predict the probability of going into arrears, and thus avoid lending to people who will go into arrears in the future, by looking at their 3 features.
I. First, I’ll step you through a neural network model in Enterprise Miner so we’ll all be working from the same information.
(a) First run this program and view the 3-D graphs. Adjust the libname statement to put the data into your library:
LIBNAME aaem "c:\workshop\winsas\aaem" ;  ** change to yours **;
data arrears ;
   do subject = 1 to 5000 ;
      DI = round( 5*ranuni(123), .01 ) ;
      training = round( 8*ranuni(123), .25 ) ;
      age = round( 3*training + 30 + 5*normal(123) ) ;
      p = 0.18 ;
      radius = DI**2 + ((age-40)/10)**2 ;
      if radius < 1 then p = 0.03 ;
      if DI > 5.5 - ((age-38)/10)**2 then p = .95 ;
      if radius > 1 and DI > 4 then
         p = p - .18*((training/4 - 1)**2)*(training < 4) ;
      if age < 20 and DI_ratio > 4 then p = .8*p + .2 ;
      p = sqrt(p) ;
      target = ranuni(123) < p ;
      DI_ratio = DI ;
      keep subject DI_ratio training age target ;
      output ;
   end ;
proc g3d ;
   scatter DI_ratio*age=target / noneedle ;
proc g3d ;
   scatter training*DI_ratio=target / noneedle ;
data aaem.arrears ;
   set arrears ;
proc means ;
ods listing gpath = "%sysfunc(pathname(work))" ;
proc sgplot ;
   scatter Y=DI_ratio X=age / group=p ;
run ;
(b) In Enterprise Miner pull in the raw data after defining the target variable as a binary target and subject as an ID. Divide the data into half training and half validation.
(c) Connect the data partition node to a neural network node. Change the Model Selection
Criterion to Average Error as suggested in our Veterans demo but make no other changes. Run that node (check to see if it converged).
(d) For comparison purposes, connect a regression node and a decision tree node to the data partition. In the tree node, make it a class probability tree by making the subtree assessment measure
Average Square Error. Make no other changes. We have only 3 features (predictors) so we’ll not bother with model selection in the regression. Run these as well.
(e) Connect your three models (neural net, regression, tree) to a Model Comparison node.
(f) For now, we’ll leave the decision matrix as it stands though you may want to play around with different profits after you finish up the homework.
(g) Run the diagram from the Model Comparison node.
II. Here are the points expected to be addressed in your report:
(a) Compare the average age, debt to income ratio (DI_ratio) and amount of training (training) for those in arrears (target=1) to those not in arrears in the full data. Also mention the overall rate of going into arrears in the data. This can be done outside of Enterprise Miner.
(b) What is the most important feature (predictor variable) based on the tree model training data result? Is the answer the same for the validation data? In each part of the data partition, what are the relative importances of the other two features?
(c) By what number is my odds of being in arrears multiplied if my debt to income ratio increases by 1, according to the regression model? Explain why it would be hard (or impossible) to get an odds ratio with a neural network or a tree model.
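The mechanics behind (c): in a logistic regression, the odds multiplier for a one-unit increase in a predictor is e raised to that predictor's coefficient. A sketch in Python, where the coefficient value `b_DI` is hypothetical — read the real one from your regression node's output:

```python
import math

# b_DI is a HYPOTHETICAL coefficient for DI_ratio, used only to show the
# arithmetic; the actual value comes from the Enterprise Miner regression node.
b_DI = 0.9
odds_multiplier = math.exp(b_DI)
# a 1-unit increase in DI_ratio multiplies the odds of arrears by exp(b_DI)
```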
(d) Find the lift number for each model listed in the Model Comparison results Fit
Statistics table and list the three values for the validation data. Lift is a function with “depth” as its horizontal axis; it is not just a single number. By looking at the Model Comparison node’s properties panel or otherwise, explain at what depth these lift numbers were computed. Also explain to the reader how lift is computed.
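To fix ideas on how lift at a given depth is typically computed (sort cases by predicted probability, take the top fraction, and compare that group's event rate to the overall event rate), here is a sketch with a toy example. The function and the toy data are mine, not Enterprise Miner's output:

```python
def lift_at_depth(scores, targets, depth=0.10):
    # Rank cases by predicted probability, highest first,
    # and keep the top `depth` fraction of them.
    ranked = sorted(zip(scores, targets), key=lambda st: st[0], reverse=True)
    n_top = max(1, int(round(depth * len(ranked))))
    top_rate = sum(t for _, t in ranked[:n_top]) / n_top
    overall_rate = sum(targets) / len(targets)
    return top_rate / overall_rate   # lift = event rate in top group / overall rate

# toy example: 10 cases, the model puts most events near the top
scores  = [.9, .8, .7, .6, .5, .4, .3, .2, .1, .05]
targets = [ 1,  1,  0,  1,  0,  0,  1,  0,  0,  0]
```

At depth 0.2 the top two cases are both events (rate 1.0) against an overall rate of 0.4, so the lift there is 2.5; at depth 1.0 lift is always exactly 1.
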
(e) (related to (d)) My boss says we’re going to cut down on our rate of making loans by 5% next year. I could just refuse 5% of the applicants at random or I could look at their features (age, DI_ratio, training) and use my neural net model to select the 5% to refuse. In terms of the probability of an applicant going into arrears, how much better off will I be, if at all, using the model? Please state this carefully. Assume what happened in the validation data gives a reasonable answer to this question.
(f) Which model is chosen as the winner and what default criterion was used to select it? What is the area under the ROC curve for the validation data for that model? The book suggests a strong model has area exceeding 0.7. How did we do? What is the corresponding recommendation for a strong Gini coefficient?
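For the Gini part of (f): by the usual convention, the Gini coefficient (accuracy ratio) is a simple linear rescaling of the ROC area, so translating the book's AUC recommendation into a Gini recommendation is one line of arithmetic:

```python
def gini_from_auc(auc):
    # Common convention: Gini = 2*AUC - 1, so AUC 0.5
    # (no discrimination) maps to Gini 0.
    return 2*auc - 1
```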
(g) Do you think we were in pretty good shape without oversampling for rare events here?
Why?
(h) Using the output window in the Model Comparison results, which model minimizes the number of false positives in the validation data? If we make a false positive decision, are we deciding that a person is not in arrears when actually they are, or are we deciding a person is in arrears when they actually are not?
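One way to pin down what is being counted in (h), with target=1 meaning “in arrears.” This little counting function is only an illustration of the definition, not how Enterprise Miner tabulates its classification table:

```python
def false_positive_count(predicted, actual):
    # a false positive is a case the model flags as an event (predicted 1)
    # that is actually a non-event (actual 0)
    return sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)

# tiny illustration with made-up decisions and outcomes
predicted = [1, 1, 0, 0, 1]
actual    = [1, 0, 0, 1, 0]
```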
(i) Based on the tree result, would you suggest a possible simplification in your neural net?
Be sure to write up a nice report incorporating the points above.