test3_2016.docx

advertisement
Test 3 Data Mining 2016 (Dickey)
Name ____________________________
For trees, neural networks, and logistic regression, assume as usual a binary target Y with Y=1
representing an event and Y=0 a non-event. As always, e represents the base of the natural logarithms.
1. (20 pts.) A neural network for a binary response involves hyperbolic tangent functions and, at the end,
a logistic function to convert a logit to a probability. If eX=2, what are the hyperbolic tangent
__________ and logistic _________ functions of X?
A neural network with one input X and 2 hidden units H1 and H2 in a hidden layer has biases 0.2 and 0.3
and associated weights 0.5 and 0.6 on X for the two hidden units respectively. To evaluate H1 and H2
for X=2, I would compute the hyperbolic tangents of _______ and ________.
2. (15 pts.) I have a logistic regression with just one predictor variable which is a class variable with 4
levels. I used the default deviation coding to get these estimates for the logit: 4 for the intercept and
-1, 2, and 4 for the first three class levels. What is the estimate ________ for the fourth class level?
What are the corresponding intercept _________ and first level estimate ______ I will get if I re-run the
analysis using GLM coding?
3. (10 pts.) I have a logistic regression for binary response Y in which the logit is L = 0.5 +0 .1X1 +0 .2X2.
What increase _____________ in X1 would multiply the odds of an event by e (as opposed to doubling)?
4. (15 pts.) I have a binary target Y. If I decide Y=0, I do not gain or lose anything.
If I decide Y=1, I gain $16 if I am right and lose $4 if I am wrong.
To maximize expected profit, I should decide Y=1 whenever the probability that Y=1 exceeds ________
5. (15 pts.) I know that Pr(A) = .6 and Pr(B) = .5 and I have the rule A=>B .
What are the maximum __________ and minimum __________ possible confidence for the rule A=>B
under these conditions?
What value of Pr{A and B} __________ , if any, would make the lift for the rule A=>B equal to 1?
6. (25 pts.) The best leaf from my decision tree for predicting events (1s) has 45 1’s and 4 0’s. In my
root node, there were 150 1’s and 50 0’s. Suppose I decide to say that all observations falling in my best
leaf will be events and none of the others will be events.
(A) How many concordant pairs ___________ does that decision produce?
(B) How many tied pairs _____________ are represented in that best leaf?
(C) The count of all concordant pairs plus half the overall number of ties is divided by a number
to get the area under the ROC curve (C statistic). What is that number _____________ in
this example?
(D) Find the sensitivity _____________ and specificity ____________ associated with the given
decision.
*************ANSWERS***********************
(1) Logistic(x) = exp(x)/(1+exp(x))=2/(2+1)=2/3.
Note exp(-x) is 1/exp(x) =1/2 so Htan(X) = (2-1/2)/(2+1/2) = (3/2)/(5/2)=0.6. The arguments to
the hyperbolic tangent functions are 0.2+0.5(2)=1.2 and 0.3 + 0.6(2) = 1.5.
The logistic function, as it says, is used to convert a logit to a probability so another way to ask
this same exact question is: If X is a logit and exp(x)=2, then what is the associated probability.
Note that there is no reason to compute X but if you want to, X=log(2) so the logistic function is
exp(log(2))/(1+exp(log(2)) = 2/3. Notice that a logistic function is not the same as a logit. The
logit is the argument to the logistic function in logistic regression.
(2) Fourth level estimate is –(-1+2+4)=-5 so predictions (group means) are 4-1=3, 4+2=6, 4+4=8 and
4-5 = -1. For GLM coding the last prediction is intercept+0 so intercept=-1 and I still have to get
the same predictions so to reproduce 3 as the first mean (prediction for the first level), we
would have to add 4 to -1.
(3) Log(odds*e)=Log(odds)+1. To raise L by 1 we need 0.1(X1+D)-0.1X1=0.1D. If we add D=10 to X1
then the new log of the odds becomes the old log of odds + 1 and the new odds becomes the
old odds times e. Relating to the doubling amount, to double the odds we would have taken
log(2)/b1 as in the notes. To multiply the odds by e rather than by 2 we would take log(e)/b1 =
1/b1.
(4) E{profit}=16p-4(1-p) so p>0.2 makes E{profit}>0. Thus p>4/20.
(5) The Venn diagram total area for Pr(A) and Pr(B) cannot exceed 1. There must be some overlap,
that is, Pr{A and B} must exceed 0. The smallest overlap (i.e. the most spread) of the A and B
areas is Pr{A and B}=0.1 since 0.6+0.5-0.1=1. The largest is when the areas are overlaid so Pr{A
and B} = Pr{B}=.5. We need these because Pr{A and B}/Pr{A} is confidence so confidence is
between 1/6 and 5/6. Lift=1 means Pr{A and B}/(Pr{A}Pr{B})=1, so Pr{A and B}=0.3
(6) For this decision:
To the left: 45 1s, 4 0s
To the right 105 1s, 46 0s
Concordant pairs for this decision boundary: 45*46. Tied in this best leaf: 45*4.
Total pairs overall 150*50 is our denominator when computing C.
Sensitivity=Pr{call a 1 a 1}=45/150 = 0.30 Specificity = Pr{call a 0 a 0} = 46/50=0.92
(so 1-specificity=0.08 and X=0.08 , Y=0.30 is a point on the ROC curve).
Download