Test 3 Data Mining 2016 (Dickey) Name ____________________________ For trees, neural networks, and logistic regression, assume as usual a binary target Y with Y=1 representing an event and Y=0 a non-event. As always, e represents the base of the natural logarithms. 1. (20 pts.) A neural network for a binary response involves hyperbolic tangent functions and, at the end, a logistic function to convert a logit to a probability. If eX=2, what are the hyperbolic tangent __________ and logistic _________ functions of X? A neural network with one input X and 2 hidden units H1 and H2 in a hidden layer has biases 0.2 and 0.3 and associated weights 0.5 and 0.6 on X for the two hidden units respectively. To evaluate H1 and H2 for X=2, I would compute the hyperbolic tangents of _______ and ________. 2. (15 pts.) I have a logistic regression with just one predictor variable which is a class variable with 4 levels. I used the default deviation coding to get these estimates for the logit: 4 for the intercept and -1, 2, and 4 for the first three class levels. What is the estimate ________ for the fourth class level? What are the corresponding intercept _________ and first level estimate ______ I will get if I re-run the analysis using GLM coding? 3. (10 pts.) I have a logistic regression for binary response Y in which the logit is L = 0.5 +0 .1X1 +0 .2X2. What increase _____________ in X1 would multiply the odds of an event by e (as opposed to doubling)? 4. (15 pts.) I have a binary target Y. If I decide Y=0, I do not gain or lose anything. If I decide Y=1, I gain $16 if I am right and lose $4 if I am wrong. To maximize expected profit, I should decide Y=1 whenever the probability that Y=1 exceeds ________ 5. (15 pts.) I know that Pr(A) = .6 and Pr(B) = .5 and I have the rule A=>B . What are the maximum __________ and minimum __________ possible confidence for the rule A=>B under these conditions? What value of Pr{A and B} __________ , if any, would make the lift for the rule A=>B equal to 1? 6. (25 pts.) The best leaf from my decision tree for predicting events (1s) has 45 1’s and 4 0’s. In my root node, there were 150 1’s and 50 0’s. Suppose I decide to say that all observations falling in my best leaf will be events and none of the others will be events. (A) How many concordant pairs ___________ does that decision produce? (B) How many tied pairs _____________ are represented in that best leaf? (C) The count of all concordant pairs plus half the overall number of ties is divided by a number to get the area under the ROC curve (C statistic). What is that number _____________ in this example? (D) Find the sensitivity _____________ and specificity ____________ associated with the given decision. *************ANSWERS*********************** (1) Logistic(x) = exp(x)/(1+exp(x))=2/(2+1)=2/3. Note exp(-x) is 1/exp(x) =1/2 so Htan(X) = (2-1/2)/(2+1/2) = (3/2)/(5/2)=0.6. The arguments to the hyperbolic tangent functions are 0.2+0.5(2)=1.2 and 0.3 + 0.6(2) = 1.5. The logistic function, as it says, is used to convert a logit to a probability so another way to ask this same exact question is: If X is a logit and exp(x)=2, then what is the associated probability. Note that there is no reason to compute X but if you want to, X=log(2) so the logistic function is exp(log(2))/(1+exp(log(2)) = 2/3. Notice that a logistic function is not the same as a logit. The logit is the argument to the logistic function in logistic regression. (2) Fourth level estimate is –(-1+2+4)=-5 so predictions (group means) are 4-1=3, 4+2=6, 4+4=8 and 4-5 = -1. For GLM coding the last prediction is intercept+0 so intercept=-1 and I still have to get the same predictions so to reproduce 3 as the first mean (prediction for the first level), we would have to add 4 to -1. (3) Log(odds*e)=Log(odds)+1. To raise L by 1 we need 0.1(X1+D)-0.1X1=0.1D. If we add D=10 to X1 then the new log of the odds becomes the old log of odds + 1 and the new odds becomes the old odds times e. Relating to the doubling amount, to double the odds we would have taken log(2)/b1 as in the notes. To multiply the odds by e rather than by 2 we would take log(e)/b1 = 1/b1. (4) E{profit}=16p-4(1-p) so p>0.2 makes E{profit}>0. Thus p>4/20. (5) The Venn diagram total area for Pr(A) and Pr(B) cannot exceed 1. There must be some overlap, that is, Pr{A and B} must exceed 0. The smallest overlap (i.e. the most spread) of the A and B areas is Pr{A and B}=0.1 since 0.6+0.5-0.1=1. The largest is when the areas are overlaid so Pr{A and B} = Pr{B}=.5. We need these because Pr{A and B}/Pr{A} is confidence so confidence is between 1/6 and 5/6. Lift=1 means Pr{A and B}/(Pr{A}Pr{B})=1, so Pr{A and B}=0.3 (6) For this decision: To the left: 45 1s, 4 0s To the right 105 1s, 46 0s Concordant pairs for this decision boundary: 45*46. Tied in this best leaf: 45*4. Total pairs overall 150*50 is our denominator when computing C. Sensitivity=Pr{call a 1 a 1}=45/150 = 0.30 Specificity = Pr{call a 0 a 0} = 46/50=0.92 (so 1-specificity=0.08 and X=0.08 , Y=0.30 is a point on the ROC curve).