Z_Final _2016.docx

Final 2016
Unless otherwise instructed, assume no priors or costs are used.
1. (18 pts.) In one leaf of a decision tree for a binary target Y (Y=1 or 0), the predicted probability that
Y=1 is 0.2. I have no priors, but I do know that if I say Y=1, I will get $10 if I am right and lose $2 if I am
wrong. I have no profit or loss if I say Y=0. What is my expected profit ___________ for saying that Y=1
for all observations falling into this leaf?
What should be my decision (Y=1 or Y=0) for this leaf if I am maximizing profit?
2. (9 pts.) Here are 6 (X,Y) data points in 2 clusters along with the centroids (cluster centers). No
centering or scaling is required, just take these data points as they stand. Also shown is the within
cluster sum of squares (sum of squared distances from the centroid) for cluster 2.
            Points                     Centroid    Sum of Squares
Cluster 1:  (4,1) (2,1) (3,4)          (3,2)       ____________
Cluster 2:  (-2,-1) (0,-3) (-1,-2)     (-1,-2)     4
A. Compute the within cluster sum of squares for cluster 1.
B. Compute the R-square for this 2 cluster solution _______________
C. Of these clustering methods, which makes the most use of this sum of squares idea:
Ward’s method, K-means, Single linkage, Average linkage, Complete linkage.
3. (30 pts.) I have a list of 50 logits from a logistic regression in decreasing order. The model predicts the
probability that Y=1. No logit appears more than once. The actual values for Y are listed alongside the
logits. Here are the first 12 rows of that list with a summary of the other 38 rows:
Logit   Y
5.0     1
4.8     0
4.7     1
4.3     1
3.0     1
2.7     0
2.5     1
2.3     0
2.2     1
2.0     1
1.7     0
1.4     0
**** 38 more logits, 25 target values with Y=0, and 13 target values with Y=1.
A. Find the lift _______ at depth 10%.
B. I decide that Y=1 whenever the logit is 2 or more, and thus that Y= 0 if the logit is strictly less than 2.
Find
(i) the horizontal _________ and vertical _________ axis coordinates for the point on the ROC
curve corresponding to this decision.
(ii) the number _______ of concordant pairs for this decision
(iii) the proportion __________ of concordant pairs for this decision
4. (12 pts.) I have an association analysis for two items A and B (each of which occurs more than once in
my data set of course). Here are some rules and their properties from the data set:
Rule      Confidence    Lift    Support
A => B    0.8           1.2     ________
B => A    0.5           ____    ________
Fill in the blanks. (hint: use the probability definitions of terms)
5. (12 pts.) I have a tree with just 2 leaves and hence an ROC curve with just one change in slope. That
change occurs where the specificity is 0.6.
A. What is the highest possible area under such an ROC curve _______ ?
B. What would the corresponding sensitivity have to be _______ to make the total area under the ROC
curve 0.60?
6. (9 pts.) In my neural network my first hidden unit H1 is the hyperbolic tangent of 2 + 3X1 – 4 X2 where
X1 and X2 are my (only) inputs.
Find, if possible, values of X1 ____ and X2 ____ that make H1 equal to 1.2. If not possible, explain.
7. (10 pts.) In the context of principal components, we talked about vectors being orthogonal to each
other. Fill in the blank so that these two vectors are orthogonal.
[  3 ]      [ -2   ]
[ -4 ]      [  1   ]
[  2 ]      [  8   ]
[  1 ]      [ ____ ]
Solutions:
1. Expected profit = $10(0.2)-$2(0.8) = $0.40>0 so decide Y=1.
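As a check on the arithmetic in Solution 1, the expected-profit rule can be sketched in Python. The function name `expected_profit` and its parameter names are illustrative and not part of the exam; the payoffs and the leaf probability are the ones given in Problem 1.

```python
def expected_profit(p_one, gain_if_right, loss_if_wrong):
    """Expected profit per observation of saying Y=1 when Pr{Y=1} = p_one."""
    return gain_if_right * p_one - loss_if_wrong * (1.0 - p_one)

# Leaf from Problem 1: Pr{Y=1} = 0.2, win $10 if right, lose $2 if wrong.
profit_say_one = expected_profit(0.2, 10.0, 2.0)
profit_say_zero = 0.0  # saying Y=0 carries no profit or loss

# Choose the decision with the larger expected profit.
decision = 1 if profit_say_one > profit_say_zero else 0
print(round(profit_say_one, 2), decision)
```

With these payoffs, saying Y=1 pays off whenever 10p - 2(1-p) > 0, that is, whenever the leaf probability p exceeds 1/6.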
2. A. The sum of squares for cluster 1 (by Pythagoras’ theorem) is (4-3)^2 + (2-3)^2 + 0 + (1-2)^2 + 1 + 4 = 8.
B. R-square is 1 - (4+8)/Total SS. The overall centroid is (1,0) (the averages of the 6 X and 6 Y coordinates), so Total SS =
9+1+4+9+1+4 (using X) plus 1+1+16+1+9+4 (using Y), that is, 28 + 32 = 60, and R-square = 1 - 12/60 =
4/5 = 80%.
C. Ward’s method agglomerates in a way that minimizes the resulting increase in within cluster sum of
squares (in addition to using the cubic clustering
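The sums of squares and the R-square in Solution 2 can be reproduced with a short Python sketch. The helper names `mean_point` and `sum_of_squares` are illustrative; the data points are those listed in Problem 2, with no centering or scaling applied.

```python
def mean_point(points):
    """Centroid (mean X, mean Y) of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def sum_of_squares(points, center):
    """Sum of squared Euclidean distances from each point to center."""
    cx, cy = center
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)

cluster1 = [(4, 1), (2, 1), (3, 4)]
cluster2 = [(-2, -1), (0, -3), (-1, -2)]

# Within-cluster sums of squares around each cluster's own centroid.
within1 = sum_of_squares(cluster1, mean_point(cluster1))
within2 = sum_of_squares(cluster2, mean_point(cluster2))

# Total sum of squares around the overall centroid of all 6 points.
all_points = cluster1 + cluster2
total_ss = sum_of_squares(all_points, mean_point(all_points))

r_square = 1 - (within1 + within2) / total_ss
print(within1, within2, total_ss, r_square)
```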
3. 10% of 50 is 5, so we use the top 5 rows, whose Y values are 1, 0, 1, 1, 1; that is, Pr{Y=1} = 4/5 there. Overall we have 7+13 = 20
values with Y=1 out of 50, so the lift is 4/5 divided by 20/50, that is, lift = 2 at depth 10%.
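The lift calculation can be sketched in Python as follows. The function `lift_at_depth` and the variable names are illustrative; only the 12 listed Y values and the stated counts for the other 38 rows come from the problem.

```python
# Y values for the 12 listed rows, already sorted by decreasing logit.
top12_y = [1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0]

n_total = 50
ones_total = sum(top12_y) + 13  # 7 listed plus 13 among the other 38 rows

def lift_at_depth(sorted_y, depth_fraction, n_total, ones_total):
    """Response rate in the top depth_fraction of rows over the overall rate.

    Assumes the requested depth does not reach past the rows in sorted_y.
    """
    k = round(depth_fraction * n_total)
    rate_top = sum(sorted_y[:k]) / k
    rate_overall = ones_total / n_total
    return rate_top / rate_overall

lift10 = lift_at_depth(top12_y, 0.10, n_total, ones_total)
print(lift10)
```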
For this decision, we have 7 1’s that we say are 1’s and 3 0’s that we mistakenly say are 1’s. Overall we
have 20 1’s and 30 0’s. At or above our cut point are 7/20 = 0.35 of the 1’s and 3/30 = 0.10 of the 0’s,
so the point on the ROC curve is (X,Y) = (0.10, 0.35). We correctly identified 7 1’s and 27 0’s, so this decision produces
7(27) = 189 concordant pairs. There are 20(30) = 600 pairs of a 1 with a 0 in all, so 189/600 = 63/200 = 0.315 is the
proportion of concordant pairs for this decision.
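The ROC point and the concordant-pair count for the cutoff at logit 2 can be checked with the sketch below. The variable names are illustrative. It relies on the list being in decreasing order, so all 38 unlisted logits lie below 1.4 and therefore below the cutoff.

```python
# (logit, Y) for the 12 listed rows.
listed = [(5.0, 1), (4.8, 0), (4.7, 1), (4.3, 1), (3.0, 1), (2.7, 0),
          (2.5, 1), (2.3, 0), (2.2, 1), (2.0, 1), (1.7, 0), (1.4, 0)]

ones_total, zeros_total = 20, 30  # totals over all 50 rows
cutoff = 2.0                      # say Y=1 when logit >= cutoff

# Rows at or above the cutoff are all among the listed ones.
true_pos = sum(1 for logit, y in listed if logit >= cutoff and y == 1)
false_pos = sum(1 for logit, y in listed if logit >= cutoff and y == 0)
true_neg = zeros_total - false_pos

roc_x = false_pos / zeros_total   # 1 - specificity
roc_y = true_pos / ones_total     # sensitivity

# Pairs (a 1 called 1, a 0 called 0) are the concordant pairs for this decision.
concordant = true_pos * true_neg
proportion = concordant / (ones_total * zeros_total)
print(roc_x, roc_y, concordant, proportion)
```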
4. Lift is symmetric in A and B, so it is the same, 1.2, for rule 2.
Use the definitions:
(a) Pr{A and B}/Pr{A} = 0.80, (b) Pr{A and B}/Pr{B} = 0.50, (c) Pr{A and B}/(Pr{A}Pr{B}) = 1.2.
These are 3 equations in 3 unknowns.
Taking (a)/(c) gives Pr{B} = 0.8/1.2 = 2/3 and (b)/(c) gives Pr{A} = 0.5/1.2 = 5/12. We then find from (c) that support
= Pr{A and B} = 1.2(5/12)(2/3) = 1/3. It is the same for both rules.
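The three equations can be solved exactly with Python's `fractions` module. This is only a sketch of the algebra above; the variable names are illustrative.

```python
from fractions import Fraction

conf_a_to_b = Fraction(8, 10)   # Pr{A and B} / Pr{A}
conf_b_to_a = Fraction(5, 10)   # Pr{A and B} / Pr{B}
lift = Fraction(12, 10)         # Pr{A and B} / (Pr{A} Pr{B})

# Dividing each confidence by the lift isolates the other item's probability.
p_b = conf_a_to_b / lift
p_a = conf_b_to_a / lift

# Support follows from the lift definition.
support = lift * p_a * p_b
print(p_a, p_b, support)
```

Substituting back, `support / p_a` and `support / p_b` recover the two confidences, which confirms that the three values are mutually consistent.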
5. The horizontal axis of the ROC curve is 1 - specificity, so the slope changes at X = 0.4. Draw a horizontal and a
vertical line through the point of slope change. The area under the ROC curve then consists of a
rectangle of base 0.6 and height Y (the Y coordinate of our point, that is, the sensitivity), to its left a triangle of base 0.4
and height Y, and above the rectangle a triangle of base 0.6 and height (1-Y). The area under the ROC
curve is the sum of the 3 areas, 0.6Y, 0.4Y/2 and 0.6(1-Y)/2 respectively. Area = 0.6Y + 0.2Y + 0.3 - 0.3Y =
0.5Y + 0.3, a line that increases with Y. Of course Y cannot exceed 1, so 0.5(1) + 0.3 = 0.8 is
the maximum area. (This means that the best leaf contains all of the 1’s.) To get 0.60 we need 0.5Y + 0.3 = 0.60, so
Y = 0.6.
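The area decomposition can be checked numerically with the sketch below. The function names `roc_area` and `sensitivity_for_area` are illustrative. Summing the three pieces shows that, for a one-kink ROC curve, the area equals (sensitivity + specificity)/2, which is what the second function inverts.

```python
def roc_area(sensitivity, specificity):
    """Area under a one-kink ROC curve through (1 - specificity, sensitivity)."""
    x = 1.0 - specificity
    left_triangle = x * sensitivity / 2
    rectangle = (1.0 - x) * sensitivity
    top_triangle = (1.0 - x) * (1.0 - sensitivity) / 2
    return left_triangle + rectangle + top_triangle

def sensitivity_for_area(target_area, specificity):
    """Sensitivity needed for a given area, from area = (sens + spec) / 2."""
    return 2.0 * target_area - specificity

max_area = roc_area(1.0, 0.6)                  # sensitivity cannot exceed 1
needed = sensitivity_for_area(0.60, 0.6)       # area asked for in Problem 5B
print(round(max_area, 3), round(needed, 3))
```

The same function handles any other target area at specificity 0.6, provided the resulting sensitivity stays between 0 and 1.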
6. Not possible – hyperbolic tangents only extend between -1 and 1.
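A small Python check of the range argument follows. The helper names `hidden_unit` and `solve_preactivation` are hypothetical. The sketch uses the fact that `math.atanh` raises `ValueError` for arguments outside (-1, 1), so a target of 1.2 has no pre-activation value and hence no (X1, X2) solution.

```python
import math

def hidden_unit(x1, x2):
    """H1 from Problem 6: tanh of 2 + 3*X1 - 4*X2."""
    return math.tanh(2 + 3 * x1 - 4 * x2)

def solve_preactivation(target):
    """Return z with tanh(z) = target, or None when no such z exists."""
    try:
        return math.atanh(target)
    except ValueError:
        return None

# Even extreme inputs never push H1 beyond 1 in absolute value.
samples = [hidden_unit(x1, x2) for x1 in (-10, 0, 10) for x2 in (-10, 0, 10)]

print(solve_preactivation(1.2), max(abs(h) for h in samples))
```

For a reachable target such as 0.5, `solve_preactivation` returns a value z, and any X1, X2 with 2 + 3X1 - 4X2 = z would work.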
7. The sum of cross products must be 0 so -6 – 4 + 16 +1X=0 showing that X=-6.
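The orthogonality condition can be sketched in Python as below. The placement of the minus signs in the two vectors is an assumption, chosen so that the first three cross products equal -6, -4 and 16 as in the solution; only those products and the coefficient 1 on the blank are taken from the text.

```python
# Assumed entries: signs chosen to reproduce the cross products -6, -4, 16.
v1 = [3, -4, 2, 1]
v2_known = [-2, 1, 8]   # the fourth entry of the second vector is the blank

# Sum of the known cross products (zip stops after the three known entries).
partial = sum(a * b for a, b in zip(v1, v2_known))

# Solve partial + v1[-1] * X = 0 for the blank X.
blank = -partial / v1[-1]

# Verify that the completed vectors have zero dot product.
v2 = v2_known + [blank]
dot = sum(a * b for a, b in zip(v1, v2))
print(blank, dot)
```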