Lift_2.docx

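Training Data

[Figure: classification tree. Root node: 2421/4843 = 50% 1’s; internal nodes at 44% and 42%. Leaves, sorted from highest to lowest proportion of 1’s:]

Leaf     1’s / n       Proportion   Cumulative n
1        115/188       61.2%        188
2        1216/2112     57.6%        2300
3        254/472       53.8%        2772
last     836/2071      40.4%        4843

[Figure: number line of the 4843 observations sorted by leaf, with a tick at every 5% of the data: 5% = 4843(0.05) = 242.15, ..., 45% = 2179.35, 50% = 2421.5. Leaf 1 ends at 188 (before the 5% tick), leaf 2 ends at 2300 (between the 45% and 50% ticks), leaf 3 ends at 2772 (between the 55% and 60% ticks).]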
More Evaluations:

Decisions:
Y = Sensitivity: probability of calling a 1 a 1, Pr{decide 1 | 1}. Want large Y.
Specificity: probability of not calling a 0 a 1, Pr{decide 0 | 0}.
X = 1-Specificity: probability of mistakenly calling a 0 a 1, Pr{decide 1 | 0}. Want small X.
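As a small illustration of these two definitions, the sketch below computes X and Y from counts. The function name `roc_point` and its argument names are made up for this example; the counts are the ones used for the cut at leaf 1 later in these notes (115 1’s and 73 0’s called 1, out of 2421 1’s and 2422 0’s).

```python
def roc_point(ones_called_1, zeros_called_1, n_ones, n_zeros):
    """Return (X, Y) = (1 - specificity, sensitivity) for one decision rule."""
    sensitivity = ones_called_1 / n_ones                  # Pr{decide 1 | 1}
    specificity = (n_zeros - zeros_called_1) / n_zeros    # Pr{decide 0 | 0}
    return 1 - specificity, sensitivity

# Counts taken from the "cut at leaf 1" case in these notes.
x, y = roc_point(ones_called_1=115, zeros_called_1=73, n_ones=2421, n_zeros=2422)
print(f"X = {x:.4f}, Y = {y:.4f}")
```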
ROC Plot (receiver operating characteristic curve)
Each decision rule gives one point (X, Y) = (1-Specificity, Sensitivity).
Declare all data 0:
Sensitivity = 0, 1-Specificity = 0
(0,0)
------------------------------------------------------------------------------------------
Cut at leaf 1:
Leaf 1 data declared 1’s (115 right, 188-115 = 73 wrong <- 0 called 1)
Sensitivity: 115/2421 = 0.0475
1-Specificity: 73/2422 = 0.0301
(0.0301, 0.0475)
------------------------------------------------------------------------------------------
Cut at leaf 2:
Leaves 1 and 2 declared 1’s (115+1216 = 1331 right, 2300-1331 = 969 wrong <- 0 called 1)
Sensitivity: 1331/2421 = 0.5498
1-Specificity: 969/2422 = 0.4001
(0.4001, 0.5498)
------------------------------------------------------------------------------------------
Cut at leaf 3:
Leaves 1, 2 and 3 declared 1’s (1331+254 = 1585 right, 2772-1585 = 1187 wrong <- 0 called 1)
Sensitivity: 1585/2421 = 0.6547
1-Specificity: 1187/2422 = 0.4901
(0.4901, 0.6547)
------------------------------------------------------------------------------------------
Declare all 1’s:
Sensitivity = 1, 1-Specificity = 1
(1,1)
Plot these 5 points, connect the dots – that’s the ROC curve!
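The five points can be reproduced by accumulating the leaf counts in order, as in the sketch below. It assumes the leaves are sorted from highest to lowest proportion of 1’s, as in these notes; the variable names are illustrative only.

```python
# (number of 1's, leaf size) per leaf, highest proportion of 1's first
leaves = [(115, 188), (1216, 2112), (254, 472), (836, 2071)]
n_ones = sum(ones for ones, _ in leaves)          # 2421
n_zeros = sum(n - ones for ones, n in leaves)     # 2422

roc = [(0.0, 0.0)]            # declare all data 0
right = wrong = 0             # 1's called 1, and 0's called 1, so far
for ones, n in leaves:
    right += ones
    wrong += n - ones
    roc.append((wrong / n_zeros, right / n_ones))   # (1-Specificity, Sensitivity)

for x, y in roc:
    print(f"({x:.4f}, {y:.4f})")
```

The last pass through the loop adds the final leaf, which is the same as declaring all data 1 and gives the point (1,1).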
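You want to maximize the area under the ROC curve. That area equals the concordance: the proportion of (1, 0) pairs in which the 1 gets the higher predicted probability, with tied pairs counted as one half.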
You can plot ROC for a set of models even if one is regression, another a tree, etc.
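The link between the area under the ROC curve and the concordance can be checked numerically on the tree above. The sketch below is only an illustration: it computes the area with the trapezoid rule over the connected dots, then counts concordant and tied (1, 0) pairs directly from the leaf counts, treating a pair in the same leaf as tied. The helper names are made up for this example.

```python
# (number of 1's, leaf size) per leaf, highest proportion of 1's first
leaves = [(115, 188), (1216, 2112), (254, 472), (836, 2071)]
ones = [o for o, _ in leaves]
zeros = [n - o for o, n in leaves]
n_ones, n_zeros = sum(ones), sum(zeros)

# ROC points (1-Specificity, Sensitivity), starting from (0, 0)
roc, right, wrong = [(0.0, 0.0)], 0, 0
for o, z in zip(ones, zeros):
    right, wrong = right + o, wrong + z
    roc.append((wrong / n_zeros, right / n_ones))

# Area under the connect-the-dots curve by the trapezoid rule
area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(roc, roc[1:]))

# Concordance: a 1 in a higher-ranked leaf than the 0 is concordant,
# a 1 and a 0 in the same leaf are tied and count one half.
concordant = sum(ones[i] * sum(zeros[i + 1:]) for i in range(len(leaves)))
tied = sum(o * z for o, z in zip(ones, zeros))
concordance = (concordant + 0.5 * tied) / (n_ones * n_zeros)

print(f"area = {area:.4f}, concordance = {concordance:.4f}")
```

The two numbers agree, which is why comparing models by area under the ROC curve is the same as comparing them by concordance.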
Percent Captured Response:
There are 2421 1’s.
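The first 5% of the data is 242.15 observations: all 188 from leaf 1 plus 242.15-188 = 54.15 from leaf 2.
It captures an estimated 115 + 54.15(1216/2112) = 146.177 of the 1’s: 146.177/2421 = 6.0379%
The next 5% captures an estimated 242.15(1216/2112) = 139.4197 of them: 139.4197/2421 = 5.7587%
Each further 5% set up to the 45% mark also captures 5.7587%, since all of them lie in leaf 2 (2179.35 < 2300).
SAS gives the cumulative % of captured 1’s:
6.0379%, 6.0379%+5.7587%, 6.0379%+2(5.7587%), etc.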
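The same arithmetic can be carried through all twenty 5% sets. The sketch below is a rough illustration, not SAS’s own computation: it assumes observations are sorted by leaf, and that a set straddling two leaves takes each leaf’s proportion of 1’s for the part that falls in that leaf. The function and variable names are invented for this example.

```python
# (number of 1's, leaf size) per leaf, highest proportion of 1's first
leaves = [(115, 188), (1216, 2112), (254, 472), (836, 2071)]
n_total = sum(n for _, n in leaves)       # 4843
n_ones = sum(o for o, _ in leaves)        # 2421
bin_size = 0.05 * n_total                 # 242.15 observations per 5% set

def expected_ones(lo, hi):
    """Estimated number of 1's among sorted positions lo..hi."""
    total, start = 0.0, 0.0
    for ones, n in leaves:
        end = start + n
        overlap = max(0.0, min(hi, end) - max(lo, start))
        total += overlap * ones / n       # leaf's share of 1's for the overlap
        start = end
    return total

# Percent of all 1's captured by each 5% set, then the running total
captured = [expected_ones(k * bin_size, (k + 1) * bin_size) / n_ones
            for k in range(20)]
cumulative, running = [], 0.0
for c in captured:
    running += c
    cumulative.append(running)

for k in range(3):
    print(f"{5 * (k + 1)}%: captured {captured[k]:.4%}, cumulative {cumulative[k]:.4%}")
```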
Ideal cumulative percent response, lift, etc.:
A perfect predictor would correctly call all the 1’s and 0’s. Moving across the “deciles”,
for example, its cumulative percent response would stay at 100% until the 50th percentile
(in our case), where it would start picking up 0’s. From there it would decrease from 100%
towards 50%, reaching the overall 50% response rate in our data at the 100th percentile.
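A minimal sketch of that ideal curve, using the counts from these notes (2421 1’s among 4843 observations). Lift is taken here as the cumulative response rate divided by the overall response rate; that is the usual definition but is an assumption, since these notes do not spell it out. The function name is illustrative.

```python
n_total, n_ones = 4843, 2421
base_rate = n_ones / n_total      # overall response rate, about 50%

def ideal_cumulative_response(depth):
    """Proportion of 1's among the top `depth` share of the data when a
    perfect predictor ranks every 1 ahead of every 0."""
    selected = depth * n_total
    return min(selected, n_ones) / selected

for pct in range(10, 101, 10):
    resp = ideal_cumulative_response(pct / 100)
    print(f"{pct:3d}%  cumulative response = {resp:.4f}  lift = {resp / base_rate:.4f}")
```

Because 2421/4843 is just under one half, the ideal curve starts to fall slightly before the 50% mark rather than exactly at it.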