
Variable Importance
(1) Agreement of surrogate and primary rules.
A surrogate rule is a backup for the primary splitting rule. Rather than sending every observation with a
missing value down the same branch, it looks at where a surrogate rule (one that uses a different variable)
would send the data point when the variable used in the primary splitting rule is missing. If the surrogate
variable is also missing, the observation follows the primary missing-value branch or the next surrogate
down, in order of agreement. The agreement is the proportion of observations for which the primary and a
surrogate rule agree. Details on what “agreement” means can be found in the online help under the decision
trees topic, missing values subtopic. Under the decision trees topic you will also find the computations for
variable importance laid out. The computation may, optionally, take surrogate rules into account.
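
As a concrete illustration, here is a minimal sketch of how an agreement proportion could be computed for a
pair of simple numeric threshold rules. The function names, thresholds, and data are made up for this
example and are not taken from the online help.

    # Sketch: agreement proportion between a primary and a surrogate split,
    # assuming simple numeric threshold rules. Names and data are illustrative.

    def goes_left(value, threshold):
        """Which child the rule sends an observation to (True = left)."""
        return value <= threshold

    def agreement(primary_values, primary_threshold,
                  surrogate_values, surrogate_threshold):
        """Proportion of observations sent to the same child by both rules.

        Only observations with both variables observed are compared.
        """
        matches, total = 0, 0
        for p, s in zip(primary_values, surrogate_values):
            if p is None or s is None:   # skip rows missing either variable
                continue
            total += 1
            if goes_left(p, primary_threshold) == goes_left(s, surrogate_threshold):
                matches += 1
        return matches / total if total else 0.0

    # Hypothetical primary rule "x1 <= 5" and candidate surrogate "x2 <= 2.5".
    x1 = [3.0, 7.0, 4.5, None, 9.0]
    x2 = [2.0, 3.1, 2.2, 1.0, 2.9]
    print(agreement(x1, 5.0, x2, 2.5))   # 1.0 on the four comparable rows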
(2) Error sum of squares in a node.
For J categories (J = 2 for binary) and N observations in the node, suppose p_j is the probability of
being in class j, j = 1, 2, …, J. For each observation in the node let Y_j = 1 if it is in class j and
Y_j = 0 otherwise. The errors for each observation are Y_j - p_j, which is 1 - p_j for the category j of
the actual observation and 0 - p_j for the other(s). Example: three colors (R, G, B) and 4 observations.
Observations 1 and 2 are from a node that predicts R and observations 3 and 4 are from a node that
predicts G.
Observation   Class   Pr{R}   Pr{G}   Pr{B}   Decision   Error SSq
     1          G      .6      .3      .1        R       .36 + .49 + .01 = 0.86
     2          R      .8      .1      .1        R       .04 + .01 + .01 = 0.06
     3          G      .2      .7      .1        G       .04 + .09 + .01 = 0.14
     4          B      .4      .5      .2        G       .16 + .25 + .64 = 1.05
                                                          Total             2.11
If the nodes are leaves, then these 4 observations contribute 2.11 to the model’s overall error
sum of squares.
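
The calculation in the table can be checked with a few lines of code. The sketch below simply restates the
arithmetic above; the way the rows are stored is an assumption made for the illustration.

    # Reproduce the error sum of squares from the table above.
    # Each row: (actual class, {class: predicted probability}).
    rows = [
        ("G", {"R": 0.6, "G": 0.3, "B": 0.1}),
        ("R", {"R": 0.8, "G": 0.1, "B": 0.1}),
        ("G", {"R": 0.2, "G": 0.7, "B": 0.1}),
        ("B", {"R": 0.4, "G": 0.5, "B": 0.2}),
    ]

    def error_ssq(actual, probs):
        """Sum over classes of (Y_j - p_j)^2, with Y_j indicating the actual class."""
        return sum(((1.0 if cls == actual else 0.0) - p) ** 2
                   for cls, p in probs.items())

    per_obs = [error_ssq(actual, probs) for actual, probs in rows]
    print([round(e, 2) for e in per_obs])   # [0.86, 0.06, 0.14, 1.05]
    print(round(sum(per_obs), 2))           # 2.11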
(3) Reduction in error sum of squares from splitting a node.
Suppose a node with error sum of squares 1.30 is split into two child nodes with error sums of
squares 0.32 and 0.18. The reduction in error sum of squares from that split is 1.30 - 0.32 - 0.18
= 0.80. This reduction is associated with the variable used to do the split.
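
In code this is a single subtraction; the helper name below is hypothetical and used only for this
illustration.

    # Reduction in error sum of squares for one split: parent SSE minus the
    # sum of the child SSEs, using the numbers from the example above.
    def sse_reduction(parent_sse, child_sses):
        return parent_sse - sum(child_sses)

    print(round(sse_reduction(1.30, [0.32, 0.18]), 2))   # 0.8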
(4) Variable Importance in a tree.
Look at one variable. Compute the reduction in error sum of squares for each split that uses that
variable. Sum all of these over the whole tree or subtree, then take the square root.
If you want to include cases where this variable is a surrogate, multiply the reduction in
sum of squares for each split by the agreement amount (1 if it is a primary split). This is done
over the non-leaf nodes only, since of course the leaves are not split.
The importance is often reported as relative variable importance by dividing each variable’s
importance number by that of the most important variable.
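
Putting the pieces together, a minimal sketch of the importance computation described above might look like
this. The list of split records, the variable names, and the agreement weights are illustrative assumptions;
in practice they come from the fitted tree.

    # Variable importance: for each variable, sum (SSE reduction x agreement)
    # over the splits where it is primary (agreement = 1) or a surrogate,
    # take the square root, then scale by the largest importance.
    import math
    from collections import defaultdict

    splits = [
        {"variable": "x1", "reduction": 0.80, "agreement": 1.0},   # primary split
        {"variable": "x2", "reduction": 0.80, "agreement": 0.75},  # surrogate at the same node
        {"variable": "x1", "reduction": 0.25, "agreement": 1.0},   # primary split deeper down
    ]

    totals = defaultdict(float)
    for s in splits:
        totals[s["variable"]] += s["reduction"] * s["agreement"]

    importance = {v: math.sqrt(t) for v, t in totals.items()}
    top = max(importance.values())
    relative = {v: imp / top for v, imp in importance.items()}

    for v, imp in importance.items():
        print(v, round(imp, 3), round(relative[v], 3))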