Variable Importance

(1) Agreement of surrogate and primary rules. A surrogate rule is a backup for the primary splitting rule. Rather than sending every observation with a missing value down the same branch, the surrogate rule (one that uses a different variable) decides where an observation goes when the variable used in the primary splitting rule is missing. If the surrogate's variable is also missing, we use the primary missing-value branch or move on to the next surrogate, in order of agreement. The agreement is the proportion of observations for which the primary rule and the surrogate agree. Details on what "agreement" means can be found in the online help under the decision trees topic, missing values subtopic. Under the decision trees topic you will also find the computations for variable importance laid out; the computation may, optionally, take surrogate rules into account.

(2) Error sum of squares in a node. For J categories (J = 2 for binary) and N observations in the node, suppose pj is the probability of being in class j, j = 1, 2, ..., J. For each observation in the node let Yj = 1 if it is in class j and Yj = 0 otherwise. The errors for that observation are Yj - pj, which is 1 - pj for the observation's actual category j and 0 - pj for the other(s); the observation's error sum of squares is the sum of these squared errors over the J categories.

Example: three colors (R, G, B) and 4 observations. Observations 1 and 2 are in a node that predicts R, and observations 3 and 4 are in a node that predicts G.

  Observation   Actual   Pr{R}   Pr{G}   Pr{B}   Decision   Error SSq
       1          G       .6      .3      .1        R       .36 + .49 + .01 = 0.86
       2          R       .8      .1      .1        R       .04 + .01 + .01 = 0.06
       3          G       .2      .7      .1        G       .04 + .09 + .01 = 0.14
       4          B       .4      .5      .2        G       .16 + .25 + .64 = 1.05
                                                             Total           = 2.11

If the nodes are leaves, then these 4 observations contribute 2.11 to the model's overall error sum of squares.

(3) Reduction in error sum of squares from splitting a node. Suppose a node with error sum of squares 1.30 is split into two child nodes with error sums of squares 0.32 and 0.18. The reduction in error sum of squares from that split is 1.30 - 0.32 - 0.18 = 0.80. This reduction is attributed to the variable used to make the split.

(4) Variable Importance in a tree. Look at one variable. Compute the reduction in error sum of squares for each split that uses that variable, sum these reductions over the whole tree or subtree, and then take the square root. If you want to include cases where this variable is a surrogate, multiply the reduction in sum of squares at each split by the agreement (1 if the variable makes the primary split). This is done over the non-leaf nodes (of course, the leaves are not split!). Importance is often reported as relative variable importance, obtained by dividing each variable's importance number by that of the most important variable.
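To make the arithmetic in (2) concrete, here is a minimal Python sketch that recomputes the error sum of squares for the four observations in the example above. The class labels and probabilities come straight from the table; the function and variable names are illustrative only, not the software's.

```python
classes = ["R", "G", "B"]

# (actual class, predicted class probabilities) for observations 1-4,
# copied from the example table above
observations = [
    ("G", {"R": 0.6, "G": 0.3, "B": 0.1}),
    ("R", {"R": 0.8, "G": 0.1, "B": 0.1}),
    ("G", {"R": 0.2, "G": 0.7, "B": 0.1}),
    ("B", {"R": 0.4, "G": 0.5, "B": 0.2}),
]

def error_ssq(actual, probs):
    """Sum over classes of (Yj - pj)^2, where Yj = 1 for the actual class, else 0."""
    return sum(((1.0 if c == actual else 0.0) - probs[c]) ** 2 for c in classes)

per_obs = [error_ssq(actual, probs) for actual, probs in observations]
print([round(e, 2) for e in per_obs])  # [0.86, 0.06, 0.14, 1.05]
print(round(sum(per_obs), 2))          # 2.11
```

Each observation's contribution is simply the squared distance between its 0/1 class-indicator vector and the node's probability vector, which is why the leaf totals add up to the model's overall error sum of squares.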
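The importance recipe in (4) can be sketched the same way. In the Python sketch below the split records (variable names, reductions, and agreement values) are hypothetical, invented for illustration; only the procedure follows the text: sum the agreement-weighted reductions per variable over the non-leaf nodes, take the square root, then divide by the largest value to get relative importance.

```python
import math
from collections import defaultdict

# Hypothetical split records: (variable, reduction in error SSq from the split,
# agreement -- 1.0 when the variable makes the primary split, the surrogate
# agreement otherwise). These numbers are invented, not taken from the text.
splits = [
    ("age",    0.80, 1.0),   # primary split; cf. the 1.30 - 0.32 - 0.18 = 0.80 example in (3)
    ("age",    0.25, 0.9),   # same variable used as a surrogate with agreement 0.9
    ("income", 0.40, 1.0),
    ("region", 0.10, 1.0),
]

# Sum the agreement-weighted reductions for each variable over the non-leaf nodes.
weighted = defaultdict(float)
for var, reduction, agreement in splits:
    weighted[var] += reduction * agreement

# Importance is the square root of that sum; relative importance divides by the top value.
importance = {var: math.sqrt(total) for var, total in weighted.items()}
top = max(importance.values())
for var, imp in importance.items():
    print(f"{var:7s} importance = {imp:.3f}  relative = {imp / top:.3f}")
```

With these made-up numbers, "age" comes out most important (relative importance 1.0) because it contributes both a primary split and an agreement-weighted surrogate split.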