Pens47_demo.docx

Demo: Using the SAS Code and Score nodes.
Data: Coordinates of a pen (6 of the 16 coordinates) recorded as a person writes a digit (4 or 7
here) on a pressure-sensitive pad. The coordinates have been normalized to have minimum 0 and
maximum 100.
(1) Bring in the data Pens47.
(2) Make a PENS diagram.
(3) Declare FOUR to be a binary target, DIGIT to be an ID.
(4) Make a 50:50:0 data split and run a default decision tree. View the results. Check the
Variable Importance (View -> Model -> Variable Importance).
(5) Bring in a SAS Code node from the Utilities tab. Select the Code Editor ellipsis (…). Look at the
subtabs and select Macro Variables, then scroll down to Exports. Below that, mouse over
EM_EXPORT_SCORE. This macro variable names an exported data set of type Score.
Type this in the code editor to get a grid of points to score:
    /* Build a 21 x 21 grid of (X5, X6) points to score. */
    data &EM_EXPORT_SCORE;
      do X5 = 0 to 100 by 5;
        do X6 = 0 to 100 by 5;
          output;
        end;
      end;
    run;

    proc print data=&EM_EXPORT_SCORE;
    run;
Why did we look at variable importance in step 4?
Select Run -> Run Node (not Run Code) and view the results. Close the results window.
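As a quick sanity check on the grid, a few lines like the following could be appended to the grid code in the same Code Editor. This is only a sketch: it assumes the grid DATA step has already run in the same SAS Code node, so that &EM_EXPORT_SCORE resolves, and the column aliases (n_points, x5_min, and so on) are illustrative names. Each loop takes 21 values (0, 5, ..., 100), so the grid should hold 21 x 21 = 441 points.

```sas
/* Sanity check: count the grid points and confirm the range of each input. */
proc sql;
  select count(*) as n_points,
         min(X5)  as x5_min,
         max(X5)  as x5_max,
         min(X6)  as x6_min,
         max(X6)  as x6_max
    from &EM_EXPORT_SCORE;
quit;
```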
(6) Bring in a Score node from the Assess subtab. Connect the Decision Tree and SAS Code
nodes to that node.
Run the Score node, then select Exported Data -> Score -> Explore from its properties panel
to make some graphs.
Graph 1: scatter plot with X5 = X, X6 = Y, P_Four = color
Graph 2: 3-D plot with X5 = X, X6 = Y, P_Four_1 = Z, P_Four_0 = color
Inside the plot, right-click, then select Action Mode -> Rotate to rotate the plot with your cursor.
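For anyone who prefers code to the Explore window, a plot similar to Graph 1 could be sketched with PROC SGPLOT. This swaps in a different tool from the one used in the demo, and it rests on assumptions: work.scored_grid is a hypothetical copy of the Score node's exported Score data set, and the probability variable is taken to be P_Four_1 as in Graph 2 (check the actual name in the scored data).

```sas
/* work.scored_grid is a hypothetical name for a copy of the scored grid. */
/* P_Four_1 is assumed to be the predicted probability that FOUR = 1.     */
proc sgplot data=work.scored_grid;
  scatter x=X5 y=X6 / colorresponse=P_Four_1
                      markerattrs=(symbol=squarefilled size=8);
run;
```

Because the tree predicts the same probability for every point in a leaf, the colors should form rectangular blocks.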
** Notes: With a nominal target of 3 or more levels, the decision is the level with the highest
predicted probability. In lift charts for such cases, the lift looks at the ability to distinguish the
nominal (character) level highest in sort order from the rest. Also, with 3 or more nominal levels, the
average squared error is computed like this:
         ---- Probability ----   ------- Errors -------   Sum of Squares
Digit     p1     p4     p7        e1     e4     e7          ssq
  1       0.9    0.1    0.0       0.1   -0.1    0.0         0.02
  4       0.6    0.1    0.3      -0.6    0.9   -0.3         1.26
  1       0.4    0.3    0.3       0.6   -0.3   -0.3         0.54
  1       0.8    0.1    0.1       0.2   -0.1   -0.1         0.06
  7       0.2    0.1    0.7      -0.2   -0.1    0.3         0.14
  4       0.1    0.6    0.3      -0.1    0.4   -0.3         0.26
                                                            ====
                                                            2.28
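Each error is the 0/1 indicator for that level minus its predicted probability; for example, for the
digit-4 observation with p4 = 0.1, e4 = 1 - 0.1 = 0.9.
Average squared error is 2.28/18 = 0.127 (6 observations x 3 levels = 18 squared errors).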
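The arithmetic can be reproduced with a short DATA step. This is a minimal sketch, not Enterprise Miner's own computation: the data set name ase_demo and the variables decision, total_ssq, and ase are illustrative names, and the six rows are simply typed in from the example. It also applies the highest-probability decision rule from the notes.

```sas
data ase_demo;
  input digit p1 p4 p7;
  /* Error = 0/1 indicator for the level minus its predicted probability. */
  e1 = (digit = 1) - p1;
  e4 = (digit = 4) - p4;
  e7 = (digit = 7) - p7;
  ssq = e1**2 + e4**2 + e7**2;
  /* Decision: the level with the highest predicted probability. */
  if p1 >= p4 and p1 >= p7 then decision = 1;
  else if p4 >= p7 then decision = 4;
  else decision = 7;
  datalines;
1 0.9 0.1 0.0
4 0.6 0.1 0.3
1 0.4 0.3 0.3
1 0.8 0.1 0.1
7 0.2 0.1 0.7
4 0.1 0.6 0.3
;
run;

proc print data=ase_demo;
run;

/* Divide by (number of observations) x (3 levels), here 6 x 3 = 18. */
proc sql;
  select sum(ssq)                  as total_ssq format=8.2,
         sum(ssq) / (3 * count(*)) as ase       format=8.4
    from ase_demo;
quit;
```

The total should match the 2.28 above. Note that the first digit-4 observation gets decision = 1, since p1 = 0.6 is its largest probability; that misclassification is why its ssq of 1.26 is the largest in the column.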