Nearest neighbor.
(1) Run Pensall_Logistic.sas to see how well a cumulative logit model classifies the digits. Then
try using only X1 and X5 as features.
(2) In SAS, plot X1 vs. X5 using the digit as a plot symbol.
(3) Bring the Pensall data in as a data source. Use only X1 and X5 as inputs and digit as the target, and
reject the rest of the variables. Set the level of digit to nominal (otherwise the model may treat digit as a
number and predict, for example, 7.125). Right-click the data source and choose Explore; graph X1 vs. X5
using digit as the color. Make a bar chart of the digits. Tile the graph windows. Select a bar and see where
the corresponding X1, X5 points fall on the other plot.
(4) Connect the data source to a Memory Based Reasoning node. Run the node and view the results. What is
the misclassification rate? Look at the exported data. Graph digit versus P_digit0. Which digit is 0 most
likely to be confused with? Click the graphing icon again, this time choosing 3D plots, and pick the
middle of the 3 icons that appear. Use digit as the category and P_digit0 as the series. Interpret the plot.
Then make a plot with the same 3D Chart tool, using the "from" (actual) digit F_digit as one category and
the "into" (predicted) digit I_digit as the other. Did the method work well?
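(5) Create an X1, X5 grid with a SAS Code node:
/* Build a 101 x 101 grid of (x1, x5) points to be scored */
data &em_lib..emcode_score;
  do x1=0 to 100; do x5=0 to 100; output; end; end;
run;
/* Print the first 20 grid points as a check */
proc print data=&em_lib..emcode_score(obs=20); run;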
(6) Connect the Memory Based Reasoning and SAS Code nodes to a Score node and run it. Then explore the
exported data, making a scatter plot of X1 by X5 with EM_classification as the group variable. You can
click the legend to highlight the region corresponding to each predicted digit. Being able to draw a graph
like this is why I chose to use only two X variables.
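For step (1), a reduced model with only X1 and X5 might look like the minimal sketch below. It does not reproduce Pensall_Logistic.sas, whose contents are not shown here, and it assumes the data sit in a hypothetical data set WORK.PENSALL with numeric variables digit, x1, and x5. LINK=CLOGIT requests the cumulative logit model, and PREDPROBS=INDIVIDUAL adds the _FROM_ (actual) and _INTO_ (predicted) variables used for the confusion table.

```sas
/* ASSUMPTION: WORK.PENSALL is a hypothetical location for the Pensall */
/* data, with numeric variables DIGIT (0-9), X1, and X5.               */
proc logistic data=work.pensall;
  /* Cumulative logit model on the two chosen features only */
  model digit = x1 x5 / link=clogit;
  /* Per-level predicted probabilities plus _FROM_ and _INTO_ */
  output out=work.logit_pred predprobs=individual;
run;

/* Actual vs. predicted digit; off-diagonal cells are misclassifications */
proc freq data=work.logit_pred;
  tables _from_*_into_ / norow nocol nopercent;
run;
```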
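Step (2) can be done with PROC SGPLOT. This sketch, again assuming the hypothetical WORK.PENSALL data set, draws each observation as its digit label so overlapping digits are easy to spot.

```sas
/* ASSUMPTION: hypothetical WORK.PENSALL with DIGIT, X1, X5 */
proc sgplot data=work.pensall;
  /* Replace each marker with the digit value, colored by digit */
  scatter x=x1 y=x5 / group=digit markerchar=digit;
run;
```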
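The Memory Based Reasoning node in step (4) is a nearest-neighbor classifier. As a rough SAS/STAT analogue (not a description of the node's internals), PROC DISCRIM with METHOD=NPAR and K= classifies each observation from its k nearest neighbors. In this sketch k=16 is an arbitrary choice, METRIC=IDENTITY requests plain Euclidean distance (reasonable because X1 and X5 share the same scale, as the 0 to 100 grid in step (5) suggests), PRIORS PROPORTIONAL makes the posterior for a digit equal to its share of the k neighbors, and CROSSVALIDATE gives leave-one-out error rates so no point counts as its own neighbor. WORK.PENSALL is again a hypothetical data set name.

```sas
/* ASSUMPTIONS: hypothetical WORK.PENSALL; k=16 chosen arbitrarily */
proc discrim data=work.pensall method=npar k=16 metric=identity
             crossvalidate outcross=work.knn_cv;
  class digit;
  /* Posterior for a digit = fraction of the 16 neighbors with that digit */
  priors proportional;
  var x1 x5;
run;

/* Leave-one-out confusion table: rows = actual, columns = predicted */
proc freq data=work.knn_cv;
  tables digit*_into_ / norow nocol nopercent;
run;
```

PROC DISCRIM prints the error-count estimates (the misclassification rate by digit) along with the classification summary.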
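Steps (5) and (6) can also be mimicked outside Enterprise Miner: build the same grid in WORK, score it with nearest-neighbor settings via TESTDATA= and TESTOUT=, and color each grid point by its predicted digit (_INTO_). The hypothetical WORK.PENSALL, k=16, and the Euclidean metric are assumptions, so the picture may differ from the one the Score node produces.

```sas
/* Same 101 x 101 grid as step (5), written to WORK instead */
data work.grid;
  do x1=0 to 100; do x5=0 to 100; output; end; end;
run;

/* ASSUMPTIONS: hypothetical WORK.PENSALL; k=16; Euclidean distance */
proc discrim data=work.pensall method=npar k=16 metric=identity
             testdata=work.grid testout=work.grid_scored noprint;
  class digit;
  priors proportional;
  var x1 x5;
run;

/* Small filled squares tile the plane, showing each digit's region */
proc sgplot data=work.grid_scored;
  scatter x=x1 y=x5 / group=_into_ markerattrs=(symbol=squarefilled size=3);
run;
```

Where the colors change, the grid is crossing a nearest-neighbor decision boundary, which is the same view step (6) builds from the Score node output.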