Nearest neighbor.

(1) Run Pensall_Logistic.sas to see how well a cumulative logit model classifies the digits. Now try using only X1 and X5 as features.

(2) In SAS, plot X1 vs. X5 using the digit as the plot symbol.

(3) Bring the Pensall data into Enterprise Miner as a data source. Use only X1 and X5 as inputs and digit as the target, and reject the rest of the variables. Set the level of digit to Nominal (otherwise you may get a prediction such as 7.125). Right-click the data source, choose Explore, and graph X1 vs. X5 using digit as the color. Make a bar chart of the digits. Tile the graph windows. Mark a bar and see where the corresponding X1, X5 points fall on the other plot.

(4) Connect the data source to a Memory Based Reasoning node. Run the node and view the results. What is the misclassification rate? Look at the exported data and graph digit versus P_digit0. With which digit does 0 seem most likely to be confused? Click the graphing icon again, this time choosing 3D plots, and pick the middle of the three icons that appear. Use digit as the category and P_digit0 as the series. Interpret the plot. Then make a plot with the same 3D chart tool, using the "from" digit F_digit as the category and the "into" digit I_digit as the series. Did the method work well?

(5) Create an X1, X5 grid with a SAS Code node:

    data &em_lib..emcode_score;
       /* 101 x 101 = 10,201 integer grid points */
       do x1 = 0 to 100;
          do x5 = 0 to 100;
             output;
          end;
       end;
    run;

    proc print data=&em_lib..emcode_score(obs=20);
    run;

(6) Connect the Memory Based Reasoning and SAS Code nodes to a Score node and run it. Explore the exported data, making a scatter plot of X1 by X5 with EM_classification as the group variable. Click the legend to highlight the region corresponding to each predicted digit. A graph like this, showing the predicted digit at every point of the X1, X5 plane, is possible only because the model has just two inputs; that is why I chose to use only two X variables.
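For reference on step (1): assuming Pensall_Logistic.sas fits the usual proportional-odds form (for example, PROC LOGISTIC's default for an ordered response, which models the lower-ordered categories), the two-input version of the cumulative logit model for Y = digit is shown below. There are nine intercepts, one per cut point between the ten digits, but a single slope pair shared by all of them, so the model assumes the digits line up in a meaningful order along X1 and X5. That ordering has no natural meaning for handwritten digit shapes.

$$\log\frac{P(Y \le j \mid X_1, X_5)}{P(Y > j \mid X_1, X_5)} = \alpha_j + \beta_1 X_1 + \beta_5 X_5, \qquad j = 0, 1, \dots, 8$$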
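The Memory Based Reasoning node in step (4) is a nearest-neighbor method: a case is classified by looking at the training cases closest to it in the input space, here the X1, X5 plane. The Python sketch below shows the idea under simplifying assumptions: plain Euclidean distance, a majority vote among k neighbors, and a few hypothetical points rather than the Pensall data. The helper names (`knn_predict`, `knn_posteriors`) are illustrative only; the vote shares play roughly the role of the P_digit columns in the exported data, though the node's own settings may differ.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance; it ranks neighbors the same way as the true distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(train, query, k):
    """The k training cases closest to `query`; `train` holds ((x1, x5), digit) pairs."""
    return sorted(train, key=lambda case: sq_dist(case[0], query))[:k]


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest neighbors.

    Ties go to the label encountered first, i.e. the one whose closest
    neighbor is nearest to the query.
    """
    votes = Counter(label for _, label in nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_posteriors(train, query, k=3):
    """Share of the k neighbors carrying each label (a rough analogue of P_digit0, P_digit1, ...)."""
    votes = Counter(label for _, label in nearest(train, query, k))
    return {label: n / k for label, n in votes.items()}


# Hypothetical (x1, x5) points for two digits, not taken from the Pensall data.
train = [((10, 90), 0), ((12, 85), 0), ((15, 88), 0),
         ((80, 20), 1), ((82, 25), 1), ((78, 18), 1)]

print(knn_predict(train, (14, 87)))     # 0
print(knn_predict(train, (45, 55)))     # 0
print(knn_posteriors(train, (45, 55)))  # {0: 0.6666666666666666, 1: 0.3333333333333333}
```

The point (45, 55) sits between the two clusters, so its vote is split; that kind of split is what makes a P_digit0 value fall well below 1 for some cases.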
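The misclassification rate and the "which digit is 0 confused with" question in step (4) both come from crossing the actual ("from", F_digit) class with the predicted ("into", I_digit) class. A minimal Python sketch of that bookkeeping follows; the function names are illustrative and the label lists are made up, so the answer it prints (6) says nothing about the real Pensall results.

```python
from collections import Counter


def misclassification_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)


def confusion_counts(actual, predicted):
    """Counts of (from, into) pairs, like a crosstab of F_digit by I_digit."""
    return Counter(zip(actual, predicted))


def most_confused_with(actual, predicted, digit):
    """The wrong class that `digit` is most often predicted as, or None if never misclassified."""
    wrong = {into: n for (frm, into), n in confusion_counts(actual, predicted).items()
             if frm == digit and into != digit}
    return max(wrong, key=wrong.get) if wrong else None


# Hypothetical "from" (actual) and "into" (predicted) digits for ten cases.
actual = [0, 0, 0, 0, 1, 1, 6, 6, 6, 7]
predicted = [0, 0, 6, 6, 1, 1, 6, 6, 0, 1]

print(misclassification_rate(actual, predicted))  # 0.4
print(most_confused_with(actual, predicted, 0))   # 6
```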
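Steps (5) and (6) score a regular grid so that the scatter plot shows the region assigned to each predicted digit. The Python sketch below mirrors that idea: `make_grid` reproduces the nested DO loops (x1 and x5 from 0 to 100, giving 101 × 101 = 10,201 points, with x5 varying fastest), and each grid point is labeled by its single nearest training case. Using k = 1 and two hypothetical training points is a simplification for brevity; the Memory Based Reasoning node uses its own neighbor count and the full training data, and the helper names are illustrative.

```python
from collections import Counter


def make_grid(lo=0, hi=100):
    """Every integer (x1, x5) pair from lo to hi inclusive, x5 varying fastest,
    in the same order as the nested DO loops in the SAS Code node."""
    return [(x1, x5) for x1 in range(lo, hi + 1) for x5 in range(lo, hi + 1)]


def nearest_label(train, point):
    """Label of the single closest training case (1-NN); exact ties go to the earlier case."""
    def sq_dist(case):
        (a, b), _ = case
        return (a - point[0]) ** 2 + (b - point[1]) ** 2
    return min(train, key=sq_dist)[1]


def score_grid(train, grid):
    """Attach a predicted class to every grid point, playing the role of EM_CLASSIFICATION."""
    return [(x1, x5, nearest_label(train, (x1, x5))) for x1, x5 in grid]


# Two hypothetical training cases, one per digit.
train = [((20, 80), 0), ((80, 20), 1)]
grid = make_grid()
scored = score_grid(train, grid)

print(len(grid))                                 # 10201
print(Counter(label for _, _, label in scored))  # Counter({0: 5151, 1: 5050})
```

With these two cases, points with x5 > x1 are closer to the digit-0 case, so the boundary is the diagonal x1 = x5 (whose 101 tied points go to digit 0 here). With the real training data, the regions you highlight from the legend are shaped by all the nearby cases instead.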