Boston Housing Data: Regression Tree

CART: Classification and Regression Trees
Target is CONTINUOUS
Split based on F statistic p-value

[Figure: House Value against NOx, with NOx split into Low and High groups]

N = n1 + n2 obs.
   Group 1: Y1, Y2, Y3, …, Yn1              SSE(1)
   Group 2: Yn1+1, Yn1+2, Yn1+3, …, YN      SSE(2)

F numerator   = [SS(total) - SSE(1) - SSE(2)] / 1 df
F denominator = MSE = [SSE(1) + SSE(2)] / (N-2) df
p-value = Pr > F
Kass adjusted p-value = (# possible splits)(p-value)
Logworth of split = -Log10[(# possible splits)(p-value)]

Keep on splitting as usual.

(1) Pull in HOUSING data from our AAEMDATA library.

(2) Use median house value as the target, NOx (environment) and RM (avg. # rooms in houses) as inputs. Reject everything else. Explore the variables RM and NOx to get their range.

(3) (Optionally split into training and validation.) Create a new diagram.

(4) Drag in a tree node and connect.

(5) (Optional) Make a grid: put this in a code node and run it (where did that funky name come from?).

   data &EM_LIB..EMCODE_SCORE;
      do NOx = .35 to .9 by .025;
         do RM = 3.5 to 9 by .25;
            output;
         end;
      end;
   proc print; run;

(6) From the ASSESS subtab, drag in a score node and connect the tree and code nodes to it. Update and run. From the properties menu, select Exported data…, then select the SCORE data set and click on Explore at the bottom. Use the graphing icon to make a 3D plot of P_MEDV (Y) versus RM and NOx. Use _LEAF_ as a color variable. What kind of predictions do you see?

Boston Housing II

Herein I describe how you can export the scoring code and use it within SAS (not EM) to score another data set that has the inputs and most likely does not have the target variable. This means that anyone with SAS can score a data set with your code. Notice that the code is created within EM, so a person without EM cannot create a tree; they can just score a data set using your tree.

(1) Click on the tree node. Select Results.

(2) From the top menu bar select View -> Scoring -> SAS Code. The created code opens in a window.
(3) Activate (click on the top banner of) the window containing the code. From the menu select Edit -> Select All, then Edit -> Copy.

(4) Get into SAS. You could be in VCL, or you could launch SAS from your desktop or, in our case, from the Novell launcher. Go to the program editor and paste the copied code into it.

(5) Before the included code, type this:

   data score;
      do rm = 3.5 to 9 by 0.25;
         do NOx = 0.35 to 0.90 by 0.025;

(6) After the included code, type this:

            output;
         end;
      end;
   run;
   proc print data=score; run;
   proc sort data=score; by _LEAF_; run;
   proc means data=score; var P_MEDV RM NOx; by _LEAF_; run;

(7) Make a rotating 3D plot in INSIGHT. Color it by P_MEDV.
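
To make the split arithmetic from the start of this handout concrete, here is a minimal sketch that evaluates the F statistic, p-value, Kass adjusted p-value, and logworth for one candidate split. All input numbers are made up for illustration; they are not taken from the HOUSING data. The data set name and variable names are my own choices, and using the SDF function for Pr > F is an assumption about how you might compute it, not necessarily what EM does internally.

```sas
/* Sketch only: every input value below is hypothetical. */
data logworth_demo;
   N        = 100;    /* observations in the node, N = n1 + n2 (assumed)  */
   ss_total = 1000;   /* SS(total) about the node mean (assumed)          */
   sse1     = 300;    /* SSE(1), within the first group (assumed)         */
   sse2     = 400;    /* SSE(2), within the second group (assumed)        */
   n_splits = 20;     /* # possible splits on this input (assumed)        */

   f_num    = (ss_total - sse1 - sse2) / 1;     /* F numerator, 1 df      */
   mse      = (sse1 + sse2) / (N - 2);          /* F denominator, N-2 df  */
   F        = f_num / mse;
   p_value  = sdf('F', F, 1, N - 2);            /* Pr > F (upper tail)    */
   kass_p   = n_splits * p_value;               /* Kass adjusted p-value  */
   logworth = -log10(kass_p);                   /* logworth of the split  */
run;

proc print data=logworth_demo; run;
```

Swap in the sums of squares from your own candidate split to compare logworths across inputs; the split with the largest logworth is the one the formulas above favor. Note that the product (# possible splits)(p-value) can exceed 1 for weak splits, in which case the logworth comes out negative.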