Boston Housing Data

Regression Tree
CART: Classification and Regression Trees
Target is CONTINUOUS
Split based on F statistic P-value
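[Diagram: House Value observations split on NOx into a Low branch and a High branch.]
N = n1 + n2 obs.
First branch: Y1, Y2, Y3, ..., Yn1 with error sum of squares SSE(1)
Second branch: Yn1+1, Yn1+2, Yn1+3, ..., YN with error sum of squares SSE(2)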
F numerator = [SS(total) - SSE(1) - SSE(2)] / 1 df
F denominator = MSE = [SSE(1) + SSE(2)] / (N - 2) df
F = (F numerator) / (F denominator), with 1 and N - 2 df
p-value = Pr > F
Kass adjusted p-value = (# possible splits)(p-value)
Logworth of split = -log10[(# possible splits)(p-value)]
Keep on splitting as usual.
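As a worked illustration of the arithmetic above, here is a minimal sketch in a SAS data step. The six house values, their split into two groups of three, and the count of 5 possible splits are invented for illustration only; they are not taken from the HOUSING data.

```sas
/* Toy illustration only: hypothetical house values, not the HOUSING data. */
/* Group 1: 30, 32, 34 (mean 32). Group 2: 20, 22, 24 (mean 22).           */
/* Overall mean of the six values is 27.                                   */
data _null_;
  n        = 6;     /* N = n1 + n2 = 3 + 3                    */
  ss_total = 166;   /* sum of (Y - 27)**2 over all six values */
  sse1     = 8;     /* sum of (Y - 32)**2 within group 1      */
  sse2     = 8;     /* sum of (Y - 22)**2 within group 2      */
  n_splits = 5;     /* hypothetical number of possible splits */

  f_num    = (ss_total - sse1 - sse2) / 1;  /* F numerator, 1 df       */
  mse      = (sse1 + sse2) / (n - 2);       /* F denominator, N - 2 df */
  f        = f_num / mse;
  p        = sdf('F', f, 1, n - 2);         /* p-value = Pr > F        */
  kass_p   = n_splits * p;                  /* Kass adjusted p-value   */
  logworth = -log10(kass_p);                /* logworth of the split   */
  put f= p= kass_p= logworth=;
run;
```

The results are written to the SAS log. A larger logworth means a smaller adjusted p-value, that is, a more convincing split.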
(1) Pull in HOUSING data from our AAEMDATA library.
(2) Use median house value as the target, NOx (environment) and RM (avg. # rooms in
houses) as inputs. Reject everything else. Explore the variables RM and NOx to get their
range.
(3) (optionally split into training and validation) Create a new diagram.
(4) Drag in a tree node and connect.
(5) (optional) Make a grid – put this in a code node and run it (where did that funky
name come from?).
data &EM_LIB..EMCODE_SCORE;
  do NOx = .35 to .9 by .025;
    do RM = 3.5 to 9 by .25; output; end;
  end;
run;
proc print; run;
(6) From the ASSESS subtab, drag in a score node and connect the tree and code nodes to
it. Update and run. From the properties menu, select Exported data… then select the
SCORE data set and click on Explore at the bottom. Use the graphing icon to make a 3D
plot of P_MEDV (Y) versus RM and NOx. Use _LEAF_ as a color variable. What kind of
predictions do you see?
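A fractional BY increment such as .025 is not exactly representable in binary, so a DO loop that keeps adding it can in principle stop one step short of the upper bound. The sketch below is one way to build the same NOx by RM grid from integer counters, so the row count (23 x 23 = 529) does not depend on accumulated rounding. The data set name WORK.GRID and the counters i and j are hypothetical; inside an EM code node you would keep the data set name used in step (5).

```sas
/* Sketch: same grid ranges as step (5), driven by integer counters. */
/* WORK.GRID, i, and j are illustrative names, not from the handout. */
data work.grid;
  do i = 0 to 22;
    NOx = 0.35 + 0.025 * i;      /* 0.35, 0.375, ..., 0.90 */
    do j = 0 to 22;
      RM = 3.5 + 0.25 * j;       /* 3.5, 3.75, ..., 9      */
      output;
    end;
  end;
  drop i j;
run;

/* Quick check of the row count and the range of each input. */
proc means data=work.grid n min max;
  var NOx RM;
run;
```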
Boston Housing II
Herein I describe how you can export the scoring code and use it within SAS (not EM) to
score another dataset that has the inputs and most likely does not have the target variable.
This means that anyone with SAS can score a data set with your code. Notice that the
code is created within EM, so a person without EM cannot create a tree; they can only
score a data set using your tree.
(1) Click on the tree node. Select Results.
(2) From the top menu bar select View -> Scoring -> SAS Code. The created code opens
in a window.
(3) Activate (click on top banner) the window containing the code. From the menu select
Edit -> Select All then Edit -> Copy.
(4) Get into SAS. You could be in VCL or you could launch SAS from your desktop or
in our case from the Novell launcher. Go to the program editor and paste the copied code
into it.
(5) Before the included code, type this:
Data score;
Do rm = 3.5 to 9 by 0.25;
Do NOx = 0.35 to 0.90 by 0.025;
(6) After the included code, type this:
output; end; end;
proc print data=score; run;
proc sort data=score; by _LEAF_;
proc means data=score;
var P_MEDV RM NOx;
by _LEAF_;
run;
(7) Make a rotating 3D plot in INSIGHT. Color it by P_MEDV.
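To show how steps (5) and (6) fit around the pasted code, here is a minimal sketch of the assembled program. The IF/THEN block in the middle is a hypothetical stand-in for the scoring code copied from EM: its thresholds, leaf numbers, and predicted values are invented, and the real exported code is typically longer and sets additional variables.

```sas
data score;
  do rm = 3.5 to 9 by 0.25;
    do NOx = 0.35 to 0.90 by 0.025;
      /* ---- the scoring code pasted from EM goes here ----       */
      /* Hypothetical stand-in: thresholds and leaf means invented */
      if rm < 6.5 then do;
        if NOx >= 0.6 then do; _LEAF_ = 1; P_MEDV = 15; end;
        else do;               _LEAF_ = 2; P_MEDV = 22; end;
      end;
      else do;                 _LEAF_ = 3; P_MEDV = 35; end;
      /* ---- end of stand-in ----                                 */
      output;
    end;
  end;
run;

/* Summarize the predictions and the input ranges within each leaf. */
proc sort data=score; by _LEAF_; run;
proc means data=score;
  var P_MEDV RM NOx;
  by _LEAF_;
run;
```

Because the pasted code runs once for every grid point inside the two DO loops, each row of SCORE gets its own _LEAF_ and P_MEDV. SAS variable names are not case sensitive, so rm in the loop and RM in the scoring code refer to the same variable.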