Classification And Regression Trees
Stat 430

Outline
• Tree Algorithm: recursive partitioning
• Idea of Random Forests

[Figure: example tree for the response "Delayed?", with splits on Distance (Distance < 2459, Distance >= 728, Distance >= 4228) and leaf values 0.03486, 0.05107, 0, 0.5]

Trees
• Breiman, Friedman, Olshen & Stone (1984)
• Situation:
  • Categorical response Y
  • Set of explanatory variables X1, ..., Xp
• Goal: find a set of rules in X1, ..., Xp that allows prediction of Y

Construction
• Starting with the root, find the best split at each node using an exhaustive search, i.e. for each variable Xi generate all possible splits and compute the homogeneity of the resulting nodes; select the best split for the best variable

rpart
• fit the tree using the command rpart
• plot(tree) text(tree)
• summary(tree)
• predict
• re-fit using other parameters: control=rpart.control(cp=0.001)

Random Forests
• Breiman (2001), Breiman & Cutler (2004)
• Tree ensemble built by randomly sampling cases and variables
• Each case is classified once for each tree in the ensemble

RandomForest
• randomForest to fit the forest
• predict(forest, type="prob", newdata=test)

Combining Results
• Two predictions for the test data: Y1 and Y2
• with corresponding values MSE1 and MSE2
• What combination of Y1 and Y2 will give the best prediction? aY1 + (1-a)Y2

Work on your Project
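
Note on Combining Results
The question of which weight a to use in aY1 + (1-a)Y2 can be worked out by minimizing the MSE of the combined prediction. The following is a sketch of that calculation, not part of the original slides: the symbols e1, e2 (the prediction errors) and C (the average product of the two errors) are notation introduced here, and the bar denotes an average over the cases used to choose a.

```latex
% Notation introduced here (not in the slides): e_1, e_2, C.
\begin{align*}
e_1 &= Y - Y_1, \qquad e_2 = Y - Y_2, \\
\mathrm{MSE}_1 &= \overline{e_1^2}, \qquad
\mathrm{MSE}_2 = \overline{e_2^2}, \qquad
C = \overline{e_1 e_2}, \\
\mathrm{MSE}(a) &= \overline{\bigl(Y - aY_1 - (1-a)Y_2\bigr)^2}
  = \overline{\bigl(a e_1 + (1-a) e_2\bigr)^2} \\
  &= a^2\,\mathrm{MSE}_1 + 2a(1-a)\,C + (1-a)^2\,\mathrm{MSE}_2, \\
\frac{d\,\mathrm{MSE}(a)}{da} &= 2a\,\mathrm{MSE}_1 + 2(1-2a)\,C - 2(1-a)\,\mathrm{MSE}_2 = 0, \\
a^{*} &= \frac{\mathrm{MSE}_2 - C}{\mathrm{MSE}_1 + \mathrm{MSE}_2 - 2C}.
\end{align*}
```

• If the two errors are uncorrelated (C = 0), this reduces to a* = MSE2 / (MSE1 + MSE2), so the prediction with the smaller MSE receives the larger weight.
• The denominator equals the average of (e1 - e2)^2, so a* is defined whenever the two predictions differ; it can fall outside [0, 1] when C is large.
• Choosing a on the same cases that are later used to report the MSE will likely make the combined result look better than it is, so a separate set of cases is the safer choice.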