Classification And Regression Trees Stat 430

Outline
• Tree Algorithm: recursive partitioning
• Idea of Random Forests
[Figure: example tree for the response "Delayed?", with splits on Distance (Distance < 2459, Distance >= 728, Distance >= 4228) and node values 0.03486, 0.05107, 0 and 0.5]
Trees
• Breiman, Friedman, Olshen & Stone (1984)
• Situation:
Categorical Response Y
Set of explanatory variables X1, ..., Xp
• Goal: find a set of rules in X1, ..., Xp that
allows prediction of Y
Construction
• Starting at the root, find the best split at each
node by exhaustive search:
for each variable Xi, generate all possible splits
and compute the homogeneity of the resulting
child nodes
• Select the best split of the best variable, then
repeat in each child node
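The exhaustive search can be sketched in a few lines of base R. This is only an illustration under stated assumptions: homogeneity is measured with Gini impurity (the slides do not name a measure), all predictors are numeric, and the helper names gini and best_split are made up for this example. It finds a single split; a full tree would apply the same search again inside each child node.

```r
# Gini impurity of a vector of class labels (0 means perfectly homogeneous).
# Assumption: Gini is the homogeneity measure; the slides do not specify one.
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Hypothetical helper: try every midpoint between consecutive distinct values
# of every numeric predictor and keep the split whose children have the
# lowest weighted impurity.
best_split <- function(data, response) {
  y <- data[[response]]
  best <- list(variable = NA, threshold = NA, impurity = Inf)
  for (v in setdiff(names(data), response)) {
    x <- data[[v]]
    vals <- sort(unique(x))
    if (length(vals) < 2) next
    thresholds <- (head(vals, -1) + tail(vals, -1)) / 2
    for (threshold in thresholds) {
      left <- x < threshold
      score <- mean(left) * gini(y[left]) + mean(!left) * gini(y[!left])
      if (score < best$impurity) {
        best <- list(variable = v, threshold = threshold, impurity = score)
      }
    }
  }
  best
}

# Best first split for the built-in iris data
root_split <- best_split(iris, "Species")
print(root_split)
```

Midpoints between observed values are used as candidate thresholds because any cut between two neighbouring values separates the cases in the same way.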
rpart
• fit the tree using command rpart
• plot(tree)
text(tree)
• summary(tree)
• predict
• re-fit using other parameters:
control=rpart.control(cp=0.001)
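The commands above could be strung together as follows. This is a sketch, not the course's own script: it assumes the kyphosis data set that ships with the rpart package as a stand-in for the class data, and the object names tree, tree_big and pred are chosen for illustration.

```r
library(rpart)

# Fit a classification tree (kyphosis is an example data set from rpart).
tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              method = "class")

# Draw the tree, then add the split labels.
plot(tree)
text(tree)

# Detailed description of every node and split.
summary(tree)

# Predicted classes; here for the training data, normally for new data.
pred <- predict(tree, newdata = kyphosis, type = "class")
table(predicted = pred, actual = kyphosis$Kyphosis)

# Re-fit with a smaller complexity parameter, which allows more splits.
tree_big <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class",
                  control = rpart.control(cp = 0.001))
printcp(tree_big)
```

Lowering cp below its default of 0.01 keeps splits that improve the fit only slightly, so the re-fitted tree is usually larger and should be checked against held-out data.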
Random Forests
• Breiman (2001), Breiman & Cutler (2004)
• Tree Ensemble built by randomly sampling
cases and variables
• Each case classified once for each tree in
the ensemble
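To make the sampling idea concrete, here is a small hand-rolled ensemble built from rpart trees. It is a simplified sketch rather than Breiman's algorithm: each tree gets one random subset of variables, whereas a real random forest redraws the subset at every split. The iris data, the number of trees and the subset size of 2 are arbitrary choices for illustration.

```r
library(rpart)

set.seed(430)
n_trees <- 25
predictors <- setdiff(names(iris), "Species")

# Grow each tree on a bootstrap sample of cases and a random subset of variables.
# Simplification: the variable subset is drawn once per tree, not once per split.
forest <- lapply(seq_len(n_trees), function(b) {
  rows <- sample(nrow(iris), replace = TRUE)
  vars <- sample(predictors, 2)
  rpart(reformulate(vars, response = "Species"),
        data = iris[rows, ], method = "class")
})

# Each case is classified once by each tree: rows are cases, columns are trees.
votes <- sapply(forest, function(tree) {
  as.character(predict(tree, newdata = iris, type = "class"))
})

# Combine the trees by majority vote.
ensemble_pred <- apply(votes, 1, function(v) names(which.max(table(v))))
accuracy <- mean(ensemble_pred == iris$Species)
print(accuracy)
```

The accuracy printed here is computed on the training cases and is therefore optimistic; an honest estimate would use cases a tree did not see.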
RandomForest
• randomForest to fit the forest
• predict(forest, type="prob", newdata=test)
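A possible end-to-end use of these two calls is shown below. It assumes the randomForest package is installed, uses iris as a stand-in data set, and holds out 30 randomly chosen rows as the test set; the split and ntree = 200 are illustrative choices, not values from the course.

```r
library(randomForest)  # install.packages("randomForest") if it is missing

set.seed(430)
test_rows <- sample(nrow(iris), 30)
train <- iris[-test_rows, ]
test  <- iris[test_rows, ]

# Fit the forest on the training cases.
forest <- randomForest(Species ~ ., data = train, ntree = 200)

# One row per test case, one column per class; entries are vote fractions.
prob <- predict(forest, type = "prob", newdata = test)
head(prob)

# Without type = "prob", predict returns the majority-vote class.
pred_class <- predict(forest, newdata = test)
table(predicted = pred_class, actual = test$Species)
```

Printing forest also shows the out-of-bag error estimate, which is computed from the cases left out of each tree's bootstrap sample.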
Combining Results
• Two predictions for test data:
Y1 and Y2
• with corresponding MSE values MSE1 and MSE2
• What combination of Y1 and Y2 will give the
best prediction?
aY1 + (1-a)Y2
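One way to approach the question: the MSE of aY1 + (1-a)Y2 is a quadratic function of a, so its minimum has a closed form. The sketch below uses simulated values for the true response and the two predictions (they are not course data) and checks the closed form against a numerical search. The names y, y1, y2, a_hat and a_num are chosen for illustration.

```r
set.seed(430)

# Simulated stand-ins for the true response and two sets of predictions.
y  <- rnorm(200)
y1 <- y + rnorm(200, sd = 0.5)
y2 <- y + rnorm(200, sd = 0.8)

mse <- function(pred) mean((y - pred)^2)

# Write y - (a*y1 + (1-a)*y2) as (y - y2) - a*(y1 - y2). Minimising the sum of
# squares over a is a regression through the origin, which gives:
d <- y1 - y2
a_hat <- sum((y - y2) * d) / sum(d^2)

# Numerical check of the closed form.
a_num <- optimize(function(a) mse(a * y1 + (1 - a) * y2),
                  interval = c(-1, 2))$minimum

combined <- mse(a_hat * y1 + (1 - a_hat) * y2)
print(c(a_hat = a_hat, a_num = a_num))
print(c(mse1 = mse(y1), mse2 = mse(y2), combined = combined))
```

Because a = 1 and a = 0 reproduce Y1 and Y2, the combined MSE on the data used to choose a can never exceed the smaller of MSE1 and MSE2. Choosing a on the test data itself overstates the gain, so a separate validation set is the safer place to estimate it.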
Work on your Project