Inefficiency with Bagging
• Inefficient bootstrap sampling:
  • Every example has an equal chance of being sampled
  • No distinction between “easy” examples and “difficult” examples
• Inefficient model combination:
  • A constant weight for each classifier
  • No distinction between accurate classifiers and inaccurate classifiers
[Figure: Bagging. Bootstrap sampling of the training set D produces datasets D_1, D_2, …, D_k; each D_i is used to train a classifier h_i.]
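As a concrete illustration of the diagram above, here is a minimal Python sketch of bagging with equal-weight voting; the data format, base learner, and function names are illustrative assumptions, not taken from the slides.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, seed=0):
    """Train k classifiers, each on a bootstrap sample D_i drawn from the data D."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)   # every example has an equal chance of being drawn
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Combine the classifiers with a constant (equal) weight: majority vote over {-1, +1} labels."""
    votes = np.array([h.predict(X) for h in classifiers])
    return np.sign(votes.sum(axis=0))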
Improve the Efficiency of Bagging
• Better sampling strategy
  • Focus on the examples that are difficult to classify
• Better combination strategy
  • Accurate models should be assigned larger weights
Intuition
[Figure: Three classifiers are combined additively, Classifier1 + Classifier2 + Classifier3. Classifier1 is trained on (X1,Y1), (X2,Y2), (X3,Y3), (X4,Y4); Classifier2 focuses on the still-difficult examples (X1,Y1) and (X3,Y3); Classifier3 focuses on (X1,Y1).]
AdaBoost Algorithm
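The algorithm box did not survive extraction from this slide. Below is a minimal sketch of the standard AdaBoost loop in its reweighting form, assuming labels in {-1, +1} and a generic weak learner; the function and variable names are illustrative, and the example on the next slide instead fixes α_t = ln 2 and resamples the data at each round.

import numpy as np

def adaboost(X, y, weak_learner, T=50):
    """Standard AdaBoost with labels y in {-1, +1}.

    weak_learner(X, y, D) must return a classifier h whose h.predict(X) is in {-1, +1}.
    """
    n = len(X)
    D = np.full(n, 1.0 / n)                  # D_0: uniform weights over the training examples
    classifiers, alphas = [], []
    for t in range(T):
        h = weak_learner(X, y, D)
        pred = h.predict(X)
        eps = D[pred != y].sum()             # weighted training error of h_t
        if eps == 0 or eps >= 0.5:           # stop if the weak learner is perfect or no better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        D = D * np.exp(-alpha * y * pred)    # increase the weights of misclassified examples
        D = D / D.sum()                      # renormalize to obtain D_{t+1}
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    """H_T(x) = sign(sum_t alpha_t * h_t(x))."""
    scores = sum(a * h.predict(X) for a, h in zip(alphas, classifiers))
    return np.sign(scores)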
AdaBoost Example: α_t = ln 2
Sample D_0 and train h_1 on the drawn examples (x1,y1), (x3,y3), (x5,y5); update the weights to obtain D_1.
Sample D_1 and train h_2 on the drawn examples (x1,y1), (x3,y3); update the weights to obtain D_2. Sample D_2, and so on.

Example weights at each round (each column sums to 1):
          D_0    D_1    D_2
(x1,y1)   1/5    2/7    2/9
(x2,y2)   1/5    1/7    1/9
(x3,y3)   1/5    1/7    1/9
(x4,y4)   1/5    2/7    4/9
(x5,y5)   1/5    1/7    1/9
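As a sanity check on the table above, the following sketch reproduces the weight updates under the usual AdaBoost rule with α_t = ln 2 (weights of misclassified examples are multiplied by e^α = 2, then the distribution is renormalized). That h_1 misclassifies x1 and x4, and h_2 misclassifies x4, is inferred from the doubled weights rather than stated on the slide.

import numpy as np

alpha = np.log(2)                 # alpha_t = ln 2, so e^alpha = 2
D0 = np.full(5, 1 / 5)            # uniform initial distribution over x1..x5

def update(D, misclassified):
    """Multiply the weights of misclassified examples by e^alpha, then renormalize."""
    D = D.copy()
    D[misclassified] *= np.exp(alpha)
    return D / D.sum()

D1 = update(D0, [0, 3])   # assume h_1 misclassifies x1 and x4 -> [2/7, 1/7, 1/7, 2/7, 1/7]
D2 = update(D1, [3])      # assume h_2 misclassifies x4        -> [2/9, 1/9, 1/9, 4/9, 1/9]
print(D1, D2)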
How To Choose α_t in AdaBoost?
How to construct the best distribution D_{t+1}(i)?
1. D_{t+1}(i) should be significantly different from D_t(i)
2. D_{t+1}(i) should create a situation in which classifier h_t performs poorly
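A standard way to satisfy both requirements (stated here as a reconstruction, since the formula itself is not in the extracted text) is the usual AdaBoost update

D_{t+1}(i) = \frac{D_t(i)\,\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad Z_t = \sum_j D_t(j)\,\exp\bigl(-\alpha_t\, y_j\, h_t(x_j)\bigr),

under which, for the standard choice of α_t derived below, h_t has weighted error exactly 1/2 on D_{t+1}, i.e. it performs no better than random guessing on the new distribution.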
How To Choose α_t in AdaBoost?
Optimization view for choosing α_t:
• h_t(x): x → {+1, -1}; a base (weak) classifier
• H_T(x): a linear combination of the base classifiers, H_T(x) = Σ_{t=1}^T α_t h_t(x)
• Goal: minimize the training error
• Approximate the error with an exponential function
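The objective behind these bullets is not in the extracted text; the standard formulation (an assumption about what the slide showed) bounds the 0-1 training error by the exponential loss:

\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\bigl[y_i \ne \operatorname{sign}(H_T(x_i))\bigr]
\;\le\;
\frac{1}{n}\sum_{i=1}^{n} \exp\bigl(-y_i H_T(x_i)\bigr),

which holds because exp(-y_i H_T(x_i)) ≥ 1 whenever example i is misclassified.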
AdaBoost: Greedy Optimization
Fix H_{T-1}(x), then solve for h_T(x) and α_T
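The resulting closed-form solution (the standard AdaBoost result, stated here since the derivation is not in the extracted text): letting ε_T = Σ_i D_T(i) 1[h_T(x_i) ≠ y_i] be the weighted error of h_T, the weak classifier h_T is chosen to minimize ε_T, and its weight is

\alpha_T = \frac{1}{2}\ln\frac{1-\varepsilon_T}{\varepsilon_T},

which is positive whenever ε_T < 1/2 and grows as h_T becomes more accurate, matching the intuition that accurate models should receive larger weights.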
Empirical Study of AdaBoost
AdaBoosting decision trees:
• Generate 50 decision trees by AdaBoost
• Linearly combine the decision trees using the weights of AdaBoost
In general:
• AdaBoost = Bagging > C4.5
• AdaBoost usually needs fewer classifiers than Bagging
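A hedged sketch of this kind of comparison using scikit-learn; the synthetic dataset, cross-validation setup, and default base learners are assumptions (the original study used C4.5 trees, which scikit-learn does not provide).

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "bagging, 50 trees": BaggingClassifier(n_estimators=50, random_state=0),
    "AdaBoost, 50 trees": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f}")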
Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both variance and bias simultaneously
[Figure: bar chart of bias and variance for a single decision tree, bagged decision trees, and AdaBoosted decision trees]