
Boosting

Rong Jin

Inefficiency with Bagging

Inefficient bootstrap sampling:

• Every example has an equal chance of being sampled

• No distinction between “easy” examples and “difficult” examples

Inefficient model combination:

• A constant weight for each classifier

• No distinction between accurate classifiers and inaccurate classifiers

[Figure: Bagging — bootstrap sampling draws datasets D_1, D_2, …, D_k from D, and each D_i is used to train a classifier h_i]

Improve the Efficiency of Bagging

Better sampling strategy

• Focus on the examples that are difficult to classify

Better combination strategy

• Accurate models should be assigned larger weights

Intuition

[Figure: Intuition — the combined ensemble Classifier1 + Classifier2 + Classifier3; Classifier1 is trained on (x_1,y_1), …, (x_4,y_4), Classifier2 focuses on the harder examples (x_1,y_1) and (x_3,y_3), and Classifier3 on (x_1,y_1)]

AdaBoost Algorithm
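For concreteness, a minimal Python sketch of the standard AdaBoost loop (binary labels in {-1, +1}; weak_learner is a hypothetical callable that fits a base classifier to weighted data and returns a predictor):

import numpy as np

def adaboost(X, y, weak_learner, T):
    # Standard AdaBoost for labels y in {-1, +1}.
    # weak_learner(X, y, D) -> h, with h(X) returning {-1, +1} predictions
    # (hypothetical interface: any base learner that accepts weights works).
    n = len(y)
    D = np.full(n, 1.0 / n)                 # start from a uniform distribution
    hypotheses, alphas = [], []
    for t in range(T):
        h = weak_learner(X, y, D)           # train on the current distribution
        pred = h(X)
        eps = D[pred != y].sum()            # weighted training error
        if eps >= 0.5:                      # no better than random guessing: stop
            break
        eps = max(eps, 1e-12)               # guard against a perfect classifier
        alpha = 0.5 * np.log((1 - eps) / eps)
        D = D * np.exp(-alpha * y * pred)   # upweight mistakes, downweight the rest
        D = D / D.sum()                     # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)

    def H(X):                               # final classifier: weighted vote
        votes = sum(a * h(X) for a, h in zip(alphas, hypotheses))
        return np.sign(votes)
    return H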

AdaBoost Example: α_t = ln 2

Each round samples a training set from the current distribution, trains a base classifier, then multiplies the weight of every misclassified example by e^{α_t} = 2 and renormalizes:

        (x_1,y_1)  (x_2,y_2)  (x_3,y_3)  (x_4,y_4)  (x_5,y_5)
D_0:      1/5        1/5        1/5        1/5        1/5
D_1:      2/7        1/7        1/7        2/7        1/7
D_2:      2/9        1/9        1/9        4/9        1/9

• Round 1: sample (x_1,y_1), (x_3,y_3), (x_5,y_5) from D_0 and train h_1; h_1 misclassifies x_1 and x_4, so their weights double, giving D_1

• Round 2: sample (x_1,y_1), (x_3,y_3) from D_1 and train h_2; h_2 misclassifies x_4, so its weight doubles, giving D_2

• Sample …
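These updates are easy to check numerically. A minimal sketch, assuming (as the resulting weights imply) that h_1 misclassifies x_1 and x_4 and h_2 misclassifies x_4:

import numpy as np

# Reproduce the worked example: with α_t = ln 2, each misclassified
# example has its weight multiplied by e^{α_t} = 2, then all weights
# are renormalized to sum to one.
alpha = np.log(2)

def update(weights, misclassified):
    # One weight update: boost the misclassified examples, renormalize.
    w = weights.copy()
    w[misclassified] *= np.exp(alpha)   # multiply by 2
    return w / w.sum()

D0 = np.full(5, 1 / 5)                  # uniform start
D1 = update(D0, [0, 3])                 # h_1 misclassifies x_1 and x_4
D2 = update(D1, [3])                    # h_2 misclassifies x_4

print(D1)  # [2/7, 1/7, 1/7, 2/7, 1/7]
print(D2)  # [2/9, 1/9, 1/9, 4/9, 1/9]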

How To Choose α_t in AdaBoost?

How to construct the best distribution D_{t+1}(i)?

1. D_{t+1}(i) should be significantly different from D_t(i)

2. D_{t+1}(i) should create a situation in which classifier h_t performs poorly (see the update rule below)
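For reference, one standard form of the AdaBoost update meets both requirements:

D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t,   where Z_t = Σ_i D_t(i) exp(-α_t y_i h_t(x_i)) is the normalizer

Misclassified examples (y_i h_t(x_i) = -1) are scaled up and correctly classified ones scaled down; with the optimal α_t, h_t has weighted error exactly 1/2 under D_{t+1}, i.e., it performs no better than random guessing.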

How To Choose α_t in AdaBoost?

Optimization View for Choosing α_t

h_t(x): x → {+1, -1}; a base (weak) classifier

H_T(x) = Σ_{t=1}^T α_t h_t(x): a linear combination of the base classifiers

Goal: minimize training error

Approximate the training error with an exponential upper bound
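Concretely, since exp(-z) ≥ 1 whenever z ≤ 0, the 0-1 training error is upper-bounded by the exponential loss, which is smooth and easier to minimize:

(1/n) Σ_i 1[y_i ≠ sign(H_T(x_i))] ≤ (1/n) Σ_i exp(-y_i H_T(x_i))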

AdaBoost: Greedy Optimization

Fix H_{T-1}(x), then solve for h_T(x) and α_T
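Carrying out this greedy step (the standard derivation): choose h_T and α_T to minimize the exponential loss of the extended ensemble,

(h_T, α_T) = argmin_{h, α} Σ_i exp(-y_i [H_{T-1}(x_i) + α h(x_i)])

which yields the closed form

α_T = (1/2) ln((1 - ε_T) / ε_T),

where ε_T is the weighted training error of h_T under the distribution D_T.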

Empirical Study of AdaBoost

AdaBoosting decision trees

• Generate 50 decision trees by AdaBoost

• Linearly combine the decision trees using the weights computed by AdaBoost

In general:

• AdaBoost = Bagging > C4.5

• AdaBoost usually needs fewer classifiers than Bagging
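This comparison is easy to reproduce; a sketch using scikit-learn (assuming a recent version where the base learner parameter is named estimator, and substituting a CART decision tree for C4.5, which scikit-learn does not provide):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# 50 boosted trees, linearly combined with AdaBoost's α_t weights
boosted = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                             n_estimators=50, random_state=0)
# 50 bagged trees, combined with equal weights
bagged = BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                           n_estimators=50, random_state=0)
single = DecisionTreeClassifier(max_depth=3, random_state=0)

for name, clf in [("AdaBoost", boosted), ("Bagging", bagged), ("Single tree", single)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())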

Bias-Variance Tradeoff for AdaBoost

• AdaBoost can reduce both variance and bias simultaneously

[Figure: bias and variance of a single decision tree, bagged decision trees, and AdaBoosted decision trees]
