
Ensembles

Example: Weather Forecast
(Two heads are better than one)
(Figure: predictions from five individual forecasters compared to reality; each forecaster makes some errors, marked X, but combining their forecasts yields a more accurate prediction.)
Picture Source: Carla Gomez
Majority Vote Model
• Majority vote
– Choose the class predicted by more than ½
the classifiers
– If no agreement return an error
– Why and when does this work?
Example
• Majority vote
• Suppose we have 5 completely independent
classifiers…
– If accuracy is 70% for each classifier
• (.3)^5 + 5(.3)^4(.7) + 10(.3)^3(.7)^2 ≈ 0.163
• The majority vote classifier is in error only 16.3% of the time
– With 101 such classifiers
• The majority vote classifier is in error less than 0.1% of the time
– What happens if p < 50%? Try 30%?
Majority Vote Model
• Let p be the probability that a classifier makes an error
• Assume that classifier errors are independent
• The probability that exactly k of the n classifiers make an error is:
  C(n, k) p^k (1 − p)^(n−k)
• Therefore the probability that a majority vote classifier is in error is:
  Σ_{k > n/2} C(n, k) p^k (1 − p)^(n−k)
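These tail probabilities are easy to check numerically. The short Python sketch below reproduces the 5-classifier example and answers the p < 50% question from the earlier slide; the function name and the use of math.comb are illustrative, the slides only give the formula.

from math import comb

def majority_vote_error(p, n):
    # Probability that more than half of n independent classifiers,
    # each wrong with probability p, are wrong on the same example.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(majority_vote_error(0.3, 5))    # ~0.163, the 16.3% from the example
print(majority_vote_error(0.3, 101))  # far below 0.1%
print(majority_vote_error(0.7, 5))    # ~0.837: when accuracy drops below 50%, voting makes things worse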
Value of Ensembles
• “No Free Lunch” Theorem
– No single algorithm wins all the time!
• When combining multiple independent decisions, each
of which is at least more accurate than random
guessing, random errors cancel each other out and
correct decisions are reinforced
• Human ensembles are demonstrably better
– How many jelly beans are in the jar?
– Who Wants to be a Millionaire: “Ask the
audience”
What is Ensemble Learning?
• Ensemble: collection of base learners
– Each learns the target function
– Combine their outputs for a final prediction
– Often called “meta-learning”
• How can you get different learners?
• How can you combine learners?
Ensemble Method1: Bagging
• Create ensembles by “bootstrap aggregation”, i.e.,
repeatedly randomly re-sampling training data
• Bootstrap: draw n items from X with replacement
Given a training set X of m instances
For i = 1..T
Draw sample of size n < m from X uniformly w/
replacement
Learn classifier Ci from sample i
Final classifier is an unweighted vote of C1 .. CT
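A minimal Python sketch of this procedure follows. It assumes scikit-learn-style base learners (a factory returning objects with fit/predict) and integer class labels; the names (bag, predict_vote, base_learner) are illustrative, not from the slides.

import numpy as np

def bag(X, y, base_learner, T=25, n=None, rng=None):
    # Train T classifiers, each on a bootstrap sample of the training data.
    rng = np.random.default_rng(rng)
    m = len(X)
    n = m if n is None else n                      # bootstrap sample size (n <= m)
    models = []
    for _ in range(T):
        idx = rng.integers(0, m, size=n)           # draw n indices with replacement
        models.append(base_learner().fit(X[idx], y[idx]))
    return models

def predict_vote(models, X):
    # Unweighted majority vote of C1 .. CT (assumes integer class labels).
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)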
Will Bagging Improve Accuracy?
• Depends on the stability of the base
classifiers
– If small changes in the sample cause small
changes in the base-level classifier, then
the ensemble will not be much better than
the base classifiers
– If small changes in the sample cause large
changes and the error is < ½ then we will
see a big improvement
Ensemble Method 2: Boosting
• Key idea: Instead of sampling (as in bagging),
re-weight the training examples
Let m be the number of learners to generate
Initialize all training instances to have the same weight
for i=1,m
generate learner hi
increase weights of the training instances that hi
misclassifies
Final classifier is a weighted vote of all m learners
(where the weights are set based on training set
accuracy)
Adaptive Boosting
• Each rectangle
corresponds to an
example, with weight
proportional to its
height
(Figure: example weights over three boosting rounds h1, h2, h3; ✓ = correctly classified, ✗ = misclassified, and the misclassified examples are up-weighted in the next round.)
AdaBoost(x1, y1, ..., xm, ym, T, H)
D1 = (1/m, ..., 1/m)
T is the number of iterations
m is the number of instances
H is the classifier algorithm
for t = 1, T
  ht = H(x, y, Dt)
  εt = Σ_{i : ht(xi) ≠ yi} Dt(i)
  if εt ≥ 1/2 then T := t − 1; break
  αt = ½ ln((1 − εt) / εt)
  for i = 1, m
    if ht(xi) ≠ yi then Dt+1(i) := Dt(i) e^(αt) / Zt
    else Dt+1(i) := Dt(i) e^(−αt) / Zt
  end
end
return (h1, ..., ht, α1, ..., αt, t)
(Zt is a normalization constant chosen so that Dt+1 sums to 1.)
Example
• Let m =20, then D1 = (.05,...,.05)
• Imagine that h1 is correct on 15 and incorrect on 5
instances, then ε = 0.25 and α = 0.5493
• We reweight as follows:
– Correct instances:
  D2(i) = (.05 e^(−.5493)) / Z1 = 0.0289 / 0.8665 = 0.0333
– Incorrect instances:
  D2(i) = (.05 e^(.5493)) / Z1 = 0.0866 / 0.8665 = 0.0999
• Note that after normalization Σi D2(i) = 1
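A compact Python sketch of this algorithm follows. It assumes labels in {−1, +1} and a base-learner factory H(X, y, D) that trains on the weighted data and returns an object with a predict method; those interface choices and the NumPy implementation are my own, not the slides'.

import numpy as np

def adaboost(X, y, H, T):
    m = len(X)
    D = np.full(m, 1.0 / m)                        # D1 = (1/m, ..., 1/m)
    hs, alphas = [], []
    for _ in range(T):
        h = H(X, y, D)                             # weak learner trained on weighted data
        miss = h.predict(X) != y                   # instances h gets wrong
        eps = D[miss].sum()                        # weighted training error ε_t
        if eps >= 0.5:                             # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)      # α_t = ½ ln((1 − ε_t)/ε_t)
        D = D * np.exp(np.where(miss, alpha, -alpha))   # up-weight mistakes, down-weight correct
        D = D / D.sum()                            # normalize (divide by Z_t)
        hs.append(h)
        alphas.append(alpha)
    return hs, alphas

def adaboost_predict(hs, alphas, X):
    # Weighted vote: sign of the α-weighted sum of weak predictions (labels ±1).
    return np.sign(sum(a * h.predict(X) for h, a in zip(hs, alphas)))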
Summary of Boosting and Bagging
• Called “homogeneous ensembles”
• Both use a single learning algorithm but
manipulate training data to learn multiple models
– Data1 ≠ Data2 ≠ … ≠ DataT
– Learner1 = Learner2 = … = LearnerT
• Methods for changing training data:
– Bagging: Resample training data
– Boosting: Reweight training data
Strong and Weak Learners
• “Strong learner” produces a classifier
which can be arbitrarily accurate.
• “Weak Learner” produces a classifier
more accurate than random guessing.
• Goal: Create a single strong learner
from a set of weak learners.
What is Ensemble Learning?
• Ensemble: collection of base learners
– Each learns the target function
– Combine their outputs for a final prediction
– Often called “meta-learning”
• How can you get different learners?
• How can you combine learners?
Where do Learners come from?
• Bagging
• Boosting
• Partitioning the data (must have a large
amount)
• Using different
– feature subsets
– algorithms
– parameters of the same algorithm
Ensemble Method 3: Random Forests
• For i = 1 to T,
– Take a bootstrap sample (bag)
– Grow a random decision tree T_i
• At each node, choose the best split from a random
subset of n features (n < total number of features)
• Grow a full tree (do not prune)
• Classify new objects by taking a
majority vote of the T random trees
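As a sketch, the loop above can be written with scikit-learn's decision trees, whose max_features option considers a random feature subset at every node; the helper name and parameter choices are illustrative. The final prediction can reuse the unweighted predict_vote helper from the bagging sketch above.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest(X, y, T=100, max_features="sqrt", rng=None):
    rng = np.random.default_rng(rng)
    m = len(X)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)                   # bootstrap sample (bag)
        tree = DecisionTreeClassifier(max_features=max_features)  # random feature subset per node; full tree, no pruning
        trees.append(tree.fit(X[idx], y[idx]))
    return trees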
What is Ensemble Learning?
• Ensemble: collection of base learners
– Each learns the target function
– Combine their outputs for a final prediction
– Often called “meta-learning”
• How can you get different learners?
• How can you combine learners?
Methods for Combining Classifiers
• Unweighted vote
• Weighted vote (typically a function of the
accuracy)
• Stacking – learning how to combine classifiers
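A small Python sketch of stacking, assuming scikit-learn-style estimators; splitting off a held-out set for the combiner and using logistic regression as the meta-learner are illustrative choices, not prescribed by the slide.

import numpy as np
from sklearn.linear_model import LogisticRegression

def stack_fit(X_train, y_train, X_holdout, y_holdout, base_learners):
    # Fit the base learners on one split, then learn how to combine their
    # predictions on a held-out split (so the combiner is not trained on
    # predictions the base learners have already memorized).
    fitted = [bl.fit(X_train, y_train) for bl in base_learners]
    meta_X = np.column_stack([bl.predict(X_holdout) for bl in fitted])
    combiner = LogisticRegression().fit(meta_X, y_holdout)
    return fitted, combiner

def stack_predict(fitted, combiner, X):
    meta_X = np.column_stack([bl.predict(X) for bl in fitted])
    return combiner.predict(meta_X)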
Introduction to Machine Learning and Data Mining, Carla Brodley
!"#$
%&'$(
)$*+$,-"*,+
./+*0$
!"#$%&'()&'*"#+,
!
=9+!*+,-./0!>7/?+!:@A89,!,@!:AB:,C3,/C..D
/EF7@;+!,9+!C44A7C4D!@-!F7+</4,/@3:!CB@A,
9@G!EA49!:@E+@3+!/:!8@/38!,@!+3H@D!C
E@;/+!BC:+<!@3!,9+/7!E@;/+!F7+-+7+34+:5
I3!J+F,+EB+7!($1!())%!G+!CGC7<+<!,9+
K$L!M7C3<!>7/?+!,@!,+CE!NO+..P@7Q:
>7C8EC,/4!R9C@:S5!T+C<!CB@A,!,9+/7
C.8@7/,9E1!49+4U@A,!,+CE!:4@7+:!@3!,9+
V+C<+7B@C7<1!C3<!H@/3!,9+!</:4A::/@3:!@3
,9+!W@7AE5
!
X+!CFF.CA<!C..!,9+!4@3,7/BA,@7:!,@!,9/:
YA+:,1!G9/49!/EF7@;+:!@A7!CB/./,D!,@
4@33+4,!F+@F.+!,@!,9+!E@;/+:!,9+D!.@;+5
!
123 ! " ! 1",&# ! " ! 4$05'678!"#$
#!$%%&'())%!*+,-./01!2345!6..!7/89,:!7+:+7;+<5
Began October 2006
• Supervised learning task
– Training data is a set of users and ratings
(1,2,3,4,5 stars) those users have given to
movies.
– Construct a classifier that given a user and an
unrated movie, correctly classifies that movie
as either 1, 2, 3, 4, or 5 stars
• $1 million prize for a 10% improvement
over Netflix’s current movie
recommender
“Our final solution (RMSE=0.8712) consists of blending 107 individual results.”
!"#$
.
Ensemble methods
are the best
performers…
%&'$(
)$*+$,-"*,+
./+*0$
!"#$"%&'#%$
!"#$%&'()*+,(!-#.*/!"#$%&!'()(!*+!,'+-!./$0!,%+)(
0%+1234(,#1(
!"
(2*35*.+/
!
!"#$
%&"'()"'&
*&+,(%&+,(-./0& 1(2'30/4&'&#, *&+,(-56'7,(%7'&
!"#$%&'"()*&+&,-./&0&123456&+&7($$($8&9*#:;&<*==>?"@A&'"#8:#B(C&DE#?A
1
!!!!!2(##3+)4,!5)6786*$%!"'6+,
9:;<=>
19:9=
?99@A9>A?=!1;B1;B?;
?
!!!!!C'(!DE,(8F#(
9:;<=>
19:9=
?99@A9>A?=!1;BG;B??
G
!!!!!H)6EI!5)$0(!C(68
9:;<;?
@:@9
?99@A9>A19!?1B?JBJ9
J
!!!!!KL()6!M+#/*$+E,!6EI!N6EI(#6O!PE$*(I
9:;<;;
@:;J
?99@A9>A19!91B1?BG1
<
!!!!!N6EI(#6O!QEI/,*)$(,!R
9:;<@1
@:;1
?99@A9>A19!99BG?B?9
=
!!!!!5)6786*$%C'(+)O
9:;<@J
@:>>
?99@A9=A?J!1?B9=B<=
>
!!!!!2(##3+)!$E!2$7"'6+,
9:;=91
@:>9
?99@A9<A1G!9;B1JB9@
;
!!!!!S6%(T
9:;=1?
@:<@
?99@A9>A?J!1>B1;BJG
@
!!!!!U((I,?
9:;=??
@:J;
?99@A9>A1?!1GB11B<1
19
!!!!!2$7"'6+,
9:;=?G
@:J>
?99@A9JA9>!1?BGGB<@
11
!!!!!KL()6!M+#/*$+E,
9:;=?G
@:J>
?99@A9>A?J!99BGJB9>
1?
!!!!!2(##3+)
9:;=?J
@:J=
?99@A9>A?=!1>B1@B11
'"?8"*AA&'"()*&F113&+&,-./&0&1235F6&+&7($$($8&9*#:;&<*==>?"&($&<(8DE#?A