
Ensembles

Example: Weather Forecast
(Two heads are better than one)
(Figure: predictions from five individual forecasters compared to reality; each forecaster makes some errors, marked X, but combining their forecasts yields a more accurate prediction.)
Picture Source: Carla Gomez
Majority Vote Model
• Majority vote
– Choose the class predicted by more than ½
the classifiers
– If no agreement return an error
– Why and when does this work?
Example
• Majority vote
• Suppose we have 5 completely independent
classifiers…
– If accuracy is 70% for each classifier
• (.3)^5 + 5(.3)^4(.7) + 10(.3)^3(.7)^2 ≈ 0.163
• The majority vote classifier is in error only 16.3% of the time
– With 101 such classifiers
• The majority vote classifier is in error less than 0.1% of the time
– What happens if p < 50%? Try 30%?
Majority Vote Model
• Let p be the probability that a classifier makes an error
• Assume that classifier errors are independent
• The probability that exactly k of the n classifiers make an error is:
  C(n, k) p^k (1 − p)^(n−k)
• Therefore the probability that a majority vote classifier is in error is:
  Σ_{k > n/2} C(n, k) p^k (1 − p)^(n−k)
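These tail probabilities are easy to check numerically. The short Python sketch below reproduces the 5-classifier example and answers the p < 50% question from the earlier slide; the function name and the use of math.comb are illustrative, the slides only give the formula.

from math import comb

def majority_vote_error(p, n):
    # Probability that more than half of n independent classifiers,
    # each wrong with probability p, are wrong on the same example.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(majority_vote_error(0.3, 5))    # ~0.163, the 16.3% from the example
print(majority_vote_error(0.3, 101))  # far below 0.1%
print(majority_vote_error(0.7, 5))    # ~0.837: when accuracy drops below 50%, voting makes things worse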
Value of Ensembles
• “No Free Lunch” Theorem
– No single algorithm wins all the time!
• When combining multiple independent decisions, each
of which is at least more accurate than random
guessing, random errors cancel each other out and
correct decisions are reinforced
• Human ensembles are demonstrably better
– How many jelly beans are in the jar?
– Who Wants to be a Millionaire: “Ask the
audience”
What is Ensemble Learning?
• Ensemble: collection of base learners
– Each learns the target function
– Combine their outputs for a final prediction
– Often called “meta-learning”
• How can you get different learners?
• How can you combine learners?
Ensemble Method1: Bagging
• Create ensembles by “bootstrap aggregation”, i.e.,
repeatedly randomly re-sampling training data
• Bootstrap: draw n items from X with replacement
Given a training set X of m instances
For i = 1..T
Draw sample of size n < m from X uniformly w/
replacement
Learn classifier Ci from sample i
Final classifier is an unweighted vote of C1 .. CT
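A minimal Python sketch of this procedure follows. It assumes scikit-learn-style base learners (a factory returning objects with fit/predict) and integer class labels; the names (bag, predict_vote, base_learner) are illustrative, not from the slides.

import numpy as np

def bag(X, y, base_learner, T=25, n=None, rng=None):
    # Train T classifiers, each on a bootstrap sample of the training data.
    rng = np.random.default_rng(rng)
    m = len(X)
    n = m if n is None else n                      # bootstrap sample size (n <= m)
    models = []
    for _ in range(T):
        idx = rng.integers(0, m, size=n)           # draw n indices with replacement
        models.append(base_learner().fit(X[idx], y[idx]))
    return models

def predict_vote(models, X):
    # Unweighted majority vote of C1 .. CT (assumes integer class labels).
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)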
Will Bagging Improve Accuracy?
• Depends on the stability of the base
classifiers
– If small changes in the sample cause small
changes in the base-level classifier, then
the ensemble will not be much better than
the base classifiers
– If small changes in the sample cause large
changes and the error is < ½ then we will
see a big improvement
Ensemble Method 2: Boosting
• Key idea: Instead of sampling (as in bagging),
re-weight the training examples
Let m be the number of learners to generate
Initialize all training instances to have the same weight
for i=1,m
generate learner hi
increase weights of the training instances that hi
misclassifies
Final classifier is a weighted vote of all m learners
(where the weights are set based on training set
accuracy)
Adaptive Boosting
• Each rectangle
corresponds to an
example, with weight
proportional to its
height
(Figure: example weights over three boosting rounds h1, h2, h3; ✓ = correctly classified, ✗ = misclassified, and the misclassified examples are up-weighted in the next round.)
AdaBoost(x1, y1, ..., xm, ym, T, H)
D1 = (1/m, ..., 1/m)
T is the number of iterations
m is the number of instances
H is the classifier algorithm
for t = 1, T
  ht = H(x, y, Dt)
  εt = Σ_{i : ht(xi) ≠ yi} Dt(i)
  if εt ≥ 1/2 then T := t − 1; break
  αt = ½ ln((1 − εt) / εt)
  for i = 1, m
    if ht(xi) ≠ yi then Dt+1(i) := Dt(i) e^(αt) / Zt
    else Dt+1(i) := Dt(i) e^(−αt) / Zt
  end
end
return (h1, ..., ht, α1, ..., αt, t)
(Zt is a normalization constant chosen so that Dt+1 sums to 1.)
Example
• Let m =20, then D1 = (.05,...,.05)
• Imagine that h1 is correct on 15 and incorrect on 5
instances, then ε = 0.25 and α = 0.5493
• We reweight as follows:
– Correct instances:
  D2(i) = (.05 e^(−.5493)) / Z1 = 0.0289 / 0.8665 = 0.0333
– Incorrect instances:
  D2(i) = (.05 e^(.5493)) / Z1 = 0.0866 / 0.8665 = 0.0999
• Note that after normalization Σi D2(i) = 1
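A compact Python sketch of this algorithm follows. It assumes labels in {−1, +1} and a base-learner factory H(X, y, D) that trains on the weighted data and returns an object with a predict method; those interface choices and the NumPy implementation are my own, not the slides'.

import numpy as np

def adaboost(X, y, H, T):
    m = len(X)
    D = np.full(m, 1.0 / m)                        # D1 = (1/m, ..., 1/m)
    hs, alphas = [], []
    for _ in range(T):
        h = H(X, y, D)                             # weak learner trained on weighted data
        miss = h.predict(X) != y                   # instances h gets wrong
        eps = D[miss].sum()                        # weighted training error ε_t
        if eps >= 0.5:                             # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)      # α_t = ½ ln((1 − ε_t)/ε_t)
        D = D * np.exp(np.where(miss, alpha, -alpha))   # up-weight mistakes, down-weight correct
        D = D / D.sum()                            # normalize (divide by Z_t)
        hs.append(h)
        alphas.append(alpha)
    return hs, alphas

def adaboost_predict(hs, alphas, X):
    # Weighted vote: sign of the α-weighted sum of weak predictions (labels ±1).
    return np.sign(sum(a * h.predict(X) for h, a in zip(hs, alphas)))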
Summary of Boosting and Bagging
• Called “homogeneous ensembles”
• Both use a single learning algorithm but
manipulate training data to learn multiple models
– Data1 ≠ Data2 ≠ … ≠ DataT
– Learner1 = Learner2 = … = LearnerT
• Methods for changing training data:
– Bagging: Resample training data
– Boosting: Reweight training data
Strong and Weak Learners
• “Strong learner” produces a classifier
which can be arbitrarily accurate.
• “Weak Learner” produces a classifier
more accurate than random guessing.
• Goal: Create a single strong learner
from a set of weak learners.
What is Ensemble Learning?
• Ensemble: collection of base learners
– Each learns the target function
– Combine their outputs for a final prediction
– Often called “meta-learning”
• How can you get different learners?
• How can you combine learners?
Where do Learners come from?
• Bagging
• Boosting
• Partitioning the data (must have a large
amount)
• Using different
– feature subsets
– algorithms
– parameters of the same algorithm
Ensemble Method 3: Random Forests
• For i = 1 to T,
– Take a bootstrap sample (bag)
– Grow a random decision tree T_i
• At each node, choose the best split from a random
subset of n features (n < total number of features)
• Grow a full tree (do not prune)
• Classify new objects by taking a
majority vote of the T random trees
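As a sketch, the loop above can be written with scikit-learn's decision trees, whose max_features option considers a random feature subset at every node; the helper name and parameter choices are illustrative. The final prediction can reuse the unweighted predict_vote helper from the bagging sketch above.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest(X, y, T=100, max_features="sqrt", rng=None):
    rng = np.random.default_rng(rng)
    m = len(X)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)                   # bootstrap sample (bag)
        tree = DecisionTreeClassifier(max_features=max_features)  # random feature subset per node; full tree, no pruning
        trees.append(tree.fit(X[idx], y[idx]))
    return trees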
What is Ensemble Learning?
• Ensemble: collection of base learners
– Each learns the target function
– Combine their outputs for a final prediction
– Often called “meta-learning”
• How can you get different learners?
• How can you combine learners?
Methods for Combining Classifiers
• Unweighted vote
• Weighted vote (typically a function of the
accuracy)
• Stacking – learning how to combine classifiers
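A small Python sketch of stacking, assuming scikit-learn-style estimators; splitting off a held-out set for the combiner and using logistic regression as the meta-learner are illustrative choices, not prescribed by the slide.

import numpy as np
from sklearn.linear_model import LogisticRegression

def stack_fit(X_train, y_train, X_holdout, y_holdout, base_learners):
    # Fit the base learners on one split, then learn how to combine their
    # predictions on a held-out split (so the combiner is not trained on
    # predictions the base learners have already memorized).
    fitted = [bl.fit(X_train, y_train) for bl in base_learners]
    meta_X = np.column_stack([bl.predict(X_holdout) for bl in fitted])
    combiner = LogisticRegression().fit(meta_X, y_holdout)
    return fitted, combiner

def stack_predict(fitted, combiner, X):
    meta_X = np.column_stack([bl.predict(X) for bl in fitted])
    return combiner.predict(meta_X)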
Introduction to Machine Learning and Data Mining, Carla Brodley
!"#$
%&'$(
)$*+$,-"*,+
./+*0$
!"#$%&'()&'*"#+,
!
=9+!*+,-./0!>7/?+!:@A89,!,@!:AB:,C3,/C..D
/EF7@;+!,9+!C44A7C4D!@-!F7+</4,/@3:!CB@A,
9@G!EA49!:@E+@3+!/:!8@/38!,@!+3H@D!C
E@;/+!BC:+<!@3!,9+/7!E@;/+!F7+-+7+34+:5
I3!J+F,+EB+7!($1!())%!G+!CGC7<+<!,9+
K$L!M7C3<!>7/?+!,@!,+CE!NO+..P@7Q:
>7C8EC,/4!R9C@:S5!T+C<!CB@A,!,9+/7
C.8@7/,9E1!49+4U@A,!,+CE!:4@7+:!@3!,9+
V+C<+7B@C7<1!C3<!H@/3!,9+!</:4A::/@3:!@3
,9+!W@7AE5
!
X+!CFF.CA<!C..!,9+!4@3,7/BA,@7:!,@!,9/:
YA+:,1!G9/49!/EF7@;+:!@A7!CB/./,D!,@
4@33+4,!F+@F.+!,@!,9+!E@;/+:!,9+D!.@;+5
!
123 ! " ! 1",&# ! " ! 4$05'678!"#$
#!$%%&'())%!*+,-./01!2345!6..!7/89,:!7+:+7;+<5
Began October 2006
• Supervised learning task
– Training data is a set of users and ratings
(1,2,3,4,5 stars) those users have given to
movies.
– Construct a classifier that given a user and an
unrated movie, correctly classifies that movie
as either 1, 2, 3, 4, or 5 stars
• $1 million prize for a 10% improvement
over Netflix’s current movie
recommender
“Our final solution (RMSE=0.8712) consists of blending 107 individual results.”
!"#$
.
Ensemble methods
are the best
performers…
%&'$(
)$*+$,-"*,+
./+*0$
!"#$"%&'#%$
!"#$%&'()*+,(!-#.*/!"#$%&!'()(!*+!,'+-!./$0!,%+)(
0%+1234(,#1(
!"
(2*35*.+/
!
!"#$
%&"'()"'&
*&+,(%&+,(-./0& 1(2'30/4&'&#, *&+,(-56'7,(%7'&
!"#$%&'"()*&+&,-./&0&123456&+&7($$($8&9*#:;&<*==>?"@A&'"#8:#B(C&DE#?A
1
!!!!!2(##3+)4,!5)6786*$%!"'6+,
9:;<=>
19:9=
?99@A9>A?=!1;B1;B?;
?
!!!!!C'(!DE,(8F#(
9:;<=>
19:9=
?99@A9>A?=!1;BG;B??
G
!!!!!H)6EI!5)$0(!C(68
9:;<;?
@:@9
?99@A9>A19!?1B?JBJ9
J
!!!!!KL()6!M+#/*$+E,!6EI!N6EI(#6O!PE$*(I
9:;<;;
@:;J
?99@A9>A19!91B1?BG1
<
!!!!!N6EI(#6O!QEI/,*)$(,!R
9:;<@1
@:;1
?99@A9>A19!99BG?B?9
=
!!!!!5)6786*$%C'(+)O
9:;<@J
@:>>
?99@A9=A?J!1?B9=B<=
>
!!!!!2(##3+)!$E!2$7"'6+,
9:;=91
@:>9
?99@A9<A1G!9;B1JB9@
;
!!!!!S6%(T
9:;=1?
@:<@
?99@A9>A?J!1>B1;BJG
@
!!!!!U((I,?
9:;=??
@:J;
?99@A9>A1?!1GB11B<1
19
!!!!!2$7"'6+,
9:;=?G
@:J>
?99@A9JA9>!1?BGGB<@
11
!!!!!KL()6!M+#/*$+E,
9:;=?G
@:J>
?99@A9>A?J!99BGJB9>
1?
!!!!!2(##3+)
9:;=?J
@:J=
?99@A9>A?=!1>B1@B11
'"?8"*AA&'"()*&F113&+&,-./&0&1235F6&+&7($$($8&9*#:;&<*==>?"&($&<(8DE#?A