Are we still talking about diversity in classifier ensembles?
Ludmila I Kuncheva
School of Computer Science
Bangor University, UK

Completely irrelevant to your Workshop...

Let's talk instead about: Multi-view and classifier ensembles

[Figure: A classifier ensemble. Feature values (object description) → classifier, classifier, classifier → "combiner" → class label]

[Figure: Ensemble? A neural network drawn the same way: feature values (object description) → classifiers → combiner → class label]

[Figure: Ensemble or classifier? Many layers of classifiers between the feature values (object description) and the class label. Seen as an ensemble, the upper part is a fancy combiner. Seen as a single classifier, the lower part is a fancy feature extractor feeding one classifier (the "combiner").]

Why classifier ensembles then?
a. because we like to complicate entities beyond necessity (anti-Occam's razor)
b. because we are lazy and stupid and can't be bothered to design and train one single sophisticated classifier
c. because democracy is so important to our society, it must be important to classification

combination of multiple classifiers [Lam95,Woods97,Xu92,Kittler98]
classifier fusion [Cho95,Gader96,Grabisch92,Keller94,Bloch96]
mixture of experts [Jacobs91,Jacobs95,Jordan95,Nowlan91]
committees of neural networks [Bishop95,Drucker94]
consensus aggregation [Benediktsson92,Ng92,Benediktsson97]
voting pool of classifiers [Battiti94]
dynamic classifier selection [Woods97]
(oldest) composite classifier systems [Dasarathy78]
classifier ensembles [Drucker94,Filippi94,Sharkey99]
bagging, boosting, arcing, wagging [Sharkey99]
(oldest) modular systems [Sharkey99]
collective recognition [Rastrigin81,Barabash83]
stacked generalization [Wolpert92]
divide-and-conquer classifiers [Chiang94]
pandemonium system of reflective agents [Smieja96]
change-glasses approach to classifier selection [KunchevaPRL93]
etc.
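The "combiner" in the ensemble diagram can be as simple as a plurality (majority) vote over the class labels produced by the individual classifiers, which is the "democracy" of point c. A minimal sketch, assuming each classifier is a function from an object description to a crisp class label; the names `majority_vote` and `ensemble_predict` and the three threshold classifiers are hypothetical, for illustration only.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical type: a classifier maps an object description to a class label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Non-trainable combiner: return the label with the most votes.

    Ties go to the tied label that was voted for first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not labels:
        raise ValueError("need at least one classifier output")
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(classifiers: Sequence[Classifier], x: Sequence[float]) -> Hashable:
    """Run every member on the same object description, then combine."""
    return majority_vote([clf(x) for clf in classifiers])


# Three toy classifiers, each thresholding a different feature (hypothetical).
members = [
    lambda x: "A" if x[0] > 0.5 else "B",
    lambda x: "A" if x[1] > 0.5 else "B",
    lambda x: "A" if x[2] > 0.5 else "B",
]

print(ensemble_predict(members, [0.9, 0.2, 0.7]))  # prints A (two of three vote "A")
```

The combiner here needs no training; a trainable combiner would instead learn how to weigh or map the members' outputs.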
combination of multiple classifiers [Lam95,Woods97,Xu92,Kittler98]
classifier fusion [Cho95,Gader96,Grabisch92,Keller94,Bloch96]
mixture of experts [Jacobs91,Jacobs95,Jordan95,Nowlan91]
committees of neural networks [Bishop95,Drucker94]
consensus aggregation [Benediktsson92,Ng92,Benediktsson97]
voting pool of classifiers [Battiti94]
dynamic classifier selection [Woods97]
composite classifier systems [Dasarathy78]
classifier ensembles [Drucker94,Filippi94,Sharkey99]
(Out of fashion)
bagging, boosting, arcing, wagging [Sharkey99]
modular systems [Sharkey99]
collective recognition [Rastrigin81,Barabash83]
stacked generalization [Wolpert92]
divide-and-conquer classifiers [Chiang94]
(Subsumed)
pandemonium system of reflective agents [Smieja96]
change-glasses approach to classifier selection [KunchevaPRL93]
etc.

[Figure: A classifier ensemble. Feature values (object description) → classifier, classifier, classifier → combiner → class label]

Congratulations! The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences. On September 21, 2009 we awarded the $1M Grand Prize to team "BellKor's Pragmatic Chaos". We applaud all the contributors to this quest, which improves our ability to connect people to the movies they love.

Classifier combination? Hmmmm…..

David J. Hand (2006) Classifier technology and the illusion of progress. Statistical Science 21(1), 1-14.
David Hand: We are kidding ourselves; there is no real progress in spite of ensemble methods.

S. Dzeroski and B. Zenko (2004) Is combining classifiers better than selecting the best one? Machine Learning 54, 255-273.
Saso Dzeroski: Chances are that the single best classifier will be better than the ensemble.

Quo Vadis?
"combining classifiers" OR "classifier combination" OR "classifier ensembles" OR "ensemble of classifiers" OR "combining multiple classifiers" OR "committee of classifiers" OR "classifier committee" OR "committees of neural networks" OR "consensus aggregation" OR "mixture of experts" OR "bagging predictors" OR adaboost OR (( "random subspace" OR "random forest" OR "rotation forest" OR boosting) AND "machine learning") Gartner’s Hype Cycle: a typical evolution pattern of a new technology peak of inflated expectations Where are we?... phoria na ive e u visibility asymptote of reality slope of enlightenment trough of disillusionment time top cited paper is from… PR IEEE TPAMI IEEE TPAMI JAE PPL PPL JTB CC application paper 0.3 IJCV 0.25 0.15 0.1 0 1990 IEEE TPAMI NN ML IEEE TPAMI IEEE TPAMI ML IEEE TPAMI ML JASA ML 0.2 IEEE TSMC per mil of published papers on classifier ensembles 0.35 0.05 (6) IEEE TPAMI = IEEE Transactions on Pattern Analysis and Machine Intelligence IEEE TSMC = IEEE Transactions on Systems, Man and Cybernetics JASA = Journal of the American Statistical Association IJCV = International Journal of Computer Vision JTB = Journal of Theoretical Biology (2) PPL = Protein and Peptide Letters JAE = Journal of Animal Ecology PR = Pattern Recognition (4) ML = Machine Learning NN = Neural Networks CC = Cerebral Cortex 1995 2000 time 2005 2010 4500 [ML] Bagging predictors 4000 [ML] Random forests number of citations 3500 3000 2500 [IEEE TPAMI] On combining classifiers [IJCV] Robust real-time face detection 2000 1500 1000 500 0 1990 1992 1994 1996 1998 2000 2002 time 2004 2006 2008 2010 2012 International Workshop on Multiple Classifier Systems 2000 – 2013 - continuing Levels of questions A Combination level • selection or fusion? • voting or another combination method? • trainable or non-trainable combiner? Combiner Classifier 1 Classifier 2 Classifier level • same or different classifiers? • decision trees, neural networks or other? • how many? 
C. Feature level
• all features or subsets of features?
• random or selected subsets?
D. Data level
• independent/dependent bootstrap samples?
• selected data sets?

[Figure: 50 diverse linear classifiers vs. 50 non-diverse linear classifiers]

[Figure: strength of classifiers (up to "the perfect classifier") vs. number of classifiers (1 to L), with annotated regions:
• 3-8 classifiers, heterogeneous, trained combiner (stacked generalisation)
• 100+ classifiers, same model, non-trained combiner (bagging, boosting, etc.)
• large ensemble of nearly identical classifiers: REDUNDANCY
• small ensembles of weak classifiers: INSUFFICIENCY
• 30-50 classifiers. How about here? Same or different models? Trained or non-trained combiner? Selection or fusion? IS IT WORTH IT?
Must engineer diversity…]
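The data and feature levels can be illustrated with two well-known ways of building large ensembles of the same model with a non-trained combiner: bagging (each member is trained on a bootstrap sample of the data) and the random subspace method (each member sees a random subset of the features). The sketch below combines both and uses a plurality vote at the combination level. It is a minimal, standard-library-only illustration under stated assumptions: the nearest-mean base classifier, the function names and the toy data are hypothetical choices, not a prescribed method.

```python
import random
from collections import Counter


def train_nearest_mean(X, y, features):
    """Hypothetical base classifier: nearest class mean on a feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    means = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def classify(x):
        # Squared Euclidean distance to each class mean, on this member's features only
        def dist(c):
            return sum((x[f] - m) ** 2 for f, m in zip(features, means[c]))
        return min(means, key=dist)

    return classify


def build_ensemble(X, y, n_members=25, subspace_size=2, seed=0):
    """Train members on bootstrap samples (data level) and on
    random feature subsets (feature level)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap: n draws with replacement
        feats = sorted(rng.sample(range(d), subspace_size))  # random subspace, no repeats
        members.append(train_nearest_mean([X[i] for i in idx],
                                          [y[i] for i in idx], feats))
    return members


def predict(members, x):
    """Combination level: non-trained plurality vote over the members' labels."""
    return Counter(m(x) for m in members).most_common(1)[0][0]


# Toy two-class data with three features (made up for illustration)
X = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.1], [0.2, 0.1, 0.0],
     [5.0, 5.0, 5.0], [5.1, 4.9, 5.2], [4.8, 5.2, 5.1]]
y = ["a", "a", "a", "b", "b", "b"]
ensemble = build_ensemble(X, y, n_members=15, subspace_size=2)
print(predict(ensemble, [0.1, 0.1, 0.1]), predict(ensemble, [5.0, 5.0, 5.0]))
```

Setting `subspace_size` to the full number of features gives plain bagging; replacing the bootstrap indices with `range(n)` gives a plain random subspace ensemble.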
[Figure: A classifier ensemble, one view. The same feature values (object description) feed every classifier; a "combiner" outputs the class label.]
[Figure: A classifier ensemble, multiple views. Each classifier receives its own feature values (object description); a "combiner" outputs the class label.]

1998
"distinct" is what you call "late fusion"
"shared" is what you call "early fusion"

EXPRESSION OF EMOTION - MODALITIES
• Behavioural: facial expression, eye tracking, gesture, speech, posture, interaction with the computer (pressure on mouse, drag-click speed, dialogue with tutor)
• Physiological:
  central nervous system: EEG, fMRI, fNIRS
  peripheral nervous system: pulse rate, pulse variation, EMG, respiration, skin temperature, Galvanic skin response, blood pressure

Data Classification Strategies (modality 1, modality 2, modality 3)
(1) Concatenate the features from all modalities: "early fusion". We capture all dependencies but can't handle the complexity.
(2) Feature extraction and concatenation: "mid-fusion".
(3) Straight ensemble classification: "late fusion". We lose the dependencies but can handle the complexity.
And many combinations thereof...

Ensemble Feature Selection
• By the ensemble (RANKERS): decision tree ensembles; ensembles of different rankers; bootstrap ensembles of rankers
• For the ensemble:
  Random approach: uniform (random subspace); nonuniform (GA)
  Systematic approach: feature selection (greedy); incremental or iterative (greedy)
Also marked on the diagram: multiview late fusion; multiview early and mid-fusion.

This is what I think:
1. Deciding which approach to take is more art than science.
2. This choice is, crucially, CONTEXT-SPECIFIC.

Where does diversity come into this? Hmm... Nowhere...
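The three data classification strategies for multimodal data (early, mid- and late fusion) differ only in where the modalities are merged. A minimal sketch under simplifying assumptions: each modality is a list of per-object feature vectors, the base classifier is a hypothetical nearest-mean rule, the mid-fusion feature extractor `extract` is supplied by the caller (here a hypothetical `mean_only` that keeps one summary value per modality), and late fusion combines the per-modality labels by plurality vote. The toy data are made up.

```python
from collections import Counter


def nearest_mean(X, y):
    """Hypothetical base classifier: assign x to the class with the closest mean."""
    means = {}
    for c in set(y):
        rows = [r for r, lab in zip(X, y) if lab == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def concat(vectors):
    """Join several feature vectors into one."""
    return [v for vec in vectors for v in vec]


def early_fusion(modalities, y):
    """(1) Concatenate the features from all modalities; train ONE classifier."""
    X = [concat(m[i] for m in modalities) for i in range(len(y))]
    clf = nearest_mean(X, y)
    return lambda views: clf(concat(views))


def mid_fusion(modalities, y, extract):
    """(2) Extract features from each modality, concatenate; train ONE classifier."""
    X = [concat(extract(m[i]) for m in modalities) for i in range(len(y))]
    clf = nearest_mean(X, y)
    return lambda views: clf(concat(extract(v) for v in views))


def late_fusion(modalities, y):
    """(3) One classifier per modality; combine their labels by plurality vote."""
    clfs = [nearest_mean(m, y) for m in modalities]
    return lambda views: Counter(clf(v) for clf, v in zip(clfs, views)).most_common(1)[0][0]


def mean_only(v):
    """Hypothetical extractor: one summary value per modality."""
    return [sum(v) / len(v)]


# Toy data: 4 objects described by 3 modalities of different sizes
mod1 = [[0, 0], [0, 1], [5, 5], [5, 6]]
mod2 = [[1.0], [1.5], [8.0], [9.0]]
mod3 = [[0, 0, 1], [1, 0, 0], [4, 4, 4], [5, 4, 4]]
labels = ["calm", "calm", "stressed", "stressed"]
mods = [mod1, mod2, mod3]

query = [[0, 0.5], [1.2], [0.5, 0, 0.5]]  # one new object, one vector per modality
for name, model in [("early", early_fusion(mods, labels)),
                    ("mid", mid_fusion(mods, labels, mean_only)),
                    ("late", late_fusion(mods, labels))]:
    print(name, model(query))
```

For this query all three strategies print `calm`. Early fusion sees all features jointly (and so every cross-modality dependency), while late fusion trains three small, independent classifiers and only meets the modalities again at the vote.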