
Are we still talking about diversity
in classifier ensembles?
Ludmila I Kuncheva
School of Computer Science
Bangor University, UK
Are we still talking about diversity
in classifier ensembles?
Completely irrelevant to your Workshop...
Let’s talk instead about:
Multi-view and classifier ensembles
A classifier ensemble
[Diagram: feature values (object description) feed several classifiers; a “combiner” turns their outputs into a class label]
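A minimal sketch of the structure in the diagram, assuming the simplest possible combiner (a majority vote). The names `majority_vote` and `ensemble_predict` and the three toy threshold classifiers are made up for illustration and do not come from the talk.

```python
from collections import Counter

def majority_vote(labels):
    """Non-trainable combiner: the most common label wins.

    Ties go to the label that was seen first.
    """
    return Counter(labels).most_common(1)[0][0]

def ensemble_predict(classifiers, x, combiner=majority_vote):
    """Feed the same feature values to every classifier, then combine."""
    return combiner([clf(x) for clf in classifiers])

# Three toy classifiers (hypothetical thresholds on two features).
classifiers = [
    lambda x: "pos" if x[0] > 0.5 else "neg",
    lambda x: "pos" if x[1] > 0.5 else "neg",
    lambda x: "pos" if x[0] + x[1] > 1.0 else "neg",
]

print(ensemble_predict(classifiers, [0.9, 0.2]))  # votes: pos, neg, pos
```

Swapping `majority_vote` for another function is all it takes to change the combiner, which is why the same picture can be read as an ensemble, a neural network or a single classifier.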
ensemble?
[Diagram labels: class label; combiner (itself a classifier); classifiers; feature values (object description); annotation: a neural network]
ensemble?
[Diagram labels: class label; a fancy combiner; seven classifiers; feature values (object description)]
classifier?
[Diagram labels: class label; “combiner”; classifiers; feature values (object description); annotation: a fancy feature extractor]
Why classifier ensembles then?
a. because we like to complicate entities beyond
necessity (anti-Occam’s razor)
b. because we are lazy and stupid and can’t be bothered
to design and train one single sophisticated classifier
c. because democracy is so important to our society, it
must be important to classification
combination of multiple classifiers [Lam95,Woods97,Xu92,Kittler98]
classifier fusion [Cho95,Gader96,Grabisch92,Keller94,Bloch96]
mixture of experts [Jacobs91,Jacobs95,Jordan95,Nowlan91]
committees of neural networks [Bishop95,Drucker94]
consensus aggregation [Benediktsson92,Ng92,Benediktsson97]
voting pool of classifiers [Battiti94]
dynamic classifier selection [Woods97]
composite classifier systems [Dasarathy78] (oldest)
classifier ensembles [Drucker94,Filippi94,Sharkey99]
bagging, boosting, arcing, wagging [Sharkey99]
modular systems [Sharkey99]
collective recognition [Rastrigin81,Barabash83] (oldest)
stacked generalization [Wolpert92]
divide-and-conquer classifiers [Chiang94]
pandemonium system of reflective agents [Smieja96]
change-glasses approach to classifier selection [KunchevaPRL93]
etc.
[The same list of names, with some entries marked “Out of fashion” and others “Subsumed”]
[Diagram: classifier ensemble: feature values (object description) feed several classifiers; a combiner produces the class label]
Congratulations!
The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences.
On September 21, 2009 we awarded the $1M Grand Prize to team “BellKor’s Pragmatic Chaos”. Read about their algorithm, checkout team scores on the Leaderboard, and join the discussions on the Forum.
We applaud all the contributors to this quest, which improves our ability to connect people to the movies they love.
Classifier combination? Hmmmm…..
David J. Hand (2006) Classifier technology and the illusion of progress, Statist. Sci. 21 (1), 1-14.
We are kidding ourselves; there is no real progress in spite of ensemble methods.
S. Dzeroski and B. Zenko (2004) Is combining classifiers better than selecting the best one? Machine Learning, 54, 255-273.
Chances are that the single best classifier will be better than the ensemble.
Quo Vadis?
"combining classifiers" OR
"classifier combination" OR
"classifier ensembles" OR
"ensemble of classifiers" OR
"combining multiple classifiers" OR
"committee of classifiers" OR
"classifier committee" OR
"committees of neural networks" OR
"consensus aggregation" OR
"mixture of experts" OR
"bagging predictors" OR
adaboost OR
(( "random subspace" OR "random forest"
OR "rotation forest" OR boosting)
AND "machine learning")
Gartner’s Hype Cycle: a typical evolution pattern of a new technology
Where are we?...
[Figure: visibility against time, with the stages naive euphoria, peak of inflated expectations, trough of disillusionment, slope of enlightenment and asymptote of reality]
top cited paper is from…
[Figure: per mil of published papers on classifier ensembles against time, 1990 to 2010, labelled year by year with the journal of the top cited paper; one label is marked “application paper”]
(6) IEEE TPAMI = IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE TSMC = IEEE Transactions on Systems, Man and Cybernetics
JASA = Journal of the American Statistical Association
IJCV = International Journal of Computer Vision
JTB = Journal of Theoretical Biology
(2) PPL = Protein and Peptide Letters
JAE = Journal of Animal Ecology
PR = Pattern Recognition
(4) ML = Machine Learning
NN = Neural Networks
CC = Cerebral Cortex
[Figure: number of citations against time, 1990 to 2012, for four papers: [ML] Bagging predictors; [ML] Random forests; [IEEE TPAMI] On combining classifiers; [IJCV] Robust real-time face detection]
International Workshop on Multiple Classifier Systems
2000 – 2013 - continuing
Levels of questions
[Diagram: Data set, Features, Classifier 1, Classifier 2, …, Classifier L and Combiner, with four levels marked A to D]
Combination level
• selection or fusion?
• voting or another combination method?
• trainable or non-trainable combiner?
Classifier level
• same or different classifiers?
• decision trees, neural networks or other?
• how many?
Data level
• independent/dependent bootstrap samples?
• selected data sets?
Feature level
• all features or subsets of features?
• random or selected subsets?
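The data-level and feature-level questions can be made concrete with a small sketch. It assumes independent bootstrap samples at the data level and uniformly random feature subsets at the feature level; the function names, sizes and seed are hypothetical and chosen only for illustration.

```python
import random

def bootstrap_indices(n_objects, rng):
    """Data level: sample object indices with replacement."""
    return [rng.randrange(n_objects) for _ in range(n_objects)]

def random_subspace(n_features, subset_size, rng):
    """Feature level: pick feature indices at random, without replacement."""
    return sorted(rng.sample(range(n_features), subset_size))

def ensemble_plan(n_objects, n_features, n_classifiers, subset_size, seed=0):
    """One (objects, features) pair per ensemble member."""
    rng = random.Random(seed)
    return [
        (bootstrap_indices(n_objects, rng),
         random_subspace(n_features, subset_size, rng))
        for _ in range(n_classifiers)
    ]

plan = ensemble_plan(n_objects=10, n_features=8, n_classifiers=3, subset_size=4)
for rows, features in plan:
    print(len(set(rows)), "distinct objects; features", features)
```

Each classifier would then be trained on its own rows and columns. Using selected rather than random subsets changes only how the two helper functions choose their indices.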
[Figure panels: 50 diverse linear classifiers; 50 non-diverse linear classifiers]
[Figure: strength of classifiers plotted against number of classifiers L, starting from “The perfect classifier” at L = 1]
Large ensemble of nearly identical classifiers - REDUNDANCY
Small ensembles of weak classifiers - INSUFFICIENCY
Must engineer diversity…
• 3-8 classifiers
• heterogeneous
• trained combiner (stacked generalisation)
• 100+ classifiers
• same model
• non-trained combiner (bagging, boosting, etc.)
30-50 classifiers: how about here?
• same or different models?
• trained or non-trained combiner?
• selection or fusion?
IS IT WORTH IT?
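To illustrate the difference between a trained and a non-trained combiner, here is a minimal sketch. The trained combiner is a plain lookup table over vote patterns, a simplified stand-in and not the stacked generalisation procedure itself; all names and the toy votes are assumptions made for the example.

```python
from collections import Counter, defaultdict

def majority_vote(votes):
    """Non-trained combiner: needs no data beyond the votes themselves."""
    return Counter(votes).most_common(1)[0][0]

def train_lookup_combiner(vote_rows, true_labels):
    """Trained combiner: for every vote pattern seen in training,
    remember the most frequent true label."""
    table = defaultdict(Counter)
    for votes, label in zip(vote_rows, true_labels):
        table[tuple(votes)][label] += 1

    def combine(votes):
        seen = table.get(tuple(votes))
        if seen is None:
            # Unseen pattern: fall back to the non-trained combiner.
            return majority_vote(votes)
        return seen.most_common(1)[0][0]

    return combine

# Toy training data: votes of three classifiers and the true labels.
vote_rows = [("a", "b", "b"), ("a", "b", "b"), ("a", "a", "b"), ("b", "b", "b")]
true_labels = ["a", "a", "a", "b"]
combine = train_lookup_combiner(vote_rows, true_labels)

print(majority_vote(("a", "b", "b")))  # b
print(combine(("a", "b", "b")))        # a
```

The trained combiner has learned that the first classifier is right when the other two outvote it, but it needs labelled data to do so, and the table grows quickly with the number of classifiers.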
A classifier ensemble: one view
[Diagram: a single set of feature values (object description) feeds all classifiers; a “combiner” produces the class label]
A classifier ensemble: multiple views
[Diagram: each classifier receives its own feature values (object description); a “combiner” produces the class label]
1998
“distinct” is what you call “late fusion”
“shared” is what you call “early fusion”
EXPRESSION OF EMOTION - MODALITIES
behavioural
• facial expression
• eye tracking
• gesture
• speech
• posture
• interaction with the computer: pressure on mouse, drag-click speed, dialogue with tutor
physiological
• central nervous system: EEG, fMRI, fNIRS
• peripheral nervous system: pulse rate, EMG, pulse variation, respiration, skin to, Galvanic skin response, blood pressure
Classification Strategies
[Diagram: data from modality 1, modality 2 and modality 3; strategy (3) ends in an ensemble]
(1) Concatenate the features from all modalities: “early fusion”
(2) Feature extraction and concatenation: “mid-fusion”
(3) Straight ensemble classification: “late fusion”
And many combinations thereof...
Classification Strategies
(1) Concatenate the features from all modalities: “early fusion”
We capture all dependencies but can’t handle the complexity
(2) Feature extraction and concatenation: “mid-fusion”
(3) Straight ensemble classification: “late fusion”
We lose the dependencies but can handle the complexity
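Strategies (1) and (3) can be contrasted in a short sketch. It assumes a nearest-mean base classifier and a majority-vote combiner, neither of which is prescribed by the slides; the function names, the three toy modalities and the labels are invented for illustration.

```python
from collections import Counter

def train_nearest_mean(X, y):
    """Return a predict function that picks the class with the closest mean."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    means = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(x):
        return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))
    return predict

def concatenate(parts):
    """Join the feature vectors of one object across modalities."""
    return [value for part in parts for value in part]

def early_fusion(views, y):
    """One classifier on the concatenated features of all modalities."""
    clf = train_nearest_mean([concatenate(rows) for rows in zip(*views)], y)
    return lambda xs: clf(concatenate(xs))

def late_fusion(views, y):
    """One classifier per modality, combined by majority vote."""
    clfs = [train_nearest_mean(view, y) for view in views]
    return lambda xs: Counter(clf(x) for clf, x in zip(clfs, xs)).most_common(1)[0][0]

# Toy data: 4 objects seen through 3 modalities (values are made up).
modality_1 = [[0.0], [0.2], [1.0], [0.8]]
modality_2 = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0], [1.0, 0.8]]
modality_3 = [[0.3], [0.1], [0.7], [0.9]]
labels = ["calm", "calm", "excited", "excited"]
views = [modality_1, modality_2, modality_3]

new_object = [[0.1], [0.2, 0.1], [0.2]]
print(early_fusion(views, labels)(new_object))  # calm
print(late_fusion(views, labels)(new_object))   # calm
```

Early fusion trains one model in the full concatenated space, so it can use cross-modality dependencies but faces the full dimensionality. Late fusion trains small models that never see each other's features.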
Ensemble Feature Selection
By the ensemble (RANKERS)
• Decision tree ensembles
• Ensembles of different rankers
• Bootstrap ensembles of rankers
For the ensemble
• Random approach: Uniform (Random subspace); Nonuniform (GA)
• Systematic approach: Feature selection; Incremental or iterative; Greedy
Multiview late fusion
Multiview early and mid-fusion
This is what I think:
1. Deciding which approach to take is more art than science.
2. This choice is, crucially, CONTEXT-SPECIFIC.
Where does diversity come into this?
Hmm... Nowhere...