IJCAI’2005, Edinburgh, Scotland, August 1-5, 2005
Sequential Genetic Search for
Ensemble Feature Selection
Alexey Tsymbal, Padraig Cunningham
Department of Computer Science
Trinity College Dublin
Ireland
Mykola Pechenizkiy
Department of Computer Science
University of Jyväskylä
Finland
Contents

 Introduction
   – Classification and Ensemble Classification
 Ensemble Feature Selection
   – strategies
   – sequential genetic search
 Our GAS-SEFS strategy
   – Genetic Algorithm-based Sequential Search for Ensemble Feature Selection
 Experiment design
 Experimental results
 Conclusions and future work

The Task of Classification

 J classes, n training observations, p features
 Given n training instances (xi, yi), where xi are the attribute values and yi is the class label
 Goal: given a new instance x0, predict its class y0

[Diagram: a training set is used to learn a classifier; the classifier then assigns a class membership to a new, yet unclassified instance.]

 Examples:
   – prognosis of breast cancer recurrence
   – diagnosis of thyroid diseases
   – antibiotic resistance prediction
Ensemble classification

[Diagram: a training set T yields subsets T1, T2, …, TS; a base classifier hk is learned from each Tk; at application time a new instance (x, ?) is labelled by the combined classifier h* = F(h1, h2, …, hS), producing (x, y*).]

 How to prepare inputs for generation of the base classifiers?
Ensemble classification

[Same diagram as on the previous slide.]

 How to combine the predictions of the base classifiers?
Ensemble feature selection

 How to prepare inputs for generation of the base classifiers?
   – sampling the training set
   – manipulation of input features
   – manipulation of output targets (class values)
 Goal of traditional feature selection
   – find and remove features that are unhelpful or misleading to learning (making one feature subset for a single classifier)
 Goal of ensemble feature selection
   – find and remove features that are unhelpful or destructive to learning, making different feature subsets for a number of classifiers
   – find feature subsets that will promote diversity (disagreement) between classifiers
EEA
IJCAI’2005 Edinburgh, Scotland, August 1-5, 2005
Sequential Genetic Search for Ensemble Feature Selection by Tsymbal A., Pechenizkiy M.,Cunningham P.
6
Search in EFS

Search space: $2^{\#Features \cdot \#Classifiers}$

Search strategies include:
 Ensemble Forward Sequential Selection (EFSS)
 Ensemble Backward Sequential Selection (EBSS)
 Hill-Climbing (HC)
 Random Subspace Method (RSM)
 Genetic Ensemble Feature Selection (GEFS)

Fitness function:
$Fitness_i = acc_i + \alpha \cdot div_i$
Measuring Diversity
The fail/non-fail disagreement measure: the percentage of test
instances for which the classifiers make different predictions but
for which one of them is correct:

$$div\_dis_{i,j} = \frac{N^{01} + N^{10}}{N^{11} + N^{10} + N^{01} + N^{00}}$$

The kappa statistic:

$$div\_kappa_{i,j} = \frac{\Theta_1 - \Theta_2}{1 - \Theta_2}, \qquad
\Theta_1 = \frac{\sum_{i=1}^{l} N_{ii}}{N}, \qquad
\Theta_2 = \sum_{i=1}^{l} \left( \frac{N_{i*}}{N} \cdot \frac{N_{*i}}{N} \right)$$
Random Subspace Method

 RSM by itself is a simple but effective technique for EFS
   – the lack of accuracy in the ensemble members is compensated for by their diversity
   – it does not suffer from the curse of dimensionality
   – RSM is used as a base in other EFS strategies, including Genetic Ensemble Feature Selection
 Generation of initial feature subsets using RSM (see the sketch below)
 A number of refining passes on each feature set while there is improvement in fitness
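A minimal sketch of RSM-style initialization over binary feature masks; the inclusion probability `p_include` is an illustrative parameter:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_subspace(n_features, p_include=0.5):
    """Draw a random binary feature mask; redraw if it is empty or full
    (full feature sets are disallowed, as noted on the GAS-SEFS slides)."""
    while True:
        mask = rng.random(n_features) < p_include
        if 0 < mask.sum() < n_features:
            return mask

# e.g. 10 initial subsets over the 13 features of Heart Disease
initial_population = [random_subspace(13) for _ in range(10)]
```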
Genetic Ensemble Feature Selection

 Genetic search is an important direction in FS research
   – GA as an effective global optimization technique
 GA for EFS:
   – Kuncheva, 1993: ensemble accuracy instead of accuracies of base classifiers
      • fitness function is biased towards a particular integration method
      • preventive measures needed to avoid overfitting
   – Alternative: use of individual accuracy and diversity
      • overfitting of an individual classifier is more acceptable than overfitting of the whole ensemble
   – Opitz, 1999: explicitly used diversity in the fitness function
      • RSM for the initial population
      • new candidates by crossover and mutation
      • roulette-wheel selection (p proportional to fitness)
Genetic Ensemble Feature Selection

[Diagram: the standard GA cycle over a population of genotypes, i.e. bitstring feature masks such as 10111 or 01001, each encoding a base classifier. Selection picks individuals according to fitness f; recombination (crossover) and mutation produce new candidate bitstrings; a coding scheme maps genotypes into the phenotype space, where fitness is evaluated against the current ensemble of base classifiers.]
Basic Idea behind GA for EFS

[Diagram: RSM initializes the current population of base classifiers BC1 … BCi; the GA evolves a new population, scoring individuals by fitness; diversity is measured within the current population; one generation yields the ensemble BC1 … BC_EnsembleSize.]

$Fitness_i = acc_i + \alpha \cdot div_i$
Basic Idea behind GAS-SEFS

[Diagram: RSM initializes the current population; a separate genetic process GA_{i+1} evolves it, scoring each candidate by its accuracy plus its diversity with respect to the base classifiers BC1 … BCi already selected into the ensemble; the fittest candidate becomes the next ensemble member BC_{i+1}.]

$Fitness_i = acc_i + \alpha \cdot div_i$
GAS-SEFS 1 of 2

 GAS-SEFS (Genetic Algorithm-based Sequential Search for Ensemble Feature Selection)
   – instead of maintaining a set of feature subsets in each generation as GA does, applies a series of genetic processes, one for each base classifier, sequentially
   – after each genetic process, one base classifier is selected into the ensemble
   – GAS-SEFS uses the same fitness function, but
      • diversity is calculated with respect to the base classifiers already formed by the previous genetic processes
      • the first genetic process uses accuracy only
   – GAS-SEFS uses the same genetic operators as GA
GAS-SEFS 2 of 2

 GA and GAS-SEFS peculiarities:
   – full feature sets are not allowed in the RSM initialization
   – the crossover operator may not produce a full feature subset
   – individuals for crossover are selected randomly, with probability proportional to log(1 + fitness) instead of just fitness
   – the generation of children identical to their parents is prohibited
   – to provide better diversity in the length of feature subsets, two different mutation operators are used (see the sketch below):
      • Mutate1_0 deletes features randomly with a given probability
      • Mutate0_1 adds features randomly with a given probability
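A minimal sketch of these operators over boolean masks (illustrative; the guards against empty and full subsets mirror the constraints above):

```python
import numpy as np

rng = np.random.default_rng(1)

def mutate_1_0(mask, p=0.5):
    """Mutate1_0: drop each selected feature with probability p."""
    child = mask & ~((rng.random(mask.size) < p) & mask)
    return child if child.any() else mask           # keep at least one feature

def mutate_0_1(mask, p=0.5):
    """Mutate0_1: add each unselected feature with probability p."""
    child = mask | ((rng.random(mask.size) < p) & ~mask)
    return child if not child.all() else mask       # full feature sets are not allowed

def select_parent(population, fitnesses):
    """Roulette-wheel selection with weights proportional to log(1 + fitness)."""
    w = np.log1p(np.maximum(np.asarray(fitnesses), 0.0)) + 1e-12
    return population[rng.choice(len(population), p=w / w.sum())]
```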
Computational complexity

Complexity of GA-based search does not depend on the number of features.

GA evaluates $S' \cdot N_{gen}$ feature subsets; GAS-SEFS evaluates $S \cdot S' \cdot N_{gen}$,

where S is the number of base classifiers, S' is the number of
individuals (feature subsets) in one generation, and N_gen is the
number of generations.

EFSS and EBSS evaluate about $S \cdot N' \cdot N$ subsets,

where S is the number of base classifiers, N is the total number
of features, and N' is the number of features included or deleted
on average in an FSS or BSS search.
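As a quick check against the experimental settings reported below (S' = 40 evaluated subsets per generation, N_gen = 10 generations, S = 10 base classifiers):

$$\text{GA: } 40 \times 10 = 400 \text{ subsets}, \qquad \text{GAS-SEFS: } 10 \times 400 = 4000 \text{ subsets},$$

matching the 400 and 4000 feature subsets on the Experimental Design slide.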
Integration of classifiers

Selection/Combination of the base classifiers' predictions can be static or dynamic:
 Static: Static Selection (SS, also known as CVM) and Weighted Voting (WV)
 Dynamic: Dynamic Selection (DS), Dynamic Voting (DV), and Dynamic Voting with Selection (DVS)

Motivation for the Dynamic Integration: each classifier is best in
some sub-areas of the whole data set, where its local error is
comparatively lower than the corresponding errors of the other
classifiers (see the sketch below).
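A minimal sketch of the dynamic-selection idea under that motivation (illustrative, not the authors' exact procedure; assumes scikit-learn-style `.predict` on the base classifiers): estimate each classifier's local error on the k nearest validation neighbours of x and let the locally best one predict.

```python
import numpy as np

def dynamic_selection(x, classifiers, X_val, y_val, k=15):
    """Predict with the base classifier whose error is lowest in the
    local neighbourhood of x (its k nearest validation instances)."""
    dist = np.linalg.norm(X_val - x, axis=1)
    nn = np.argsort(dist)[:k]
    local_err = [np.mean(h.predict(X_val[nn]) != y_val[nn]) for h in classifiers]
    best = int(np.argmin(local_err))
    return classifiers[best].predict(x.reshape(1, -1))[0]
```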
Experimental Design

 Parameter settings for GA and GAS-SEFS:
   – mutation rate: 50%;
   – population size: 10;
   – search length of 40 feature subsets/individuals per generation:
      • 20 are offspring of the current population of 10 classifiers generated by crossover,
      • 20 are mutated offspring (10 with each mutation operator);
   – 10 generations of individuals were produced;
   – 400 (GA) and 4000 (GAS-SEFS) feature subsets in total.
 To evaluate GA and GAS-SEFS:
   – 5 integration methods;
   – Simple Bayes as the base classifier;
   – stratified random sampling with 60%/20%/20% of instances in the training/validation/test sets;
   – 70 test runs on each of 21 UCI data sets for each strategy and diversity measure.
GA vs GAS-SEFS on two groups of datasets

[Bar chart (DVS, fail/non-fail disagreement; accuracy axis 0.810-0.840): ensemble accuracies for GA and GAS-SEFS on two groups of data sets, (1) < 9 features and (2) >= 9 features, with four ensemble sizes (3, 5, 7, 10).]
GA vs GAS-SEFS for Five Integration Methods

[Bar chart (ensemble size = 10; accuracy axis 0.65-0.95): ensemble accuracies of GA and GAS-SEFS for five integration methods (SS, WV, DS, DV, DVS) on Tic-Tac-Toe.]
Conclusions and Future Work

 Diversity in an ensemble of classifiers is very important
 We have considered two genetic search strategies for EFS
 The new strategy, GAS-SEFS, employs a series of genetic search processes
   – one for each base classifier
 GAS-SEFS results in better ensembles with greater accuracy
   – especially for data sets with relatively larger numbers of features
   – one reason: each of the core GA processes leads to significant overfitting of the corresponding ensemble member
 GAS-SEFS is significantly more time-consuming than GA
   – GAS-SEFS = ensemble_size × GA
 [Oliveira et al., 2003] report better results for single-classifier FSS based on Pareto-front dominating solutions
   – adaptation of this technique to EFS is an interesting topic for further research
Thank you!
Alexey Tsymbal, Padraig Cunningham
Dept of Computer Science
Trinity College Dublin
Ireland
Alexey.Tsymbal@cs.tcd.ie,
Padraig.Cunningham@cs.tcd.ie
Mykola Pechenizkiy
Department of Computer Science and Information Systems
University of Jyväskylä
Finland
mpechen@cs.jyu.fi
Additional Slides
References

• [Kuncheva, 1993] Ludmila I. Kuncheva. Genetic algorithm for feature selection for parallel classifiers. Information Processing Letters 46: 163-168, 1993.
• [Kuncheva and Jain, 2000] Ludmila I. Kuncheva and Lakhmi C. Jain. Designing classifier fusion systems by genetic algorithms. IEEE Transactions on Evolutionary Computation 4(4): 327-336, 2000.
• [Oliveira et al., 2003] Luiz S. Oliveira, Robert Sabourin, Flavio Bortolozzi, and Ching Y. Suen. A methodology for feature selection using multi-objective genetic algorithms for handwritten digit string recognition. Pattern Recognition and Artificial Intelligence 17(6): 903-930, 2003.
• [Opitz, 1999] David Opitz. Feature selection for ensembles. In Proceedings of the 16th National Conference on Artificial Intelligence, pages 379-384. AAAI Press, 1999.
GAS-SEFS Algorithm
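The original slide presents the algorithm as a figure. A compact Python sketch of the sequential search as described on the GAS-SEFS slides (illustrative names: `eval_acc(mask)` returns the validation accuracy of a base classifier built on that feature mask, `eval_div(mask, ensemble)` its mean diversity with the members selected so far; the two mutation operators are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_mask(n_features, p=0.5):
    """RSM initialization: random non-empty, non-full feature mask."""
    while True:
        m = rng.random(n_features) < p
        if 0 < m.sum() < n_features:
            return m

def crossover(a, b):
    """Uniform crossover of two parent masks."""
    pick = rng.random(a.size) < 0.5
    return np.where(pick, a, b)

def gas_sefs(eval_acc, eval_div, n_features, ens_size=10,
             pop_size=10, n_gen=10, alpha=1.0):
    ensemble = []
    for _ in range(ens_size):                      # one genetic process per member
        pop = [random_mask(n_features) for _ in range(pop_size)]

        def fitness(mask):
            # accuracy only for the first classifier, acc + alpha*div afterwards
            div = eval_div(mask, ensemble) if ensemble else 0.0
            return eval_acc(mask) + alpha * div

        for _ in range(n_gen):
            fits = np.array([fitness(m) for m in pop])
            w = np.log1p(np.maximum(fits, 0.0)) + 1e-12   # log(1+fitness) selection
            w /= w.sum()
            kids = [crossover(pop[rng.choice(pop_size, p=w)],
                              pop[rng.choice(pop_size, p=w)])
                    for _ in range(2 * pop_size)]
            kids = [k for k in kids if 0 < k.sum() < n_features]  # no full/empty subsets
            pop = sorted(pop + kids, key=fitness, reverse=True)[:pop_size]

        ensemble.append(max(pop, key=fitness))     # best individual joins the ensemble
    return ensemble
```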
Other interesting findings

• Alpha:
  – values were different for different data sets
  – for both GA and GAS-SEFS, alpha for the dynamic integration methods is bigger than for the static ones (2.2 vs 0.8 on average)
  – GAS-SEFS needs slightly higher values of alpha than GA (1.8 vs 1.5 on average)
     • GAS-SEFS always starts with a classifier based on accuracy only, and the subsequent classifiers need more diversity than accuracy
• The number of selected features falls as the ensemble size grows
  – this is especially clear for GAS-SEFS, as the base classifiers need more diversity
• Integration methods (for both GA and GAS-SEFS):
  – the static methods, SS and WV, and the dynamic DS start to overfit the validation set already after 5 generations and show lower accuracies
  – accuracies of DV and DVS continue to grow up to 10 generations
Paper Summary

• A new strategy for genetic ensemble feature selection, GAS-SEFS, is introduced
• In contrast with the previously considered algorithm (GA), it is sequential: a series of genetic processes, one for each base classifier

[Bar chart repeated from above: GA vs GAS-SEFS accuracies on the two dataset groups for ensemble sizes 3, 5, 7, 10.]

• More time-consuming, but with better accuracy
• Each base classifier has a considerable level of overfitting with GAS-SEFS, but the ensemble accuracy grows
• Experimental comparisons demonstrate clear superiority on 21 UCI datasets, especially for datasets with many features (gr1 vs gr2)
Simple Bayes as Base Classifier

 Bayes theorem:
   P(C|X) = P(X|C) · P(C) / P(X)
 Naïve assumption: attribute independence
   P(x1,…,xk|C) = P(x1|C) · … · P(xk|C)
 If the i-th attribute is categorical:
   P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
 If the i-th attribute is continuous:
   P(xi|C) is estimated through a Gaussian density function
 Computationally easy in both cases (see the sketch below)
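A minimal sketch of such a classifier for continuous attributes (Gaussian class-conditional densities; an illustrative implementation, not the one used in the experiments):

```python
import numpy as np

class SimpleBayes:
    """Naive Bayes with Gaussian densities for continuous attributes."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = {c: np.mean(y == c) for c in self.classes}        # P(C)
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        self.var = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes}
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            # log P(C) + sum_i log P(x_i | C), using the independence assumption
            log_lik = -0.5 * (np.log(2 * np.pi * self.var[c])
                              + (X - self.mu[c]) ** 2 / self.var[c]).sum(axis=1)
            scores.append(np.log(self.priors[c]) + log_lik)
        return self.classes[np.argmax(np.stack(scores), axis=0)]
```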
Datasets' characteristics

Data set            Instances   Classes   Categ. features   Num. features
Balance                   625         3                 0               4
Breast Cancer             286         2                 9               0
Car                      1728         4                 6               0
Diabetes                  768         2                 0               8
Glass Recognition         214         6                 0               9
Heart Disease             270         2                 0              13
Ionosphere                351         2                 0              34
Iris Plants               150         3                 0               4
LED                       300        10                 7               0
LED17                     300        10                24               0
Liver Disorders           345         2                 0               6
Lymphography              148         4                15               3
MONK-1                    432         2                 6               0
MONK-2                    432         2                 6               0
MONK-3                    432         2                 6               0
Soybean                    47         4                 0              35
Thyroid                   215         3                 0               5
Tic-Tac-Toe               958         2                 9               0
Vehicle                   846         4                 0              18
Voting                    435         2                16               0
Zoo                       101         7                16               0
GA vs GAS-SEFS for Five Integration Methods

[Two bar charts (accuracy axis 0.65-0.95): ensemble accuracies for GA (left) and GAS-SEFS (right) for five integration methods (SS, WV, DS, DV, DVS) and four ensemble sizes (3, 5, 7, 10) on Tic-Tac-Toe.]