Feature Selection via Global Redundancy Minimization
Abstract:
Feature selection has been an important research topic in data mining, because real data sets often have high-dimensional features, as in bioinformatics and text mining applications. Many existing filter feature selection methods rank features by optimizing certain feature ranking criteria, so correlated features often receive similar rankings. These correlated features are redundant and provide little additional mutual information for data mining. Thus, when we select a limited number of features, we want to select the top non-redundant features so that the useful mutual information is maximized. In previous research, Ding et al. recognized this important issue and proposed the minimum Redundancy Maximum Relevance (mRMR) feature selection model to minimize the redundancy between sequentially selected features.
However, that method uses a greedy search, so the global feature
redundancy is not considered and the results are suboptimal. In this
paper, we propose a new feature selection framework that globally minimizes
feature redundancy while maximizing the given feature ranking scores,
which can come from any supervised or unsupervised method. Our new
model has no parameters, so it is especially suitable for
practical data mining applications. Experimental results on
benchmark data sets show that the proposed method consistently improves
the feature selection results of the original methods. Meanwhile,
we introduce a new unsupervised global and local discriminative feature
selection method that can be unified with the global feature redundancy
minimization framework and shows superior performance.